[cpp-threads] seq_cst compare_exchange and store-load fencing
Alexander Terekhov
alexander.terekhov at gmail.com
Fri Jan 2 19:38:23 GMT 2009
< [] and BOLD annotations added to quoted text >
On Fri, Jan 2, 2009 at 6:55 PM, Paul E. McKenney
<paulmck at linux.vnet.ibm.com> wrote:
> On Fri, Jan 02, 2009 at 05:23:23PM +0100, Alexander Terekhov wrote:
>> compare_exchange performs both load and (conditional) store. This
>> leads to questions regarding store-load fencing for compare_exchange
>> in seq_cst mode:
>>
>> Q1) Does it provide store-load fencing in the case of
>>
>> A.store(relaxed|release) ... B.compare_exchange(..., seq_cst)
>>
>> regarding A's store and B's load (in either success or failure case of
>> B's compare_exchange)?
>
> The proposed Power implementation provides this, but by accident.
> I do not believe that this is required. Now, if you do:
>
> A.store(seq_cst) ... B.compare_exchange(..., seq_cst)
>
> Alternatively, place an atomic_thread_fence(seq_cst) between the
> relaxed/release fence and the compare_exchange.
You probably meant:
"Alternatively, place an atomic_thread_fence(seq_cst) between the
relaxed/release STORE [not fence] and the [relaxed] compare_exchange."
>
> Then the proposed standard would guarantee the ordering.
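A minimal C++ sketch of the two alternatives discussed for Q1, written against the final std::atomic spelling rather than the draft of the time. The function names and the int payload are illustrative assumptions and appear nowhere in the thread; A and B are passed in as parameters.

```cpp
#include <atomic>
#include <cassert>

// Alternative 1: both the store to A and the compare_exchange on B are
// seq_cst, so both take part in the single total order S.
bool store_then_cas_seq_cst(std::atomic<int>& A, std::atomic<int>& B) {
    A.store(1, std::memory_order_seq_cst);
    int expected = 0;
    // Succeeds only if B still holds 0; on failure, expected receives B's value.
    return B.compare_exchange_strong(expected, 1, std::memory_order_seq_cst);
}

// Alternative 2: relaxed store, then a seq_cst fence, then a relaxed
// compare_exchange. The fence sits between A's store and B's load.
bool store_fence_cas(std::atomic<int>& A, std::atomic<int>& B) {
    A.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    int expected = 0;
    return B.compare_exchange_strong(expected, 1, std::memory_order_relaxed);
}
```

In either form the store-load ordering is only meaningful relative to other threads that themselves use seq_cst operations or seq_cst fences on the same variables.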
>
>> Q2) Does it provide store-load fencing in the case of
>>
>> B.compare_exchange(..., seq_cst) ... C.load(relaxed|acquire)
>>
>> regarding B's store and C's load (in success case of B's compare_exchange)?
>
> The proposed Power implementation provides a weak form of ordering
> in this case, but again, only by accident.
By "weak form of ordering in this case" you probably meant NON
sequentially consistent ordering in this case as in:
(From Book II):
"A successful stwcx. to a given location may complete
before its store has been performed with respect to
other processors and mechanisms."
Right?
> To be guaranteed this [sequentially consistent] ordering, use:
>
> B.compare_exchange(..., seq_cst) ... C.load(seq_cst)
>
> As before, another approach is to place an atomic_thread_fence(seq_cst)
> between the relaxed/release fence and the compare_exchange.
You probably meant:
"another approach is to place an atomic_thread_fence(seq_cst) between
[relaxed] compare_exchange and the relaxed/acquire load."
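The same two alternatives for Q2 can be sketched in C++ as follows. This is an illustration under assumptions: the function names and the int payload are hypothetical, and the final std::atomic spelling is used instead of the draft's.

```cpp
#include <atomic>
#include <cassert>

// Alternative 1: seq_cst compare_exchange on B followed by a seq_cst load
// of C, so both operations take part in the single total order S.
int cas_then_load_seq_cst(std::atomic<int>& B, std::atomic<int>& C) {
    int expected = 0;
    B.compare_exchange_strong(expected, 1, std::memory_order_seq_cst);
    return C.load(std::memory_order_seq_cst);
}

// Alternative 2: relaxed compare_exchange, then a seq_cst fence, then an
// acquire load. The fence sits between B's store and C's load.
int cas_fence_load(std::atomic<int>& B, std::atomic<int>& C) {
    int expected = 0;
    B.compare_exchange_strong(expected, 1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return C.load(std::memory_order_acquire);
}
```

The question raised in Q2 only concerns the success case of the compare_exchange; when it fails, no store to B takes place.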
>
>> Under simple interpretation of "seq_cst" meaning "fully-fenced" the
>> answer to both questions is "yes"...
>
> But that is not the definition of "seq_cst" in the proposed standard,
> at least not as I read it.
>
>> Do you agree with the same outcome under the proposed C/C++ memory model?
>>
>> What is your reasoning in case you disagree?
>
> I appeal to the wording of section 29.1 of the proposed standard:
>
> The enumeration memory_order specifies the detailed regular
> (non-atomic) memory synchronization order as defined in Clause
> 1.10 and may provide for operation ordering. Its enumerated
> values and their meanings are as follows:
>
> — memory_order_relaxed: no operation orders memory.
> — memory_order_release, memory_order_acq_rel, and
> memory_order_seq_cst: a store operation performs a release
> operation on the affected memory location.
> — memory_order_consume: a load operation performs a consume
> operation on the affected memory location.
> — memory_order_acquire, memory_order_acq_rel, and
> memory_order_seq_cst: a load operation performs an acquire
> operation on the affected memory location.
>
> There shall be a single total order S on all memory_order_seq_cst
> operations, consistent with the happens before order and
> modification orders for all affected locations, such that each
> memory_order_seq_cst operation that loads a value observes either
> the last preceding modification according to this order S, or
> the result of an operation that is not memory_order_seq_cst. [
> Note: Although it is not explicitly required that S include locks,
> it can always be extended to an order that does include lock and
> unlock operations, since the ordering between those is already
> included in the happens before ordering. — end note ]
>
> None of this requires that seq_cst operations be ordered with respect to
> non-seq_cst operations except as required by acquire, consume, and
> release semantics.
IOW, seq_cst means acq_rel (with further reduction to relaxed) except
that it guarantees store-load fencing with respect to preceding and/or
subsequent seq_cst, and only seq_cst... right?
As for the formalities that distinguish
atomic_thread_fence(seq_cst), X.load(relaxed|acquire);
vs.
X.load(seq_cst);
why not state them in a more prominent place like 1.10 instead of 1K
pages below it? ;-)
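The store-load guarantee among seq_cst operations can be illustrated with the classic store-buffering test, sketched here in C++ in both spellings: seq_cst accesses, and relaxed accesses separated by seq_cst fences. The function name, the use_fences switch, and the variables X, Y, r1, r2 are assumptions made for this sketch. As I read the C++11 rules, both threads reading 0 is forbidden in either spelling, provided both threads use the same one.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Runs one round of the store-buffering test and reports whether both
// threads read 0. use_fences selects relaxed accesses separated by a
// seq_cst fence instead of seq_cst accesses.
bool both_read_zero(bool use_fences) {
    std::atomic<int> X{0}, Y{0};
    int r1 = -1, r2 = -1;

    // Each thread stores 1 to its own variable, then loads the other one.
    auto body = [use_fences](std::atomic<int>& mine, std::atomic<int>& other, int& r) {
        if (use_fences) {
            mine.store(1, std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_seq_cst);
            r = other.load(std::memory_order_relaxed);
        } else {
            mine.store(1, std::memory_order_seq_cst);
            r = other.load(std::memory_order_seq_cst);
        }
    };

    std::thread t1([&] { body(X, Y, r1); });
    std::thread t2([&] { body(Y, X, r2); });
    t1.join();
    t2.join();
    return r1 == 0 && r2 == 0;
}
```

Replacing the seq_cst accesses with acquire/release and dropping the fences would permit both threads to read 0, which is the store-load reordering this thread is about.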
>
> Now I personally have no objection to making seq_cst operations more
> expensive, but others might. ;-)
I suspect that not making seq_cst operations more expensive (according
to simple "fully-fenced" reasoning) will result in quite a lot of
incorrect code.
We'll see.
regards,
alexander.