[cpp-threads] seq_cst compare_exchange and store-load fencing

Fri Jan 2 19:38:23 GMT 2009

< [] and BOLD annotations added to quoted text >

On Fri, Jan 2, 2009 at 6:55 PM, Paul E. McKenney
<paulmck at linux.vnet.ibm.com> wrote:
> On Fri, Jan 02, 2009 at 05:23:23PM +0100, Alexander Terekhov wrote:
>> compare_exchange performs both load and (conditional) store. This
>> leads to questions regarding store-load fencing for compare_exchange
>> in seq_cst mode:
>>
>> Q1) Does it provide store-load fencing in the case of
>>
>>    A.store(relaxed|release) ... B.compare_exchange(..., seq_cst)
>>
>> regarding A's store and B's load (in either success or failure case of
>> B's compare_exchange)?
>
> The proposed Power implementation provides this, but by accident.
> I do not believe that this is required.  Now, if you do:
>
>    A.store(seq_cst) ... B.compare_exchange(..., seq_cst)
>
> Alternatively, place an atomic_thread_fence(seq_cst) between the
> relaxed/release fence and the compare_exchange.

You probably meant:

"Alternatively, place an atomic_thread_fence(seq_cst) between the
relaxed/release STORE [not fence] and the [relaxed] compare_exchange."
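
For concreteness, here is a minimal C++ sketch of that corrected wording, arranged as a store-buffering test. The function names (q1_run_once, q1_count_both_missed) and the symmetric second thread are assumptions made for illustration and are not from the proposal. The compare_exchange uses expected == desired == 0, so it serves only as the "load" of the other variable. With a seq_cst fence in each thread, the outcome where both threads miss the other's store should not occur.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// One run of the test. Returns true if BOTH threads failed to observe the
// other's store, which the seq_cst fences are expected to rule out.
bool q1_run_once() {
    std::atomic<int> A{0}, B{0};
    int seen_b = -1, seen_a = -1;

    std::thread t1([&] {
        A.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        int expected = 0;
        B.compare_exchange_strong(expected, 0, std::memory_order_relaxed);
        seen_b = expected;  // stays 0 on success; holds B's value on failure
    });
    std::thread t2([&] {
        B.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        int expected = 0;
        A.compare_exchange_strong(expected, 0, std::memory_order_relaxed);
        seen_a = expected;  // stays 0 on success; holds A's value on failure
    });
    t1.join();
    t2.join();
    return seen_b == 0 && seen_a == 0;
}

// Repeats the test and counts how often both threads missed each other.
int q1_count_both_missed(int iterations) {
    assert(iterations >= 0);
    int missed = 0;
    for (int i = 0; i < iterations; ++i) {
        if (q1_run_once()) ++missed;
    }
    return missed;
}
```

Calling q1_count_both_missed(200) should return 0. Removing the two fences would permit a nonzero count under the proposed model, although a given machine may never exhibit it.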

>
> Then the proposed standard would guarantee the ordering.
>
>> Q2) Does it provide store-load fencing in the case of
>>
>>    B.compare_exchange(..., seq_cst) ... C.load(relaxed|acquire)
>>
>> regarding B's store and C's load (in success case of B's compare_exchange)?
>
> The proposed Power implementation provides a weak form of ordering
> in this case, but again, only by accident.

By "weak form of ordering in this case" you probably meant NON
sequentially consistent ordering in this case as in:

(From Book II):

"A successful stwcx. to a given location may complete
before its store has been performed with respect to
other processors and mechanisms."

Right?

> To be guaranteed this [sequentially consistent] ordering
>
>    B.compare_exchange(..., seq_cst) ... C.load(seq_cst)
>
> As before, another approach is to place an atomic_thread_fence(seq_cst)
> between the relaxed/release fence and the compare_exchange.

You probably meant:

"another approach is to place an atomic_thread_fence(seq_cst) between
[relaxed] compare_exchange and the relaxed/acquire load."
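
A minimal C++ sketch of this second case follows, with the fence between a relaxed compare_exchange on B and a relaxed load of C. The names (q2_run_once, q2_count_both_missed) and the shape of the second thread (store to C, fence, load of B) are illustrative assumptions. The compare_exchange here always succeeds, so it is B's store whose ordering against the later load of C is in question.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// One run of the test. Returns true if thread 1 missed the store to C AND
// thread 2 missed the store to B, which the fences are expected to rule out.
bool q2_run_once() {
    std::atomic<int> B{0}, C{0};
    int seen_c = -1, seen_b = -1;

    std::thread t1([&] {
        int expected = 0;
        // Only this thread modifies B, so the exchange 0 -> 1 succeeds.
        B.compare_exchange_strong(expected, 1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        seen_c = C.load(std::memory_order_relaxed);
    });
    std::thread t2([&] {
        C.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        seen_b = B.load(std::memory_order_relaxed);
    });
    t1.join();
    t2.join();
    return seen_c == 0 && seen_b == 0;
}

// Repeats the test and counts how often both loads returned 0.
int q2_count_both_missed(int iterations) {
    assert(iterations >= 0);
    int missed = 0;
    for (int i = 0; i < iterations; ++i) {
        if (q2_run_once()) ++missed;
    }
    return missed;
}
```

Calling q2_count_both_missed(200) should return 0, since whichever fence comes first in the total order forces the other thread's later load to see the earlier store.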

>
>> Under simple interpretation of "seq_cst" meaning "fully-fenced" the
>> answer to both questions is "yes"...
>
> But that is not the definition of "seq_cst" in the proposed standard,
> at least not as I read it.
>
>> Do you agree with the same outcome under the proposed C/C++ memory model?
>>
>> What is your reasoning in case you disagree?
>
> I appeal to the wording of section 29.1 of the proposed standard:
>
>        The enumeration memory_order specifies the detailed regular
>        (non-atomic) memory synchronization order as defined in Clause
>        1.10 and may provide for operation ordering.  Its enumerated
>        values and their meanings are as follows:
>
>            — memory_order_relaxed: no operation orders memory.
>            — memory_order_release, memory_order_acq_rel, and
>              memory_order_seq_cst: a store operation performs a release
>              operation on the affected memory location.
>            — memory_order_consume: a load operation performs a consume
>              operation on the affected memory location.
>            — memory_order_acquire, memory_order_acq_rel, and
>              memory_order_seq_cst: a load operation performs an acquire
>              operation on the affected memory location.
>
>        There shall be a single total order S on all memory_order_seq_cst
>        operations, consistent with the happens before order and
>        modification orders for all affected locations, such that each
>        memory_order_seq_cst operation that loads a value observes either
>        the last preceding modification according to this order S, or
>        the result of an operation that is not memory_order_seq_cst. [
>        Note: Although it is not explicitly required that S include locks,
>        it can always be extended to an order that does include lock and
>        unlock operations, since the ordering between those is already
>        included in the happens before ordering. — end note ]
>
> None of this requires that seq_cst operations be ordered with respect to
> non-seq_cst operations except as required by acquire, consume, and
> release semantics.

IOW, seq_cst means acq_rel (with further reduction to relaxed) except
that it guarantees store-load fencing with respect to preceding and/or
subsequent seq_cst, and only seq_cst... right?

As for the formalities that distinguish

atomic_thread_fence(seq_cst), X.load(relaxed|acquire);

from

X.load(seq_cst);

why not state them in a more prominent place such as 1.10, instead of
1K pages below it? ;-)
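
Spelled out in C++, the two forms look like this. The helper names (load_after_fence, load_seq_cst) are hypothetical and exist only to put the two spellings side by side. In a single thread both return the same value. The difference shows up only in a concurrent program: the first is ordered by the rules for seq_cst fences, while the second takes a place in the single total order S quoted above.

```cpp
#include <atomic>
#include <cassert>

// Form 1: a seq_cst fence followed by a relaxed load.
int load_after_fence(const std::atomic<int>& x) {
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return x.load(std::memory_order_relaxed);
}

// Form 2: a seq_cst load, which is itself an element of the total order S.
int load_seq_cst(const std::atomic<int>& x) {
    return x.load(std::memory_order_seq_cst);
}
```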

>
> Now I personally have no objection to making seq_cst operations more
> expensive, but others might.  ;-)

I suspect that not making seq_cst operations more expensive (that is,
not giving them the simple "fully-fenced" meaning) will result in quite
a lot of incorrect code.

We'll see.

regards,
alexander.