[cpp-threads] std::atomic<> in acquire-release mode and write atomicity

Alexander Terekhov alexander.terekhov at gmail.com
Tue Dec 16 17:12:06 GMT 2008


On Tue, Dec 16, 2008 at 5:56 PM, Paul E. McKenney
<paulmck at linux.vnet.ibm.com> wrote:
[...]
>> P1: Y.store(1, release);
>> P2: if( Y.load(acquire) == 1 ) { Z.store(1, release); }
>> P3: if( Z.load(acquire) == 1 ) { assert( Y.load(acquire) == 1 ); }
>
> Well, that does put a different light on it.  ;-)
>
> OK, P1's release has no effect, as there is no prior operation.
>
> P2's store-release to Z ensures that P2's load from Y is performed
> WRT all other threads before P2's store to Z.  A-cumulativity
> ensures that if P2's load from Y sees P1's store, then P1's store
> to Y is performed WRT all threads before P2's store to Z.  These
> are both stores, and hence are "applicable" to a store-release,
> which on PowerPC turns into an lwsync instruction.
>
> If P3's load from Z sees P2's store, then by B-cumulativity, P3's
> load from Y is in P2's lwsync's B-set -except- that a prior store and
> following load is not "applicable" in the case of lwsync.  So, let's
> turn to P3's acquire operation, which becomes a conditional-branch/isync
> combination.  This means that P3's load from Z is performed before
> P3's load from Y WRT all threads.  Because P3's load from Z returned
> 1, it must have been performed after P2's store to Z WRT P3.
>
> But P1's store to Y was performed WRT all threads before P2's
> store to Z, as noted earlier.  Therefore, the assert is required to
> see the new value of Y, and thus cannot fail.  I think, anyway.
>
> Hey, you asked!!!  ;-)

Are you in disagreement with your own paper? ;-)

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2745.html

"... permits the following counter-intuitive sequence of events, with
all variables initially zero, and results of loads in square brackets
following the load:

1. CPU 0: x=1
2. CPU 1: r1=x [1]
3. CPU 1: lwsync
4. CPU 1: y=1
5. CPU 2: r2=y [1]
6. CPU 2: bc;isync
7. CPU 2: r3=x [0]

This sequence of events is more likely to occur on systems where CPUs
0 and 1 are closely related, for example, when CPUs 0 and 1 are
hardware threads in one core and CPU 2 is a hardware thread in another
core.

[...]

Cumulativity does not come into play here because prior stores (CPU
0's store to x) and subsequent loads (CPU 2's load from x) are not
applicable to the lwsync instruction. However, if the lwsync were to
be replaced with a hwsync, the outcome shown above would be
impossible."
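
For reference, the P1/P2/P3 test quoted at the top can be written
against std::atomic roughly as below. This is only a sketch: the
function names violation_observed and count_violations are made up
for illustration, and the assert in P3 is replaced by a returned flag
so that a harness can count failures instead of aborting.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper (not from the thread): runs the three-thread
// test once and reports whether P3 saw Z == 1 but then read Y != 1.
bool violation_observed() {
    std::atomic<int> Y{0};
    std::atomic<int> Z{0};
    bool violated = false;  // written only by P3, read after join()

    // P1: Y.store(1, release);
    std::thread p1([&] { Y.store(1, std::memory_order_release); });

    // P2: if( Y.load(acquire) == 1 ) { Z.store(1, release); }
    std::thread p2([&] {
        if (Y.load(std::memory_order_acquire) == 1) {
            Z.store(1, std::memory_order_release);
        }
    });

    // P3: if( Z.load(acquire) == 1 ) { assert( Y.load(acquire) == 1 ); }
    std::thread p3([&] {
        if (Z.load(std::memory_order_acquire) == 1) {
            violated = (Y.load(std::memory_order_acquire) != 1);
        }
    });

    p1.join();
    p2.join();
    p3.join();
    return violated;
}

// Hypothetical harness: repeats the test and counts failed runs.
int count_violations(int iterations) {
    int failures = 0;
    for (int i = 0; i < iterations; ++i) {
        if (violation_observed()) {
            ++failures;
        }
    }
    return failures;
}
```

Calling count_violations(N) from main gives a crude stress test. A
count of zero on one machine says nothing about what the architecture
permits; the question here is whether the lwsync and bc;isync mapping
guarantees a zero count on every conforming implementation.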

regards,
alexander.


