[cpp-threads] Brief example ITANIUM Implementation for C/C++ Memory Model

Paul E. McKenney paulmck at linux.vnet.ibm.com
Sat Jan 3 23:53:45 GMT 2009


On Sat, Jan 03, 2009 at 03:45:07AM +0200, Peter Dimov wrote:
> Paul E. McKenney:
>
>>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2806.html#926
>>
>> I missed this one.  Strange, though...  The proposed resolution is to
>> add a new paragraph in 29.1 as follows:
>>
>> For atomic operations A and B on an atomic object M, where A and
>> B modify M, if there are memory_order_seq_cst fences X and Y
>> such that A is sequenced before X, Y is sequenced before B, and
>> X precedes Y in S, then B occurs later than A in the modifiction
>> order of M.
>>
>> Assuming s/modifiction/modification/, doesn't the modification-order
>> wording already guarantee this ordering for atomic operations?
>
> I don't think that it does. The question is, can we have the following 
> cycle:
>
> A hb X <s Y hb B <m A
>
> ?
>
> Nothing in the specification prohibits it, as far as I can see; <s is only 
> required to be "consistent" with hb and <m, which means that we can't have 
> P hb Q and Q <s P or P <m Q and Q <s P.

Ah!  Thank you for the explanation.
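
To make the cycle concrete, here is a minimal sketch in the final C++11
spelling. The second atomic F and the names Outcome, run_926_once and
violates_926 are my own assumptions, not part of the issue text; F exists
only to witness the order of the two fences. If r1 == 0, then Y cannot
precede X in S (otherwise the load would have seen the store to F), so
X precedes Y, and the proposed paragraph then requires B to follow A in
the modification order of M.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Result of one run: what T1 read from F, and the final value of M.
struct Outcome { int r1; int final_m; };

Outcome run_926_once() {
    std::atomic<int> M{0}, F{0};  // F is a hypothetical witness flag
    int r1 = -1;

    std::thread t1([&] {
        M.store(1, std::memory_order_relaxed);                // A
        std::atomic_thread_fence(std::memory_order_seq_cst);  // X
        r1 = F.load(std::memory_order_relaxed);
    });
    std::thread t2([&] {
        F.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);  // Y
        M.store(2, std::memory_order_relaxed);                // B
    });
    t1.join();
    t2.join();

    assert(r1 == 0 || r1 == 1);  // sanity check on the observed flag
    return Outcome{r1, M.load(std::memory_order_relaxed)};
}

// True if the run shows the cycle A hb X <s Y hb B <m A:
// X preceded Y in S (r1 == 0), yet A ended up last in M's modification order.
bool violates_926(const Outcome& o) {
    return o.r1 == 0 && o.final_m != 2;
}
```

Running run_926_once() in a loop and checking violates_926() on each
result is only a smoke test; a clean run says nothing about what the
wording permits.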

>> In Alexander's example above, the issue is not the order of the loads,
>> but rather the order of P1's and P2's stores to different atomic
>> variables.  Power uses cumulativity to allow the fences between the
>> loads to force a globally consistent order, but other architectures act
>> differently -- for example, x86 would require that the two stores use
>> atomic RMW operations.
>
>>> > P1: Y.store(1, relaxed);
>>> > P2: Z.store(1, relaxed); fence(seq_cst); a = Y.load(relaxed);
>>> > P3: b = Y.load(relaxed); fence(seq_cst); c = Z.load(relaxed);
>
> I do think that it is in fact a load ordering issue. The store to Z is only 
> used to fix the order of the two fences. We can't have P2.fence <s P3.fence 
> because this would enforce c = 1. P3's fence must therefore precede P2's 
> fence in S. So we're left with:
>
> P3: b = Y.load(relaxed); fence;
> P2:                                         fence; a = Y.load(relaxed);
>
> and the question is, can we have P2's load seeing an earlier value in the 
> modification order of Y? This is similar to the example in #926 in that it 
> also concerns the consistency between hb, <s and <m.
>
> (I suspect that this is, in fact, enforced on x86. As far as the fences go, 
> POWER tends to be the least forgiving architecture.) 

For whatever it is worth, POWER would disallow this outcome due to
cumulativity.  If c==0, then P3's load from Y will have been performed
before P2's load from Y with respect to all processors, so that if a==0,
then we must also have b==0.
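
For reference, a rough sketch of Alexander's three-thread example as a
runnable harness, with fence(seq_cst) spelled as
std::atomic_thread_fence. The names Litmus, run_litmus_once and
is_questioned_outcome are my own assumptions. The outcome under
discussion is a==0, b==1, c==0; the harness only records and classifies
results and takes no position on whether the wording forbids that
outcome.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Values observed by P2 (a) and P3 (b, c) in one run.
struct Litmus { int a; int b; int c; };

Litmus run_litmus_once() {
    std::atomic<int> Y{0}, Z{0};
    int a = -1, b = -1, c = -1;

    std::thread p1([&] {
        Y.store(1, std::memory_order_relaxed);
    });
    std::thread p2([&] {
        Z.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        a = Y.load(std::memory_order_relaxed);
    });
    std::thread p3([&] {
        b = Y.load(std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        c = Z.load(std::memory_order_relaxed);
    });
    p1.join();
    p2.join();
    p3.join();

    // Each load can only see the initial 0 or the single store of 1.
    assert((a == 0 || a == 1) && (b == 0 || b == 1) && (c == 0 || c == 1));
    return Litmus{a, b, c};
}

// c == 0 puts P3's fence before P2's fence in S; the question is whether
// P2's later load of Y (a) may then see an older value than P3's (b).
bool is_questioned_outcome(const Litmus& r) {
    return r.a == 0 && r.b == 1 && r.c == 0;
}
```

Looping over run_litmus_once() and counting is_questioned_outcome() hits
gives an empirical check on a given machine, nothing more.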

							Thanx, Paul


