[cpp-threads] [Javamemorymodel-discussion] there's a happens-before order here, right?

Alexander Terekhov alexander.terekhov at gmail.com
Mon Dec 15 17:23:23 GMT 2008


On Mon, Dec 15, 2008 at 3:47 PM, Anthony Williams
<anthony at justsoftwaresolutions.co.uk> wrote:
> At Mon 15 Dec 2008 14:38:30 UTC, Alexander Terekhov
> <alexander.terekhov at gmail.com> wrote:
>
>> On Mon, Dec 15, 2008 at 2:58 PM, Anthony Williams
>> <anthony at justsoftwaresolutions.co.uk> wrote:
>>>
>>> At Mon 15 Dec 2008 13:32:56 UTC, Alexander Terekhov
>>> <alexander.terekhov at gmail.com> wrote:
>>>
>>>> On Mon, Dec 15, 2008 at 12:47 PM, Anthony Williams
>>>> <anthony at justsoftwaresolutions.co.uk> wrote:
>>>>>
>>>>> At Sat 13 Dec 2008 14:42:43 UTC, Alexander Terekhov
>>>>> <alexander.terekhov at gmail.com> wrote:
>>>>>>
>>>>>> Now let's introduce a store to std::atomic<> x in thread 2.
>>>>>>
>>>>>>  int data; //= 0
>>>>>>  atomic<int> x; //= 0
>>>>>>
>>>>>>  thread 1:
>>>>>>  ------------
>>>>>>  if (x.load(relaxed) > 0)
>>>>>>    data = 1;
>>>>>>
>>>>>>  thread 2:
>>>>>>  ------------
>>>>>>  data = 2;
>>>>>>  x.store(-1, release);
>>>>>>
>>>>>> I just can't conceive how that could possibly introduce a race on
>>>>>> "data"...
>>>>>
>>>>> Agreed. The load may read 0 (initial value) or -1 (stored value), but
>>>>> in
>>>>> neither case is the branch followed, so there is no race.
>>>>
>>>> IOW you agree that it shall not be allowed to "speculate" the
>>>> conditional store (e.g. by (mis)predicting a positive value for the
>>>> load) such that the conditional store could happen before thread 1
>>>> has actually observed a positive value in x. But that means that the
>>>> conditional store "happens after" the load, which is to say the load
>>>> "happens before" the conditional store!!!
>>>
>>> Yes, the load happens-before the store. That's required by the C++0x
>>> memory
>>> model, since the load is sequenced-before the store in the same thread.
>>>
>>>>>
>>>>>> The next step is to change x.store(-1, release) to x.store(+1,
>>>>>> release) :
>>>>>>
>>>>>>  int data; //= 0
>>>>>>  atomic<int> x; //= 0
>>>>>>
>>>>>>  thread 1:
>>>>>>  ------------
>>>>>>  if (x.load(relaxed) > 0)
>>>>>>    data = 1;
>>>>>>
>>>>>>  thread 2:
>>>>>>  ------------
>>>>>>  data = 2;
>>>>>>  x.store(+1, release);
>>>>>>
>>>>>> This just ought to be data-race-free as well!!!
>>>>>
>>>>> I don't see why. The load is relaxed, so it is unordered. If it reads
>>>>> "+1",
>>>>> there is still no happens-before relationship between the two stores to
>>>>> data, so we have a race.
>>>>
>>>> See above.
>>>
>>> These cases are different. In the first case, the load could read "0" or
>>> "-1", neither of which would cause "data" to be written to. In the
>>> second,
>>> the load could read "0" or "+1", and a read of "+1" *would* cause "data"
>>> to
>>> be written to.
>>>
>>> The load is "relaxed", so even though the load from x happens-before the
>>> store to data in thread 1, and the store to data in thread 2
>>> happens-before
>>> the store to x, we *don't* have the necessary transitive relationships to
>>> order the two stores to data. In particular, thread 1 might see the
>>> effects
>>> of the store to x (i.e. x.load returns "+1") *before* it sees the effects
>>> of
>>> the store to data from thread 2.
>>
>> You seem to confuse my examples with something like
>>
>>   int data; //= 0
>>   atomic<int> x; //= 0
>>
>>   thread 1:
>>   ------------
>>
>>   if (x.load(relaxed) > 0)
>>     std::cout << data;
>>
>>   thread 2:
>>   ------------
>>   data = 2;
>>   x.store(+1, release);
>>
>> or
>>
>>   int data; //= 0
>>   atomic<int> x; //= 0
>>
>>   thread 1:
>>   ------------
>>
>>   if (x.load(relaxed) > 0)
>>     ++data;
>>
>>   thread 2:
>>   ------------
>>   data = 2;
>>   x.store(+1, release);
>>
>> which needs to "see" the effect of the data = 2 store. I agree that
>> such programs are NOT data-race-free. But that has nothing to do with
>> my examples...
>
> It doesn't matter what operation thread 1 is doing on data.

And that is actually the key source of inefficiency in the model.

> Thread 2 is doing a write, so if thread 1 does *any* operation on data
> without ordering, you have a data race.

But there's ordering here... think of it as having an implied (due to
the control dependency) "atomic_thread_fence(memory_order_acquire)" (see
29.6 Fences in the current draft) that affects stores but not loads.

   int data; //= 0
   atomic<int> x; //= 0

   thread 1:
   ------------
   data = 1;
   x.store(1, release);

   thread 2:
   ------------
   if (x.load(relaxed)) {
     atomic_thread_fence(memory_order_acquire); // redundant
     data = 2;
   }

regards,
alexander.


