[cpp-threads] [Javamemorymodel-discussion] there's a happens-before order here. right?

Alexander Terekhov alexander.terekhov at gmail.com
Sat Dec 13 14:42:43 GMT 2008


(Cc: "C++ threads standardisation" <cpp-threads at decadentplace.org.uk>)

On Sat, Dec 13, 2008 at 2:04 AM, Boehm, Hans <hans.boehm at hp.com> wrote:
[...]
>> http://www.hpl.hp.com/personal/Hans_Boehm/c++mm/threadsintro.html#examples
>>
>> would you label the following C++0x program
>>
>>     int data; //= 0
>>     atomic<int> x; //= 0
>>
>>     thread 1:
>>     ------------
>>     data = 1;
>>     x.store(1, release);
>>
>>     thread 2:
>>     ------------
>>     if (x.load(relaxed))
>>       data = 2;
>>
>> data-race-free or not? Why? TIA.
>>
> This is well in the "escapes from sequential consistency" category,
> and it doesn't currently have a Java analog.

Yes. I'm actually driving at

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2745.html

> As I said, I was really trying to steer most people well away from
> this kind of code.

The problem is that std::atomic<> in SC mode is extremely expensive on
POWER. Even in acquire-release mode it injects far too much completely
redundant (lw/i)sync(hronization).
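
For a concrete sense of the cost, here is roughly how the different
orderings are usually mapped on POWER (the asm in the comments is the
usual compiler mapping as I understand it, nothing mandated by the
draft; treat it as a sketch):

    #include <atomic>
    using namespace std;

    atomic<int> x; //= 0

    void stores() {
      x.store(1, memory_order_relaxed); // stw
      x.store(1, memory_order_release); // lwsync; stw
      x.store(1, memory_order_seq_cst); // hwsync; stw
    }

    int loads() {
      int a = x.load(memory_order_relaxed); // lwz
      int b = x.load(memory_order_acquire); // lwz; cmpw; bne-; isync
      int c = x.load(memory_order_seq_cst); // hwsync; lwz; cmpw; bne-; isync
      return a + b + c;
    }

The hwsync on every SC operation, and the lwsync/isync in
acquire-release mode, are the redundant synchronization I mean when
all the algorithm needs is load-to-store ordering.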

> I'll admit it sometimes make sense in really performance critical
> code that has to run on machines on which SC operations are
> expensive to implement.

If performance isn't critical then there isn't much need for
std::atomic<>... I'd simply use locks.
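
(I.e., something along these lines; just a sketch of the lock-based
alternative, not code from this thread:)

    #include <mutex>

    int data;     //= 0
    std::mutex m; // protects "data"

    void thread1() { std::lock_guard<std::mutex> g(m); data = 1; }
    void thread2() { std::lock_guard<std::mutex> g(m); data = 2; }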

>
> This has a data race.

This is quite surprising to me, given your first two examples. I
just can't conceive of a platform that would produce a race on "data"
in the example above and NOT also break your first (and second)
example by failing to respect the control dependency, which imposes
ordering on the load-store sequence in thread 1 (and in thread 2 of
your second example).

Let's take your first example and slightly change it by renaming y
to "data" and using std::atomic<> for x:

    int data; //= 0
    atomic<int> x; //= 0

    thread 1:
    ------------
    if (x.load(relaxed) > 0)
      data = 1;

    thread 2:
    ------------
    data = 2;

This is still data-race-free, right? I hope so: x is never written,
so the load always returns 0 and thread 1 never touches "data".

Now let's introduce a store to std::atomic<> x in thread 2.

    int data; //= 0
    atomic<int> x; //= 0

    thread 1:
    ------------
    if (x.load(relaxed) > 0)
      data = 1;

    thread 2:
    ------------
    data = 2;
    x.store(-1, release);

I just can't conceive how that could possibly introduce a race on
"data"... The load can only return 0 or -1, neither of which is > 0,
so thread 1 still never writes it.

The next step is to change x.store(-1, release) to x.store(+1,
release):

    int data; //= 0
    atomic<int> x; //= 0

    thread 1:
    ------------
    if (x.load(relaxed) > 0)
      data = 1;

    thread 2:
    ------------
    data = 2;
    x.store(+1, release);

This just ought to be data-race-free as well!!!
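
Spelled out as a complete program (my scaffolding around the same
example; the thread-creation boilerplate is not part of the original
snippet):

    #include <atomic>
    #include <thread>

    int data;              //= 0
    std::atomic<int> x(0); //= 0

    void thread1() {
      if (x.load(std::memory_order_relaxed) > 0)
        data = 1; // runs only if the load observed the store in thread2
    }

    void thread2() {
      data = 2;
      x.store(+1, std::memory_order_release);
    }

    int main() {
      std::thread t1(thread1), t2(thread2);
      t1.join();
      t2.join();
    }

With the usual POWER mapping, thread2 becomes lwsync; stw and thread1
becomes lwz; cmpw; bne-; stw, so the store to "data" is not performed
ahead of the load.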

> To see that you really have to go back to reasoning based on
> "happens before".

Well, then "happens before" needs some fixing, I believe.

>
> It's hard to conceive of a case in which this particular code
> could actually cause problems in practice.  And one could
> conceive of extensions to memory_order_consume in the standard
> to handle this.  But trying to generally guarantee ordering for
> the two memory operations in the second thread gets tricky
> fairly quickly.  You probably don't want the two memory
> operations in
>
> if (r1 = x.load(memory_order_relaxed)) {
>    data.store(2, memory_order_relaxed);
> } else {
>    data.store(2, memory_order_relaxed);
> }
>
> to be ordered, for example.

Why not? The above is simply a "high level" expression of
"Safe Fetch" from Book II:

http://www.power.org/resources/reading/PowerISA_V2.05.pdf
(see B.2.3 Safe Fetch)

-----
If a load must be performed before a subsequent store [...]

In this example it is assumed that the address of the storage
operand to be loaded is in GPR 3, the contents of the storage
operand are returned in GPR 4, and the address of the storage
operand to be stored is in GPR5.

  lwz  r4,0(r3) #load shared data
  cmpw r4,r4    #set CR0 to "equal"
  bne- $-8      #branch never taken
  stw  r7,0(r5) #store other shared data
-----

If I did not care about the ordering implied by the control
dependency, I'd simply write:

  r1 = x.load(memory_order_relaxed);
  data.store(2, memory_order_relaxed);

Note that

  r1 = x.load(memory_order_acquire);
  data.store(2, memory_order_relaxed);

is more expensive than "Safe Fetch"...
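
Roughly, on POWER (again the usual mapping, nothing normative, and
assuming the compiler keeps the branch in the first variant):

  #include <atomic>

  std::atomic<int> x;    //= 0
  std::atomic<int> data; //= 0

  void safe_fetch_style() {
    // control dependency, i.e. "Safe Fetch" in C++ clothing:
    if (x.load(std::memory_order_relaxed))      // lwz; cmpw; bne-
      data.store(2, std::memory_order_relaxed); // stw, kept after the load by the branch
    else
      data.store(2, std::memory_order_relaxed); // stw
  }

  void acquire_style() {
    int r1 = x.load(std::memory_order_acquire); // lwz; cmpw; bne-; isync
    data.store(2, std::memory_order_relaxed);   // stw
    (void)r1;
  }

The extra isync is the whole difference in cost.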

regards,
alexander.
