[cpp-threads] Yet another visibility question

Thu Nov 30 01:39:14 GMT 2006

> From: Peter Dimov
> 
> Boehm, Hans wrote:
> > Partially correcting myself:
> >
> > I think the analysis below applies if x is an atomic, even if it is 
> > accessed with unordered operations.  If x is an ordinary variable, 
> > there is a race between the assert and the assignment to x.  And I 
> > think that's what we want here.
> 
> x is an ordinary variable in the example.
> 
> It's not clear what we want to happen in this case. What I 
> want is for reference counting to be as efficient as possible 
> (within the confines of the model, of course.)
> 
> If the causality requirements of the model effectively imply 
> that in the real world the store to x can never race with the 
> assert, then I wouldn't want to be forced to put a redundant 
> ordered fetchadd(&y, 0) between the two just to satisfy the spec.
> 
> 
>  // x y initially zero
> 
>  // threads 1 to N:
> 
>  assert( x == 0 );
> 
>  if( fetchadd_release( &y, 1 ) == N - 1 ) {
>     x = 1;
>  }
> 

I think it would require a significant change to the current model to
allow this, but the more I think about it, the less I'm convinced it's a
bad idea.
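
For concreteness, here is a minimal sketch of the conservative version of Peter's example, the one with the extra ordered fetchadd(&y, 0) that he would rather not be forced to write. It assumes the std::atomic interface as it was later standardized, which postdates this discussion, so the mapping from fetchadd_release to fetch_add with memory_order_release is an assumption. The names run_example and n are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Runs the N-thread example once and returns the final value of x.
// run_example and n are hypothetical names used only for this sketch.
int run_example(int n) {
    int x = 0;               // ordinary (non-atomic) variable, as in the example
    std::atomic<int> y{0};   // shared counter, initially zero

    auto body = [&x, &y, n] {
        assert(x == 0);      // the read every thread performs first
        // Assumed counterpart of fetchadd_release(&y, 1).
        if (y.fetch_add(1, std::memory_order_release) == n - 1) {
            // The "redundant" ordered fetchadd(&y, 0): an acquire
            // read-modify-write placed before the store to x.
            y.fetch_add(0, std::memory_order_acquire);
            x = 1;           // only the last thread to increment gets here
        }
    };

    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) threads.emplace_back(body);
    for (auto& t : threads) t.join();
    return x;
}
```

The question in this thread is whether the acquire operation inside the conditional can be dropped without the store to x counting as a race with the other threads' asserts.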

Certainly, if the code sequence were

fetchadd_release( &y, 1 );
x = 1;

the compiler would be allowed to reorder those.  Thus the argument that
this works would have to be based on some sort of dependency-based
ordering, which we're currently relying on only for atomics (and then
only in a limited case, but that's the one that's relevant here).  And
that argument

- I think doesn't work if you reverse the load and store of x, putting the
load in the conditional.  (You'd need to change the example to two
threads running different code for this to make sense.) If the load is
in the conditional, we currently allow moving the load up, or reusing a
value loaded before a release.  Thus the definition of a race between a
load and a store would have to be sensitive to which operation is the
load and which is the store.

- Would require some (very natural) hardware properties we are not
currently assuming.  It would require checking with some hardware
vendors to make sure that this is OK, but I would guess it is.
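
The reversed variant from the first point above, with two threads running different code and the load of x inside the conditional, might look like the sketch below. It again assumes the later-standardized std::atomic interface, and run_reversed and seen are hypothetical names. The reader uses memory_order_acq_rel rather than release alone; that is a choice made for this sketch so that it runs without a race, precisely because with release alone the load may be moved up or satisfied from an earlier value.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Returns the value the reader loaded from x, or -1 if the reader
// incremented y first and therefore never loaded x.
// run_reversed and seen are hypothetical names for this sketch.
int run_reversed() {
    int x = 0;               // ordinary (non-atomic) variable
    std::atomic<int> y{0};
    int seen = -1;

    std::thread writer([&] {
        x = 1;                                       // store before the release
        y.fetch_add(1, std::memory_order_release);
    });
    std::thread reader([&] {
        // acq_rel instead of plain release: an assumption of this sketch,
        // added so the load below is ordered after the writer's store.
        if (y.fetch_add(1, std::memory_order_acq_rel) == 1) {
            seen = x;                                // load inside the conditional
        }
    });

    writer.join();
    reader.join();
    return seen;
}
```

Whenever the reader sees the writer's increment, it loads 1 from x; otherwise it never touches x.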

If we wanted to allow this, I think we would need to

- Have a fetch_add_release happen-before (effectively synchronize-with)
the next fetch_add_release on the same variable.  The cleanest way to do
that seems to be to go back to the model in which any atomic store
synchronizes-with an atomic load that reads that (or a later) value.
Acquire and release would be special only in that they were
inter-thread-ordered with respect to other operations in the thread.  (We
previously decided on the other approach here, but I think the arguments
were weak.  See the thread labelled "visibility question" at
http://www.decadentplace.org.uk/pipermail/cpp-threads/2006-August/thread.html)

- Have the fetch_add_release be inter-thread-ordered-before the
dependent store to x.  Currently this is true for atomic x, so I think
this wouldn't be a disaster.  It would prohibit some kinds of compiler
analysis on the possible value of atomics, but we already do some of
that.  It would introduce an asymmetry into the data race definition, as
required.
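
Peter's reference-counting motivation has the same shape as the example: every owner reads the object and then does a release decrement, and whoever brings the count to zero performs the dependent store (here, destruction). A minimal sketch of the conservative form follows, assuming the later-standardized std::atomic interface. Counted, release_ref and run_refcount are hypothetical names. The acquire fence stands in for the extra ordering that the two changes above would aim to make unnecessary.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical reference-counted object for this sketch.
struct Counted {
    std::atomic<int> refs;
    int payload;
    bool* destroyed;   // set by the destructor so the caller can observe it
    Counted(int n, bool* flag) : refs(n), payload(42), destroyed(flag) {}
    ~Counted() { *destroyed = true; }
};

// Drops one reference; the thread that drops the last one destroys the object.
void release_ref(Counted* p) {
    if (p->refs.fetch_sub(1, std::memory_order_release) == 1) {
        // Extra ordering before the dependent destruction, so that it
        // cannot race with reads made by the other owners.
        std::atomic_thread_fence(std::memory_order_acquire);
        delete p;
    }
}

// Starts n owners (n >= 1); each reads the payload and then drops its
// reference. Returns whether the object was destroyed.
bool run_refcount(int n) {
    bool destroyed = false;
    Counted* p = new Counted(n, &destroyed);
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back([p] {
            int v = p->payload;   // ordinary read before the release
            (void)v;
            release_ref(p);
        });
    }
    for (auto& t : threads) t.join();
    return destroyed;
}
```

If a release decrement were ordered before the next one on the same variable, and before the dependent destruction, the fence would no longer be needed for this pattern.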

On the plus side, I currently think it's not that hard to update the
wording to reflect this, and it seems to make it easier to express the
per-variable TSO constraint that we decided we need.

Opinions?

Hans