[cpp-threads] Yet another visibility question

Tue Dec 19 01:13:23 GMT 2006

> From:  Peter Dimov
> > 1.10p5:
> > {Generalize the second condition for inter-thread-ordered-before to
> > ordinary writes, but not reads:} "A is an unordered atomic read and B
> > is an unordered atomic write, and either the value written by B varies
> > depending on the value read by A, or the execution of B is conditioned
> > on the value read by A."
> > ==>
> > "A is an unordered atomic read and B writes a value which depends on
> > the value read by A, or the execution of B is conditioned on the value
> > read by A."
> 
> You are going back to the "depends on" formulation, which has 
> a problem with things like r1-r1 depending on r1 before 
> optimizations, but not after.
> 
The usual answer here is that, as Peter suggested, we use a dynamic
notion of dependency in which r1-r1 doesn't depend on r1.  We just have
to make sure we state that correctly.
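
To make the r1-r1 case concrete, here is a minimal sketch.  It assumes
today's std::atomic with memory_order_relaxed as a rough stand-in for
load_raw; the names src, table, syntactic_dependency and after_folding
are made up for illustration and appear in no proposal.

```cpp
#include <atomic>

// Hypothetical stand-ins: src plays the role of the atomic being read,
// and memory_order_relaxed is assumed to approximate load_raw.
std::atomic<int> src{0};
int table[1] = {42};

// As written, the index is computed from r1, so the read of table
// syntactically "depends on" the atomic load.
int syntactic_dependency() {
    int r1 = src.load(std::memory_order_relaxed);
    return table[r1 - r1];
}

// A compiler is free to fold r1 - r1 to 0, after which the read of
// table no longer mentions r1 at all.
int after_folding() {
    int r1 = src.load(std::memory_order_relaxed);
    (void)r1;  // the loaded value no longer feeds the address
    return table[0];
}
```

Both functions return the same value for every value of src, which is
why a dynamic notion of dependency should treat neither as depending
on r1.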

However, partially inspired by Peter's message, I decided to try to
write down what we know about relying on dependencies for ordering,
before we forget again.  The result is at

http://www.hpl.hp.com/personal/Hans_Boehm/c++mm/dependencies.html

(This could no doubt be improved, since I wasn't sure where this was
going when I started.)

Unfortunately, this points out that the generalization to ordinary
writes I suggested above won't work.

A very simple (perhaps too simple) counter-example is

r1 = x.load_raw();
if (r1) {
  y = 1;  // y not atomic
} else {
  y = 1;
}

I think we really have to declare the stores here to be dependent on the
load, since each one is.  And in real code, the two may be hidden by
abstraction boundaries, making it impossible for the programmer to tell
that the dependency could be eliminated.

On the other hand, this clearly has to be equivalent to

r1 = x.load_raw();
y = 1;

which has no dependence, and hence has to be unordered, both because
weakly consistent hardware will reorder it, and because the compiler is
normally allowed to.  Note also that we can easily perturb the example
to allow the offending optimization to be applied in a different
compilation unit from the load_raw.  Thus this can't be worked around by
restricting optimization on atomics.
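
For anyone who wants to play with this, the two forms can be sketched
as compilable code.  This assumes std::atomic with memory_order_relaxed
as an approximation of load_raw; the function names are hypothetical.

```cpp
#include <atomic>

// Hypothetical stand-ins: memory_order_relaxed is assumed to
// approximate load_raw from the example.
std::atomic<int> x{0};
int y = 0;  // ordinary, non-atomic variable

// Branching form: each store to y is conditioned on the value read.
int branching_form() {
    int r1 = x.load(std::memory_order_relaxed);
    if (r1) {
        y = 1;
    } else {
        y = 1;
    }
    return r1;
}

// What an optimizer may legitimately turn it into: the store to y
// no longer depends on r1 in any way.
int merged_form() {
    int r1 = x.load(std::memory_order_relaxed);
    y = 1;
    return r1;
}
```

In a single thread the two are indistinguishable: both leave y == 1
and return whatever was in x.  Only the first contains anything that
looks like a dependency from the load to the store.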

This example is already slightly annoying if we limit dependence-based
ordering to unordered atomics as before, since it means that the
compiler has to treat all calls to store_raw as more-or-less-opaque
calls to DISTINCT functions.  But I still think that's far less of a
problem than the alternative, since it only affects code dealing with
atomics.  And I think it's really only a minor complication for the
implementation.  (If the implementation macro-expands store_raw to an
inline asm, I expect the right thing will happen automagically.)

Hans