[cpp-threads] SC on PPC

Fri May 4 22:49:08 BST 2007

> -----Original Message-----
> From:  Raul Silvera
> Hans Boehm wrote on 05/02/2007 11:06:12 AM:
> > On Wed, 2 May 2007, Alexander Terekhov wrote:
> >
> > >
> > > I'm talking about the cost of acquire on PPC.
> > >
> > > P1: x.store_relaxed(1)
> > > P2: if (x.load_relaxed()==1) { y.store_release(1) }
> > > P3: if (y.load_acquire()==1) { Assert(x.load_relaxed()==1) }
> > >
> > > won't abort with
> > >
> > > load_acquire:   load;"branch never taken";isync // see B.2.3 Safe Fetch (Book II).
> > > store_release:  lwsync;store
> > >
> > > I don't want to have more constrained
> > >
> > > load_acquire:   load;lwsync
> > >
> > Certainly that's an interesting question.  On the other hand, I'm
> > saying that in the proposed C++ memory model, this is allowed to abort
> > unless you change the initial store_relaxed to a store_release.
> >
> > I suspect this has little bearing on the rest of your discussion, though.
> > And I can't think of realistic situations in which you wouldn't need
> > the store_release in P1 anyway.
> 
> Thanks for the clarification, Hans. I have all along been
> thinking of symmetric acquire and releases, as defined in
> N2153. So your proposal is to have causality be triggered by
> release but not acquire? Or do you need both acquire and
> releases to trigger causality? Do you foresee this changing
> in the next version of the model?
> 
> I realize that this is not so interesting if all you have is 
> acquire/release operations, but once you include relaxed 
> operations (and fences) it is important to clearly define how 
> they interact with each other. In particular, in Alexander's
> example above, why is the store_relaxed() from P1 insufficient
> even though P1 doesn't issue any other memory operations? What
> if there was a release_fence() before that store_relaxed()?
> 
I had unfortunately misread Alexander's example slightly, and read the
load_relaxed in P2 as a load_acquire.  I believe you really need to
change the store_relaxed/load_relaxed in P1/P2 to
store_release/load_acquire in order to guarantee that the assertion will
not fail in our proposal.  The P3 load of x can remain unchanged.

In the release/acquire version,

x_initialization hb x.store_release(P1) hb x.load_acquire(P2) hb
y.store_release(P2) hb y.load_acquire(P3) hb x.load_relaxed(P3).

It follows that P3's load of x cannot see the zero initialization of x,
since there is a store that "happens between" them.
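For concreteness, here is a minimal sketch of the release/acquire version written with C++11 std::atomic, assuming that store_release/load_acquire correspond to store/load with std::memory_order_release/std::memory_order_acquire (the operation names in our proposal differ). The globals x and y, the thread functions p1, p2, p3 and the run_once helper are hypothetical illustration names, not part of any proposal.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical shared variables for the P1/P2/P3 example.
std::atomic<int> x{0};
std::atomic<int> y{0};

void p1() {
    // P1 has no earlier accesses to publish, but a release store is still
    // needed so that P2's acquire load can synchronize with it.
    x.store(1, std::memory_order_release);
}

void p2() {
    // If this acquire load reads 1, P1's store happens before the y store.
    if (x.load(std::memory_order_acquire) == 1) {
        y.store(1, std::memory_order_release);
    }
}

void p3() {
    if (y.load(std::memory_order_acquire) == 1) {
        // x.store(P1) hb x.load(P2) hb y.store(P2) hb y.load(P3) hb this
        // load, so the initial zero of x can no longer be observed here.
        assert(x.load(std::memory_order_relaxed) == 1);
    }
}

// Hypothetical helper: resets x and y, then runs P1, P2, P3 once.
// The resets happen before the threads start (thread creation synchronizes).
void run_once() {
    x.store(0, std::memory_order_relaxed);
    y.store(0, std::memory_order_relaxed);
    std::thread t1(p1), t2(p2), t3(p3);
    t1.join();
    t2.join();
    t3.join();
}
```

Calling run_once() repeatedly from main should never trip the assertion under the model described above; of course, a clean run on one machine only illustrates the guarantee, it does not prove it.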

If either of the P1/P2 operations on x is "relaxed", there is no
ordering between (a) x.store_relaxed(P1) and x.load_relaxed(P2).  Hence
(a) can become visible to either of the other threads at any time.  You
don't get causality in the sense in which it's used here.  But you do
get a transitive happens-before relation; it's just a very sparse one.
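The difference between the two versions can be checked mechanically on a toy graph. The sketch below is only an illustration, not the proposal's formal definition: the event names and the functions base_edges, transitive_closure and store_happens_between are hypothetical, and it assumes hb consists exactly of the edges listed above (initialization before everything, program order within P2 and within P3, and the release/acquire pairs). Passing false for p1_p2_synchronize drops the P1-to-P2 edge, which models making either operation on x relaxed.

```cpp
#include <array>
#include <cstddef>

// Hypothetical event numbering for the example in this thread.
enum Event : std::size_t {
    kInitX,     // zero initialization of x
    kStoreXP1,  // P1: store to x
    kLoadXP2,   // P2: load of x
    kStoreYP2,  // P2: y.store_release
    kLoadYP3,   // P3: y.load_acquire
    kLoadXP3,   // P3: load of x
    kNumEvents
};

using Relation = std::array<std::array<bool, kNumEvents>, kNumEvents>;

// Direct hb edges; p1_p2_synchronize is false when either P1's store
// or P2's load of x is relaxed.
Relation base_edges(bool p1_p2_synchronize) {
    Relation hb{};  // all false
    for (std::size_t e = kStoreXP1; e < kNumEvents; ++e) {
        hb[kInitX][e] = true;  // initialization happens before everything
    }
    if (p1_p2_synchronize) {
        hb[kStoreXP1][kLoadXP2] = true;  // release store / acquire load on x
    }
    hb[kLoadXP2][kStoreYP2] = true;  // program order in P2
    hb[kStoreYP2][kLoadYP3] = true;  // release store / acquire load on y
    hb[kLoadYP3][kLoadXP3] = true;   // program order in P3
    return hb;
}

// Warshall's algorithm: transitive closure of the hb edges.
Relation transitive_closure(Relation hb) {
    for (std::size_t k = 0; k < kNumEvents; ++k)
        for (std::size_t i = 0; i < kNumEvents; ++i)
            for (std::size_t j = 0; j < kNumEvents; ++j)
                if (hb[i][k] && hb[k][j]) hb[i][j] = true;
    return hb;
}

// True if P1's store to x "happens between" the initialization and P3's
// load of x, so that load cannot return the initial zero.
bool store_happens_between(bool p1_p2_synchronize) {
    const Relation hb = transitive_closure(base_edges(p1_p2_synchronize));
    return hb[kInitX][kStoreXP1] && hb[kStoreXP1][kLoadXP3];
}
```

Here store_happens_between(true) is true and store_happens_between(false) is false: without the P1-to-P2 edge, P1's store has no hb successors at all, so nothing rules out P3 reading the initial zero, even though the rest of the chain is still transitive.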

This is consistent with the Java approach, if you view "relaxed" C++
atomics as equivalent to ordinary (non-atomic/volatile) Java variables.

Hans