[cpp-threads] SC on PPC

Boehm, Hans hans.boehm at hp.com
Wed May 9 22:50:07 BST 2007


 

> -----Original Message-----
> From: cpp-threads-bounces at decadentplace.org.uk 
> [mailto:cpp-threads-bounces at decadentplace.org.uk] On Behalf 
> Of Raul Silvera
> Sent: Wednesday, May 09, 2007 1:15 PM
> To: C++ threads standardisation
> Subject: RE: [cpp-threads] SC on PPC
> 
> 
> Hans Boehm wrote on 05/04/2007 05:49:08 PM:
> 
> > > > On Wed, 2 May 2007, Alexander Terekhov wrote:
> > > > > P1: x.store_relaxed(1)
> > > > > P2: if (x.load_relaxed()==1) { y.store_release(1) }
> > > > > P3: if (y.load_acquire()==1) { Assert(x.load_relaxed()==1) }
> 
> > I had unfortunately misread Alexander's example slightly, 
> and read the 
> > load_relaxed in P2 as a load_acquire.  I believe you really need to 
> > change the store_relaxed/load_relaxed in P1/P2 to 
> > store_release/load_acquire in order to guarantee that the assertion 
> > will not fail in our proposal.  The P3 load of x can remain 
> unchanged.
> >
> > In the release/acquire version
> >
> > x_initialization hb x.store_relaxed(P1) hb x.load_relaxed(P2) hb
> > y.store_release(P2) hb y.load_acquire(P3) hb x.load_relaxed(P3).
> 
> I assume you meant "x_initialization hb x.store_release(P1) hb
> x.load_relaxed(P2) hb ..."
Oops.  I'll get it right eventually.  I meant

"x_initialization hb x.store_release(P1) hb x.load_acquire(P2) hb ..."
> 
> > It follows that P3's load of x cannot see the zero 
> initialization of x, 
> > since there is a store that "happens between" them.
> >
> > If either of the P1/P2 operations on x are "relaxed", there is no 
> > ordering between (a)x.store_relaxed(P1) and 
> x.load_relaxed(P2).  Hence
> > (a) can become visible to either of the other threads at any time.  
> > You don't get causality in the sense in which it's used 
> here.  But you 
> > do get a transitive happens-before relation; it's just a 
> very sparse one.
> >
> > This is consistent with the Java approach, if you view 
> "relaxed" C++ 
> > atomics as equivalent to ordinary (non-atomic/volatile) 
> Java variables.
> 
> I find this troublesome. As a programmer I don't see the 
> rationale of why you would need a release store on P1. A 
> release operation is needed to order a store with respect to 
> preceding operations, but there are no preceding operations 
> from P1 on this example.
In the proposed model, a release/acquire pair is the only thing that enforces any visibility ordering between threads.  (And SC operations implicitly give you release/acquire semantics.)  "Relaxed" gives you nothing but atomicity, allowing it to be implemented as an ordinary load/store almost everywhere.  This is what I suspect you want for a DSM implementation, for example.  It may not be intuitive, but it does seem fairly simple.
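
To make that concrete, here is Alexander's original variant, with the P1/P2 operations on x left relaxed, in the same assumed C++11-style spelling.  Nothing in the proposed model forbids the assert from firing, even when both branches are taken:

#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> x(0), y(0);

// P1: relaxed gives atomicity only; it may compile to an ordinary store.
void p1() { x.store(1, std::memory_order_relaxed); }

// P2: the relaxed load of x creates no happens-before edge back to P1,
// even though the subsequent store to y is a release.
void p2() {
    if (x.load(std::memory_order_relaxed) == 1)
        y.store(1, std::memory_order_release);
}

// P3: the acquire load of y orders this thread after P2's release store,
// but not after P1's store, so the model allows this load to see 0.
void p3() {
    if (y.load(std::memory_order_acquire) == 1)
        assert(x.load(std::memory_order_relaxed) == 1);
}

int main() { std::thread a(p1), b(p2), c(p3); a.join(); b.join(); c.join(); }
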

One could certainly think of alternate semantics.  (I'm not completely sure what the intended semantics are for java.util.concurrent.atomic's lazySet.  Doug may have something to add here.)  I do have my doubts that anything we come up with for "relaxed" will be particularly intuitive.  I can't think of a real case in which this example matters, so I would be hesitant to complicate the description to accommodate it.  I suspect that one way or another, you will really need to understand the precise description in order to program portably at this level.

> 
> It seems natural to me that if an atomic relaxed load 
> observes the value stored by an atomic relaxed store, then 
> the relaxed store hb the relaxed load.
This would have negative consequences for synchronization elimination.  It would mean that if I have

T1: x.store_relaxed(1); z.fetch_add_acq_rel(1); y.store_relaxed(1)
T2: if (y.load_relaxed()) w.store_release(1);

with z thread-local (possibly after thread coalescing by the compiler), I still can't turn the fetch_add into an ordinary increment, since it is what orders the other operations.  This affects the performance of code that doesn't use the low-level operations, since the fetch_add (or, equivalently, a lock acquisition) may be in a separate compilation unit.
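
In C++11-style spelling (again my assumed mapping of the fragment above, with a hypothetical z the compiler has proved thread-local), the problem looks like this:

#include <atomic>

std::atomic<int> x(0), y(0), w(0);
std::atomic<int> z(0);  // assume the compiler has proved z is used by one thread only

void t1() {
    x.store(1, std::memory_order_relaxed);
    // Under the suggested "relaxed store hb relaxed load" rule, T2's load of y
    // reading 1 would oblige us to make the store to x visible first.  On
    // weakly ordered hardware that ordering comes only from the fences implied
    // by this acq_rel RMW, so it could not be demoted to a plain increment
    // even though z is thread-local.
    z.fetch_add(1, std::memory_order_acq_rel);
    y.store(1, std::memory_order_relaxed);
}

void t2() {
    if (y.load(std::memory_order_relaxed))
        w.store(1, std::memory_order_release);
}

Under the model as proposed, by contrast, the two relaxed operations on y carry no ordering obligation, so the fetch_add can become an ordinary increment once z is known to be thread-local.
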

I think that for Java, we can't go there.  Lock elimination is important.  At least in the short term, the performance loss for ordinary lock-based Java code would outweigh anything we could get back from better atomics.

For C++, we have much less redundant synchronization to start with, but some of us feel this still matters a bit because of thread coalescing.  This has been an issue some of us have gone back and forth on a few times.

Hans

> 
> I don't think relaxed C++ atomics are equivalent to ordinary 
> Java variables; instead, I believe they are a middle ground 
> between ordinary variables and SC atomics. The fundamental 
> issue is that, unlike ordinary Java variables, relaxed 
> operations can provide reliable communication between 
> threads, and that should be part of the memory model.
> 
> --
> Raúl E. Silvera         IBM Toronto Lab   Team Lead, Toronto Portable
> Optimizer (TPO)
> Tel: 905-413-4188 T/L: 969-4188           Fax: 905-413-4854
> D2/KC9/8200/MKM


