[cpp-threads] SC on PPC

Paul E. McKenney paulmck at linux.vnet.ibm.com
Fri May 11 04:56:36 BST 2007


On Wed, May 09, 2007 at 09:50:07PM -0000, Boehm, Hans wrote:
>  
> 
> > -----Original Message-----
> > From: cpp-threads-bounces at decadentplace.org.uk 
> > [mailto:cpp-threads-bounces at decadentplace.org.uk] On Behalf 
> > Of Raul Silvera
> > Sent: Wednesday, May 09, 2007 1:15 PM
> > To: C++ threads standardisation
> > Subject: RE: [cpp-threads] SC on PPC
> > 
> > 
> > Hans Boehm wrote on 05/04/2007 05:49:08 PM:
> > 
> > > > > On Wed, 2 May 2007, Alexander Terekhov wrote:
> > > > > > P1: x.store_relaxed(1)
> > > > > > P2: if (x.load_relaxed()==1) { y.store_release(1) }
> > > > > > P3: if (y.load_acquire()==1) { Assert(x.load_relaxed()==1) }
> > 
> > > I had unfortunately misread Alexander's example slightly, and read
> > > the load_relaxed in P2 as a load_acquire.  I believe you really need
> > > to change the store_relaxed/load_relaxed in P1/P2 to
> > > store_release/load_acquire in order to guarantee that the assertion
> > > will not fail in our proposal.  The P3 load of x can remain unchanged.
> > >
> > > In the release/acquire version
> > >
> > > x_initialization hb x.store_relaxed(P1) hb x.load_relaxed(P2) hb
> > > y.store_release(P2) hb y.load_acquire(P3) hb x.load_relaxed(P3).
> > 
> > I assume you meant "x_initialization hb x.store_release(P1) hb
> > x.load_relaxed(P2) hb ..."
> Oops.  I'll get it right eventually.  I meant
> 
> "x_initialization hb x.store_release(P1) hb x.load_acquire(P2) hb ..."
> > 
> > > It follows that P3's load of x cannot see the zero initialization of
> > > x, since there is a store that "happens between" them.
> > >
> > > If either of the P1/P2 operations on x is "relaxed", there is no
> > > ordering between (a) x.store_relaxed(P1) and x.load_relaxed(P2).
> > > Hence (a) can become visible to either of the other threads at any
> > > time.  You don't get causality in the sense in which it is used here.
> > > But you do get a transitive happens-before relation; it's just a very
> > > sparse one.
> > >
> > > This is consistent with the Java approach, if you view "relaxed" C++
> > > atomics as equivalent to ordinary (non-atomic, non-volatile) Java
> > > variables.
> > 
> > I find this troublesome.  As a programmer, I don't see the rationale
> > for why you would need a release store on P1.  A release operation is
> > needed to order a store with respect to preceding operations, but
> > there are no preceding operations from P1 in this example.
>
> In the proposed model, a release/acquire pair is the only thing that
> enforces any visibility ordering between threads.  (And SC operations
> implicitly give you release/acquire semantics.)  "Relaxed" gives you
> nothing but atomicity, allowing it to be implemented as an ordinary
> load/store almost everywhere.  This is what I suspect you want for a DSM
> (distributed shared memory) implementation, for example.  It may not be
> intuitive, but it does seem
> fairly simple.

There does not seem to be complete agreement among DSM experts on this
point, however.
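
For concreteness, here is the corrected (release/acquire) version of
Alexander's example as a self-contained sketch.  The spellings below
assume a std::atomic-style interface with explicit memory_order
arguments rather than the proposal's store_release()/load_acquire()
names, but the mapping is direct:

    #include <atomic>
    #include <cassert>
    #include <thread>

    std::atomic<int> x(0), y(0);

    // P1: release store of x (store_release in the proposal's spelling).
    void p1() { x.store(1, std::memory_order_release); }

    // P2: acquire load of x; if it observes P1's store, publish y with a
    //     release store.
    void p2() {
        if (x.load(std::memory_order_acquire) == 1)
            y.store(1, std::memory_order_release);
    }

    // P3: acquire load of y; if it observes P2's store, the chain
    //     x.store(P1) hb x.load(P2) hb y.store(P2) hb y.load(P3) hb x.load(P3)
    //     means the final relaxed load of x cannot still see the initial
    //     zero, so the assertion cannot fail.
    void p3() {
        if (y.load(std::memory_order_acquire) == 1)
            assert(x.load(std::memory_order_relaxed) == 1);
    }

    int main() {
        std::thread t1(p1), t2(p2), t3(p3);
        t1.join(); t2.join(); t3.join();
        return 0;
    }

Weakening either end of the P1/P2 link back to relaxed removes the first
hb edge in that chain, which is the case under discussion below.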

> One could certainly think of alternate semantics.  (I'm not completely
> sure what the intended semantics are for java.util.concurrent.atomic's
> lazySet.  Doug may have something to add here.)  I do have my doubts that
> anything we come up with for "relaxed" will be particularly intuitive.
> I can't think of a real case in which this example matters, so I would
> be hesitant to complicate the description to accommodate it.  I suspect
> that one way or another, you will really need to understand the precise
> description in order to program portably at this level.

I agree that some modification to the semantics will be required in
order to accommodate bare fences.  However, there are a lot of them
out there, so it seems worth doing.
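
By "bare" fences I mean standalone fence operations that are not tied to
any particular atomic object (the Linux kernel's smp_mb()/smp_wmb()/
smp_rmb() family, for example).  A minimal publish/consume sketch, using
an atomic_thread_fence-style spelling purely for illustration:

    #include <atomic>

    std::atomic<int> data(0), flag(0);

    void publish() {
        data.store(42, std::memory_order_relaxed);
        // Bare release fence: orders the preceding store of data before
        // the following relaxed store of flag.
        std::atomic_thread_fence(std::memory_order_release);
        flag.store(1, std::memory_order_relaxed);
    }

    void consume() {
        if (flag.load(std::memory_order_relaxed) == 1) {
            // Bare acquire fence: pairs with the release fence above once
            // the relaxed load of flag has observed the relaxed store, so
            // the load below is guaranteed to see 42.
            std::atomic_thread_fence(std::memory_order_acquire);
            int v = data.load(std::memory_order_relaxed);
            (void)v;
        }
    }

The point is that the ordering comes from the pairing of the two fences
together with the relaxed accesses to flag, not from acquire/release
attributes attached to any particular load or store.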

> > It seems natural to me that if an atomic relaxed load 
> > observes the value stored by an atomic relaxed store, then 
> > the relaxed store hb the relaxed load.

There are exceptions to this involving shared caches and shared
store buffers, right?  Or am I confused?

> This would have negative consequences for synchronization elimination.
> It would mean that if I have
> 
> T1: x.store_relaxed(1); z.fetch_add_acq_rel(1); y.store_relaxed(1)
> T2: if (y.load_relaxed()) w.store_release(1);
> 
> with z thread local (possibly after thread coalescing by the compiler),
> I still can't turn the fetch_add into an ordinary increment, since it
> orders the other operations.  This affects the performance of code that
> doesn't use the low-level operations, since the fetch_add (or,
> equivalently, a lock acquisition) may be in a separate compilation unit.

Hmmm...  Isn't this the general situation for optimizations?  There are
certainly analogs to this example for pointer-alias analysis and the
like.
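
To make your example concrete, here is roughly how it might be spelled
out, again in std::atomic-style syntax; "z thread local" is taken to
mean that the implementation can prove no other thread accesses z:

    #include <atomic>

    std::atomic<int> x(0), y(0), w(0);
    std::atomic<int> z(0);   // assume provably thread-local to T1,
                             // e.g. after thread coalescing

    void t1() {
        x.store(1, std::memory_order_relaxed);
        // Under the suggested rule "relaxed store observed by relaxed
        // load => hb", the ordering this acq_rel RMW provides between
        // the surrounding relaxed stores becomes observable through y,
        // so it cannot be demoted to an ordinary increment even though
        // z is thread-local.  Under the proposed model it can be.
        z.fetch_add(1, std::memory_order_acq_rel);
        y.store(1, std::memory_order_relaxed);
    }

    void t2() {
        if (y.load(std::memory_order_relaxed))
            w.store(1, std::memory_order_release);
    }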

> I think that for Java, we can't go there.  Lock elimination
> is important.  At least in the short term, the performance loss for
> ordinary lock-based Java code would outweigh anything we could get back
> from better atomics.

Perhaps what is needed is a way to declare a given compilation unit
or set of compilation units to have no dependencies on outside ordering.
That seems like it should handle a very large fraction of the cases
occurring in practice -- certainly many of the cases surrounding library
code.
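
Purely as an illustration of the kind of declaration I have in mind
(the spelling below is invented, and nothing like it exists in any
current proposal), it might be a translation-unit-level annotation:

    // Hypothetical, invented spelling: a promise that nothing in this
    // translation unit relies on ordering established by code outside
    // it, so provably-local synchronization (such as the fetch_add
    // above when z is thread-local) may be demoted or elided.
    #pragma no_external_ordering_dependencies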

> For C++, we have much less redundant synchronization to start with,
> but some of us feel this still matters a bit because of thread coalescing.
> This has been an issue some of us have gone back and forth on a few times.

Seems that a similar declaration would help quite a bit for C/C++ as well.

						Thanx, Paul

> Hans
> 
> > 
> > I don't think relaxed C++ atomics are equivalent to ordinary Java
> > variables; instead, I believe they are a middle ground between
> > ordinary variables and SC atomics.  The fundamental issue is that,
> > unlike ordinary Java variables, relaxed operations can provide
> > reliable communication between threads, and that should be part of
> > the memory model.


