[cpp-threads] SC on PPC

Raul Silvera rauls at ca.ibm.com
Sat May 12 02:26:15 BST 2007



Hans wrote on 05/09/2007 05:50:07 PM:
> > > > > On Wed, 2 May 2007, Alexander Terekhov wrote:
> > > > > > P1: x.store_relaxed(1)
> > > > > > P2: if (x.load_relaxed()==1) { y.store_release(1) }
> > > > > > P3: if (y.load_acquire()==1) { Assert(x.load_relaxed()==1) }
> >
> > > I had unfortunately misread Alexander's example slightly, and read
> > > the load_relaxed in P2 as a load_acquire.  I believe you really need
> > > to change the store_relaxed/load_relaxed in P1/P2 to
> > > store_release/load_acquire in order to guarantee that the assertion
> > > will not fail in our proposal.  The P3 load of x can remain
> > > unchanged.
> > >
> > >
> > > In the release/acquire version
> > >
> > > x_initialization hb x.store_relaxed(P1) hb x.load_relaxed(P2) hb
> > > y.store_release(P2) hb y.load_acquire(P3) hb x.load_relaxed(P3).
> >
> > I assume you meant "x_initialization hb x.store_release(P1) hb
> > x.load_relaxed(P2) hb ..."
> Oops.  I'll get it right eventually.  I meant
>
> "x_initialization hb x.store_release(P1) hb x.load_acquire(P2) hb ..."

Right. That's what I was actually trying to write (you'd think a two-line
example would be hard to get wrong :-).
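
For concreteness, the corrected release/acquire version of the example
can be sketched as below. This is only a sketch under assumptions: it
uses the std::atomic spelling later standardized in C++11 rather than the
store_release/load_acquire member names used in this thread, and the
names run_once, stress and ok are illustrative, not part of any proposal.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// One run of the three-thread example, release/acquire version.
// Returns false only if P3 observed y == 1 and then read x != 1.
bool run_once() {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    bool ok = true;  // written only by P3, read after all joins

    // P1: x.store_release(1)
    std::thread p1([&] { x.store(1, std::memory_order_release); });

    // P2: if (x.load_acquire() == 1) { y.store_release(1) }
    std::thread p2([&] {
        if (x.load(std::memory_order_acquire) == 1)
            y.store(1, std::memory_order_release);
    });

    // P3: if (y.load_acquire() == 1) { Assert(x.load_relaxed() == 1) }
    std::thread p3([&] {
        if (y.load(std::memory_order_acquire) == 1)
            ok = (x.load(std::memory_order_relaxed) == 1);
    });

    p1.join();
    p2.join();
    p3.join();
    return ok;
}

// Repeat the experiment; the hb chain above says this never fires.
void stress(int iterations) {
    for (int i = 0; i < iterations; ++i)
        assert(run_once());
}
```

Calling stress(1000) from main exercises the chain repeatedly. A passing
run is consistent with the guarantee, though it does not prove it.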

While I realize that hb requires more than just ordinary memory accesses,
and is therefore not suitable for relaxed pairs, some form of inter-thread
ordering triggered by relaxed pairs is necessary, especially once you
introduce explicit fences. Acquire and release operations should be
replaceable by relaxed operations + fences, and in that scenario the
inter-thread communication will occur through pairs of relaxed operations.
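
To illustrate the relaxed operations + fences scenario, here is a minimal
sketch in which the only atomic accesses are a relaxed store/load pair and
all ordering comes from explicit fences. It assumes the fence interface
later standardized in C++11 (std::atomic_thread_fence), which is not the
syntax under discussion in this thread. The names fence_handoff, payload,
ready and seen are hypothetical.

```cpp
#include <atomic>
#include <thread>

// Hands an ordinary int from one thread to another using only a
// relaxed store/load pair plus explicit fences. Returns the value
// the consumer observed.
int fence_handoff() {
    int payload = 0;            // ordinary, non-atomic data
    std::atomic<int> ready{0};  // accessed with relaxed operations only
    int seen = -1;              // written by the consumer, read after join

    std::thread producer([&] {
        payload = 42;
        // Stands in for the "release" half of a store_release.
        std::atomic_thread_fence(std::memory_order_release);
        ready.store(1, std::memory_order_relaxed);
    });

    std::thread consumer([&] {
        while (ready.load(std::memory_order_relaxed) != 1)
            std::this_thread::yield();
        // Stands in for the "acquire" half of a load_acquire.
        std::atomic_thread_fence(std::memory_order_acquire);
        seen = payload;
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Here the communication itself happens through the relaxed pair on ready.
The fences on either side are what make the earlier write to payload
visible to the consumer.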

Maybe what would be appropriate for this example is x.store_relaxed(P1) hb
y.store_release(P2). Perhaps this hb relationship should be triggered by
the store_release.

> This would have negative consequences on synchronization
> elimination.  It would mean that if I have
>
> T1: x.store_relaxed(1); z.fetch_add_acq_rel(1); y.store_relaxed(1)
> T2: if (y.load_relaxed()) w.store_release(1);
>
> with z thread local (possibly after thread coalescing by the
> compiler), I still can't turn the fetch_add into an ordinary
> increment, since it orders the other operations.  This affects the
> performance of code that doesn't use the low level operations, since
> the fetch_add (or equivalently lock acquisition) may be in a
> separate compilation unit.

I agree that synchronization elimination is important. However, if relaxed
operations cannot be used for inter-thread communication they don't seem to
have much value. What is the intended usage for such un-orderable relaxed
operations?

--
Raúl E. Silvera         IBM Toronto Lab   Team Lead, Toronto Portable
Optimizer (TPO)
Tel: 905-413-4188 T/L: 969-4188           Fax: 905-413-4854
D2/KC9/8200/MKM




