[cpp-threads] Alternatives to SC

Raul Silvera rauls at ca.ibm.com
Tue Jan 16 15:00:07 GMT 2007


cpp-threads-bounces at decadentplace.org.uk wrote on 01/16/2007 08:54:07 AM:

> On 1/16/07, Alexander Terekhov <alexander.terekhov at gmail.com> wrote:
> > On 1/16/07, Raul Silvera <rauls at ca.ibm.com> wrote:
> > [...]
> > > > A sync()-less version with lwarx/stwcx loop on P3 relies on implicit
> > > > ordering of stores by control dependencies (code conditional hoist
> > > > store barrier implied by hardware) and ability of stwcx to detect
> > > > "stale" values loaded by lwarx.
> > >
> > > Oh, I understand what you're proposing now. You're relying on the control
> > > flow dependence ordering the load of y and an invented stwcx x that you
> > > would introduce with the same value you loaded from x.
> > >
> > > As clever as that is, it is overkill (plus not general: for all P3 knows, x
> > > could be on read-only memory). PPC will order loads vs control-flow
> > > dependent loads as long as there is an isync after the conditional branch.
> > > For this example, I believe the minimum synchronization required would be:
> > >
> > > > > > > P1: x = 1;
> > > > > > > P2: if (x == 1) lwsync(), y = 2;
> > > > > > > P3: if (y == 2) isync(), assert(x == 1);
> > >
> > > Note that P2 only needs an lwsync because it is ordering a load and a
> > > store,
> >
> > For mere ordering of load x and store y on P2, lwsync is not needed --
> > see "B.2.3 Safe Fetch" in Book II.
> >
> > > and P3 only needs an isync because we're relying on the control-flow
> > > dependence
> >
> > Well, lwarx-stwcx-on-P3 version aside for a moment, at least one
> > "cumulative" barrier is needed on P2/P3 (Book II shows two) I think.

In this particular example, cumulativity is only needed on P2 (lwsync
provides a cumulative ordering). Basically you need P1's x=1 store to be
ordered wrt P2's y=2 store. Cumulativity is not needed on P3: the order of
the loads on P3 is irrelevant for other threads.

Note that the ordering provided by safe_fetch (which is just an artificial
control-flow dependence) is not cumulative, so it is insufficient for P2. I
stand by my previous statement that the best code for this uses an
lwsync on P2 and an isync on P3.
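
For readers who prefer source code to barrier mnemonics, here is a rough
sketch of the same three-processor example written with C++ atomics in the
std::atomic style. It is an illustration, not something proposed in this
thread: the helper names violation_once and count_violations are made up, and
the comments about how a compiler might lower each operation on Power describe
the commonly documented mapping, not a guarantee about any particular
implementation. The release store on P2 plays the role of the lwsync, and the
acquire load on P3 plays the role of the conditional branch plus isync.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs the P1/P2/P3 example once and reports whether
// P3 observed y == 2 without also observing x == 1.
bool violation_once() {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    bool violated = false;  // written only by P3, read after join()

    // P1: x = 1;
    std::thread p1([&] { x.store(1, std::memory_order_relaxed); });

    // P2: if (x == 1) lwsync(), y = 2;
    std::thread p2([&] {
        if (x.load(std::memory_order_relaxed) == 1) {
            // Release store; assumed to lower to "lwsync; store" on Power.
            y.store(2, std::memory_order_release);
        }
    });

    // P3: if (y == 2) isync(), assert(x == 1);
    std::thread p3([&] {
        // Acquire load; assumed to lower to "load; compare; branch; isync".
        if (y.load(std::memory_order_acquire) == 2) {
            violated = (x.load(std::memory_order_relaxed) != 1);
        }
    });

    p1.join();
    p2.join();
    p3.join();
    return violated;
}

// Hypothetical helper: repeats the experiment and counts forbidden outcomes.
int count_violations(int trials) {
    int count = 0;
    for (int i = 0; i < trials; ++i) {
        if (violation_once()) {
            ++count;
        }
    }
    return count;
}
```

Under these assumptions count_violations should return 0 for any number of
trials. P2's read of x is ordered before P3's read of x through the
release/acquire pair on y, so P3 cannot see a value of x older than the one P2
saw. A clean run on one machine is of course not a proof of that.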

> > Ordering and visibility on Power is rather murky, isn't it? ;-)

No, it's easy! (Famous last words :-)

> Consider also
>
> P1: x = 1;
> P2: if (x == 1) x = 1, eieio(), y = 2; // extra write to x plus StoreStore
> P3: if (y == 2) isync(), assert(x == 1);

Well, this may be better than an lwsync (it probably depends on the
implementation) but to me it's cheating: this is inventing a store to x.

PS: In case other people are actually interested in these details, I want
to point out that the PPC architecture spec is freely available online at:
http://www-128.ibm.com/developerworks/eserver/articles/archguide.html

> regards,
> alexander.
>
> --
> cpp-threads mailing list
> cpp-threads at decadentplace.org.uk
> http://www.decadentplace.org.uk/cgi-bin/mailman/listinfo/cpp-threads



