[cpp-threads] Alternatives to SC

Raul Silvera rauls at ca.ibm.com
Tue Jan 16 12:08:30 GMT 2007


cpp-threads-bounces at decadentplace.org.uk wrote on 01/16/2007 05:42:54 AM:

> On 1/15/07, Paul E. McKenney <paulmck at linux.vnet.ibm.com> wrote:
> > On Mon, Jan 15, 2007 at 10:38:06AM -0800, Paul E. McKenney wrote:
> > > On Mon, Jan 15, 2007 at 07:57:52AM +0100, Alexander Terekhov wrote:
> > > > On 1/14/07, Raul Silvera <rauls at ca.ibm.com> wrote:
> > > > [...]
> > > > >Furthermore, as several other people have mentioned already, SC requires
> > > > >reads to wait for any writes observed from other threads to become globally
> > > > >visible. This means you need a StoreLoad barrier ...
> > > >
> > > > P1: x = 1;
> > > > P2: if (x == 1) y = 2;
> > > > P3: if (y == 2) assert(x == 1);
> > > >
> > > > PowerPC Book II discusses that example and says that it requires
> > > >
> > > > P1: x = 1;
> > > > P2: if (x == 1) sync(), y = 2;
> > > > P3: if (y == 2) sync(), assert(x == 1);
> > > >
> > > > ('Cumulative ordering' property of 'sync' instruction.)
> > > >
> > > > I somehow doubt that it will outperform a sync()-less version doing
> > > > load of x on P3 via stwcx-validated lwarx. ;-)
> > >
> > > Hard to say which would be faster.  Raul's version has the advantage of
> > > permitting the caches to retain read sharing.  Your version has the
> > > advantage of getting rid of an expensive sync() instruction.
>
> Yep, nothing is free.
>
> > > If there
> > > are lots of P3 executions and very few P1 and P2 executions, I would
> > > guess that Raul's approach wins.
>
> Lots of P3 executions (on P3) alone won't be that bad.
>
> >
> > And besides, don't you also need at least an isync preceding the assert()
> > on POWER for the stwcx-validated lwarx to do what you want?
>
> I don't think so. Regarding isync see
>
> http://www.decadentplace.org.uk/pipermail/cpp-threads/2005-May/000418.html
> http://www.decadentplace.org.uk/pipermail/cpp-threads/2005-May/000421.html
> http://www.decadentplace.org.uk/pipermail/cpp-threads/2005-May/000463.html
>
> And regarding stwcx-validated lwarx, Book II says that stwcx can
> detect "stale" values loaded by lwarx (it would be pretty useless
> otherwise).
>
> A sync()-less version with lwarx/stwcx loop on P3 relies on implicit
> ordering of stores by control dependencies (code conditional hoist
> store barrier implied by hardware) and ability of stwcx to detect
> "stale" values loaded by lwarx.

Oh, I understand what you're proposing now. You're relying on the control-flow
dependence to order the load of y before an invented stwcx. to x, which you
would introduce to store back the same value you loaded from x.

As clever as that is, it is overkill (and not general: for all P3 knows, x
could be in read-only memory). PPC orders a load before later control-flow
dependent loads as long as there is an isync after the conditional branch.
For this example, I believe the minimum synchronization required would be:

    P1: x = 1;
    P2: if (x == 1) lwsync(), y = 2;
    P3: if (y == 2) isync(), assert(x == 1);

Note that P2 only needs an lwsync because it is ordering a load and a
store, and P3 only needs an isync because we're relying on the control-flow
dependence (otherwise we'd need an lwsync).
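For readers mapping this onto the C++ memory model, here is a hedged sketch of the same three-thread example written with C++11 atomics. The names `wrc_trial` and `check_wrc` are hypothetical, and the POWER instruction sequences in the comments are the commonly cited C++11-to-POWER mappings, not something stated in this thread. On my reading of the C++11 rules, a relaxed load of x on P2 followed by a release store of y (lwsync; st) plus an acquire load of y on P3 (ld; cmp; bc; isync) is enough to keep the assert from firing, which lines up with the lwsync/isync version above.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: one run of the P1/P2/P3 example with C++11 atomics.
// Returns true if P3 observed y == 2 and therefore actually checked x.
bool wrc_trial() {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    bool p3_checked = false;

    std::thread p1([&] {
        x.store(1, std::memory_order_relaxed);        // P1: plain store, no barrier
    });
    std::thread p2([&] {
        if (x.load(std::memory_order_relaxed) == 1)   // P2: plain load of x
            y.store(2, std::memory_order_release);    // assumed POWER mapping: lwsync; st
    });
    std::thread p3([&] {
        if (y.load(std::memory_order_acquire) == 2) { // assumed POWER mapping: ld; cmp; bc; isync
            p3_checked = true;
            // Having read P2's release store, P3 sees P2's load of x "happen before"
            // this load, so read-read coherence rules out the initial value 0.
            assert(x.load(std::memory_order_relaxed) == 1);
        }
    });

    p1.join();
    p2.join();
    p3.join();
    return p3_checked;
}

// Hypothetical driver: runs many trials, returns how many exercised the assert.
int check_wrc(int trials) {
    int checked = 0;
    for (int i = 0; i < trials; ++i)
        if (wrc_trial())
            ++checked;
    return checked;
}
```

Passing runs on x86 or any single test machine say little about POWER; the sketch only shows how the orderings line up with the barriers discussed above.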

In any case, my original point was not so much about this example, but
about consecutive atomic loads. In the IRIW (independent reads of
independent writes) case, you need a sync between the two loads on P3 and
on P4. In general, you need a sync after every atomic load if you want to
achieve SC. That is a major overhead that would have to be added to
essentially every atomic load in the program.
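A similarly hedged sketch of the IRIW case in C++11 terms: with every access `memory_order_seq_cst`, the outcome in which P3 and P4 observe the two independent writes in opposite orders is forbidden. One commonly cited POWER mapping puts a heavyweight sync in front of each seq_cst load, which is the per-load cost described above; weakening the loads to `memory_order_acquire` would make that outcome allowed by C++11. `IriwResult`, `iriw_trial`, `readers_disagree` and `check_iriw` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical result record for one IRIW run.
struct IriwResult {
    int r1, r2;  // P3: loads of x, then y
    int r3, r4;  // P4: loads of y, then x
};

// One IRIW run: two independent writers, two readers reading in opposite orders.
IriwResult iriw_trial() {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    IriwResult r{0, 0, 0, 0};

    std::thread p1([&] { x.store(1, std::memory_order_seq_cst); });
    std::thread p2([&] { y.store(1, std::memory_order_seq_cst); });
    std::thread p3([&] {
        r.r1 = x.load(std::memory_order_seq_cst);
        r.r2 = y.load(std::memory_order_seq_cst);
    });
    std::thread p4([&] {
        r.r3 = y.load(std::memory_order_seq_cst);
        r.r4 = x.load(std::memory_order_seq_cst);
    });

    p1.join();
    p2.join();
    p3.join();
    p4.join();
    return r;
}

// True for the non-SC outcome: P3 saw the write to x but not yet the one to y,
// while P4 saw the write to y but not yet the one to x.
bool readers_disagree(const IriwResult& r) {
    return r.r1 == 1 && r.r2 == 0 && r.r3 == 1 && r.r4 == 0;
}

// Runs many trials and asserts the non-SC outcome never appears.
void check_iriw(int trials) {
    for (int i = 0; i < trials; ++i)
        assert(!readers_disagree(iriw_trial()));
}
```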

>
> regards,
> alexander.
>



