[cpp-threads] Alternatives to SC

Alexander Terekhov alexander.terekhov at gmail.com
Tue Jan 16 10:42:54 GMT 2007


On 1/15/07, Paul E. McKenney <paulmck at linux.vnet.ibm.com> wrote:
> On Mon, Jan 15, 2007 at 10:38:06AM -0800, Paul E. McKenney wrote:
> > On Mon, Jan 15, 2007 at 07:57:52AM +0100, Alexander Terekhov wrote:
> > > On 1/14/07, Raul Silvera <rauls at ca.ibm.com> wrote:
> > > [...]
> > > >Furthermore, as several other people have mentioned already, SC requires
> > > >reads to wait for any writes observed from other threads to become globally
> > > >visible. This means you need a StoreLoad barrier ...
> > >
> > > P1: x = 1;
> > > P2: if (x == 1) y = 2;
> > > P3: if (y == 2) assert(x == 1);
> > >
> > > PowerPC Book II discusses that example and says that it requires
> > >
> > > P1: x = 1;
> > > P2: if (x == 1) sync(), y = 2;
> > > P3: if (y == 2) sync(), assert(x == 1);
> > >
> > > ('Cumulative ordering' property of 'sync' instruction.)
> > >
> > > I somehow doubt that it will outperform a sync()-less version doing
> > > load of x on P3 via stwcx-validated lwarx. ;-)
> >
> > Hard to say which would be faster.  Raul's version has the advantage of
> > permitting the caches to retain read sharing.  Your version has the
> > advantage of getting rid of an expensive sync() instruction.

Yep, nothing is free.

> > If there
> > are lots of P3 executions and very few P1 and P2 executions, I would
> > guess that Raul's approach wins.

Lots of P3 executions (on P3) alone won't be that bad.

>
> And besides, don't you also need at least an isync preceding the assert()
> on POWER for the stwcx-validated lwarx to do what you want?

I don't think so. Regarding isync see

http://www.decadentplace.org.uk/pipermail/cpp-threads/2005-May/000418.html
http://www.decadentplace.org.uk/pipermail/cpp-threads/2005-May/000421.html
http://www.decadentplace.org.uk/pipermail/cpp-threads/2005-May/000463.html

And regarding stwcx-validated lwarx, Book II says that stwcx can
detect "stale" values loaded by lwarx (it would be pretty useless
otherwise).
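
As a rough illustration only, here is one way a "stwcx-validated lwarx" read
might be written in C++. It uses std::atomic, which postdates this thread, and
the helper name validated_load is hypothetical. The sketch assumes that a
compiler targeting POWER lowers an atomic read-modify-write to an lwarx/stwcx.
retry loop; that is a common code-generation choice, not something the C++
standard promises. The block shows only the shape of the read and makes no
claim about inter-thread ordering.

```cpp
#include <atomic>

// Hypothetical helper: read x through an atomic read-modify-write that adds
// zero, so the value is written back unchanged. On POWER this is assumed to
// become a lwarx / add / stwcx. loop that retries until the conditional store
// succeeds, so the value returned is one the store-conditional accepted.
int validated_load(std::atomic<int>& x) {
    return x.fetch_add(0, std::memory_order_relaxed);
}

// Single-threaded usage: store a value, then read it back via the helper.
int demo_validated_load() {
    std::atomic<int> x{0};
    x.store(1, std::memory_order_relaxed);
    return validated_load(x);
}
```

The cost noted earlier in the thread applies: because the read is also a
write, the cache line cannot stay in a shared state across readers.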

A sync()-less version with an lwarx/stwcx loop on P3 relies on two
things: the implicit ordering of stores by control dependencies (the
hardware in effect supplies a barrier that keeps a store from being
hoisted above the condition it depends on), and the ability of stwcx
to detect "stale" values loaded by lwarx.
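
For readers who want to run the three-processor example quoted above, the
following is a minimal sketch in C++ using std::atomic, which postdates this
thread. It is not the sync()-less variant: portable C++ cannot express the
control-dependency ordering that variant relies on, so acquire/release
operations stand in for the cumulative ordering that the sync() version
provides. The function name run_litmus is hypothetical, and P2 and P3 spin
rather than test once, so that every run reaches the check.

```cpp
#include <atomic>
#include <thread>

// Returns the value of x that P3 observes after it has seen y == 2.
int run_litmus() {
    std::atomic<int> x{0}, y{0};
    int seen = -1;

    // P1: x = 1;
    std::thread p1([&] { x.store(1, std::memory_order_release); });

    // P2: wait until x == 1, then y = 2;
    std::thread p2([&] {
        while (x.load(std::memory_order_acquire) != 1) {
            std::this_thread::yield();
        }
        y.store(2, std::memory_order_release);
    });

    // P3: wait until y == 2, then read x.
    std::thread p3([&] {
        while (y.load(std::memory_order_acquire) != 2) {
            std::this_thread::yield();
        }
        // P1's store happens-before this load through the two
        // release/acquire pairs, so the load cannot return the initial 0.
        seen = x.load(std::memory_order_relaxed);
    });

    p1.join();
    p2.join();
    p3.join();
    return seen;  // safe to read: all threads have been joined
}
```

Under the C++ memory model the happens-before relation is transitive, which
plays the role of the cumulativity discussed above; which barrier instructions
a given compiler emits for these operations on POWER is not claimed here.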

regards,
alexander.


