Subject: Re: [cpp-threads] Alternatives to SC

Sat Jan 27 00:36:29 GMT 2007

On Fri, Jan 19, 2007 at 11:53:47PM -0800, Chris Thomasson wrote:
>
> >On Wed, Jan 17, 2007 at 10:46:33PM -0800, Chris Thomasson wrote:
> >>
> >>>>On Wed, Jan 17, 2007 at 12:57:29PM -0800, Chris Thomasson wrote:
> >>>>>
> >>>>>>On Tue, Jan 16, 2007 at 05:05:16PM -0800, Chris Thomasson wrote:
> >>>>>>>From: "Paul E. McKenney" <paulmck at linux.vnet.ibm.com>
> >>>>>>>To: "C++ threads standardisation" <cpp-threads at decadentplace.org.uk>
> >>>>>>>Sent: Tuesday, January 16, 2007 9:46 AM
> >>>>>>>Subject: Re: [cpp-threads] Alternatives to SC
> >>>>>>
> >>>>>>Hello, Chris,
> [...]
> >>>>>>Out-of-order reads are allowed.  Out-of-order reads can occur
> [...]
> >>>>>>
> >>>>>>There are also a number of Intel manuals containing the words
> >>>>>>"Reads can be carried out speculatively and in any order".
> [...]
> >>>>>Well, does that mean that RCU should have lfence on x86? That would tank
> [...]
> >>>>Absolutely not.  x86 respects data dependencies.
> >>
> >>Right, however, isn't this achieved by an "implied" #LoadLoad barrier
> >>for every atomic load?
> [...]
> >That said, the implied barrier need only apply to the pair of loads
> >involved in the data dependency.  So there is indeed an implied barrier,
> >but its effect can be extremely limited.
>
> Yes; I agree. Hmm, I am wondering what the granularity level is wrt the
> memory barrier that virtually has to be attached to basically any
> so-called 'naked' atomic load on a current x86... [...]
> Please, Intel, AMD? Set me straight! ...
>> As noted earlier, they had relatively little say.  But I agree that
>> only they are now in a position to provide definition.

Yup. It would be nice to read some explicit guarantees. I would model the
documentation after some of the Sun documents for the UltraSPARC T1. It's
TSO, and the docs are very clear about that fact...

> >My understanding of x86 microarchitecture is a bit dated, so I need to
> >defer to the Intel and AMD people on this list for a definitive answer.
>
> "Please, describe the total functionality of 'anything' which might be
> associated with an atomic load on the 'current' x86 that is analogous to
> a 'memory barrier'?"
>
> ;^)
>> Careful what you wish for -- you just might get it.  ;-)

:O

> >For example:
> >
> >r0 = head;
> >r1 = head->a;
> >r2 = some_global_variable;
> >
> >Here, there has to be an implied LoadLoad between the load into r0 and
> >the load into r1, but the load into r2 could potentially be hoisted
> >above both preceding loads.
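
The three loads above can be sketched in C++ using the std::atomic
interface that was later standardized (it was still under discussion when
this thread was written). This is only an illustration under assumptions:
the Node type, the initial values, and the load_three() helper are
hypothetical, and memory_order_consume is used as the closest formal match
for "ordered by data dependency only". Current compilers commonly treat
consume as acquire, which is stronger than the dependency actually needs.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical node type and globals, for illustration only.
struct Node { int a; };

static Node node{7};
static std::atomic<Node*> head{&node};
static std::atomic<int> some_global_variable{3};

struct Regs { Node* r0; int r1; int r2; };

Regs load_three() {
    Regs r;
    // r0 = head; the load that the next one depends on.
    r.r0 = head.load(std::memory_order_consume);
    // r1 = head->a; the address comes from r0, so this load is
    // dependency-ordered after the load of head.
    r.r1 = r.r0->a;
    // r2 = some_global_variable; no dependency on r0, so nothing here
    // orders it against the two loads above.
    r.r2 = some_global_variable.load(std::memory_order_relaxed);
    return r;
}
```

Replacing the consume load with an acquire load (or following it with an
explicit fence) corresponds to the explicit-barrier case: the load into r2
would then be ordered as well.
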
>
> Yes. Well, it's "kind of" similar to this...
>
> Given The Scenario:
[...]
> A1
> B1
> A2
> B2
>
> Yikes! B1 deals with a different location than A2 does, so, B1 can be
> hoisted up above it and slammed directly into its critical-section.

>> Yep -- even in SC, there is nothing constraining the order of A1 and B1.
>> SC only guarantees that, for a given execution, everyone will agree
>> on the arbitrarily chosen order.

SC is ridiculously expensive. You have to use full memory barriers instead
of a classic release barrier... IMHO, it's basically impossible to scale when
you are forced to use something like SC. Reminds me of some of the
overheads in software transactional memory. The price it pays for
"ease-of-use" is tremendous. Any memory model that demands SC is way too
expensive...
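
To make the release-versus-full-barrier point concrete, here is a minimal
sketch of publishing data with a release store, again written against the
later-standardized std::atomic interface. The names payload, ready,
publish(), wait_and_read() and run_demo() are hypothetical. On x86 a
release store is typically compiled to a plain store, while a
sequentially consistent store typically becomes an xchg or a store
followed by mfence, which is the extra cost being complained about.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, for illustration only.
static int payload = 0;                 // plain data being published
static std::atomic<bool> ready{false};  // flag that guards the payload

// Producer: the release store keeps the payload write from being
// reordered after the flag write. No full barrier is requested.
void publish(int value) {
    payload = value;
    ready.store(true, std::memory_order_release);
    // Under an SC-only model this store would have to be
    // std::memory_order_seq_cst instead.
}

// Consumer: the acquire load pairs with the release store, so once the
// flag reads true the payload write is guaranteed to be visible.
int wait_and_read() {
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;
}

// Runs one producer thread against the calling thread as consumer.
int run_demo(int value) {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    std::thread producer(publish, value);
    int seen = wait_and_read();
    producer.join();
    return seen;
}
```

Calling run_demo(42) returns 42: the acquire load cannot observe the flag
without also observing the payload written before the release store.
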

> >In contrast, an explicit barrier would
> >affect the load into r2 as well as the load into r1.
>
> Yup... BTW, I think I heard of so-called "tagged" memory barriers from
> Alex Terekhov a while back in comp.programming.threads... Not sure if it
> was something to help compilers, or if an architecture did it... Anyway,
> you could attach memory barrier functionality directly to a specific
> location.

 [...]

>> I have come across these.  ;-)  Interesting to play with,

Indeed.

>> but not clear to me that they will catch on.

:^) Well, they would scale nicely...