Subject: Re: [cpp-threads] Alternatives to SC
Paul E. McKenney
paulmck at linux.vnet.ibm.com
Sat Jan 27 02:19:30 GMT 2007
On Fri, Jan 26, 2007 at 04:36:29PM -0800, Chris Thomasson wrote:
> On Fri, Jan 19, 2007 at 11:53:47PM -0800, Chris Thomasson wrote:
> >
> >>On Wed, Jan 17, 2007 at 10:46:33PM -0800, Chris Thomasson wrote:
> >>>
> >>>>>On Wed, Jan 17, 2007 at 12:57:29PM -0800, Chris Thomasson wrote:
> >>>>>>
> >>>>>>>On Tue, Jan 16, 2007 at 05:05:16PM -0800, Chris Thomasson wrote:
> >>>>>>>>From: "Paul E. McKenney" <paulmck at linux.vnet.ibm.com>
> >>>>>>>>To: "C++ threads standardisation" <cpp-threads at
> >>>>>>>>decadentplace.org.uk>
> >>>>>>>>Sent: Tuesday, January 16, 2007 9:46 AM
> >>>>>>>>Subject: Re: [cpp-threads] Alternatives to SC
> >>>>>>>
> >>>>>>>Hello, Chris,
> >[...]
> >>>>>>>Out-of-order reads are allowed. Out-of-order reads can occur
> >[...]
> >>>>>>>
> >>>>>>>There are also a number of Intel manuals containing the words
> >>>>>>>"Reads can be carried out speculatively and in any order".
> >[...]
> >>>>>>Well, does that mean that RCU should have lfence on x86?
> >>>>>>That would tank
> >[...]
> >>>>>Absolutely not. x86 respects data dependencies.
> >>>
> >>>Right, however, isn't this achieved by an "implied" #LoadLoad
> >>>barrier for every atomic load?
> >[...]
> >>That said, the implied barrier need only apply to the pair of loads
> >>involved in the data dependency. So there is indeed an implied barrier,
> >>but its effect can be extremely limited.
> >
> >Yes; I agree. Humm, I am wondering what the granularity-level is wrt
> >the memory barrier that virtually has to be attached to basically any
> >so-called 'naked' atomic load on a current x86... [...]
> >Please, Intel, AMD? Set me straight! ...
> >>As noted earlier, they had relatively little say. But I agree that
> >>only they are now in a position to provide definition.
>
> Yup. It would be nice to read some explicit guarantees. I would model the
> documentation after some of the Sun documents for the UltraSPARC T1. It's
> TSO, and the docs are very clear about that fact...
I will let you work that out with the Intel/AMD folks. Me, I would be
happy with them coming to agreement regardless of documentation format. ;-)
> >>My understanding of x86 microarchitecture is a bit dated, so I need to
> >>defer to the Intel and AMD people on this list for a definitive answer.
> >
> >"Please, describe the total functionality of 'anything' which might be
> >associated with an atomic load on the 'current' x86? that is analogous
> >to a 'memory barrier'?"
> >
> >;^)
> >>Careful what you wish for -- you just might get it. ;-)
>
> :O
>
>
> >>For example:
> >>
> >>r0 = head;
> >>r1 = head->a;
> >>r2 = some_global_variable;
> >>
> >>Here, there has to be an implied LoadLoad between the load into r0 and
> >>the load into r1, but the load into r2 could potentially be hoisted
> >>above both preceding loads.
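The three loads above can be written out with C11 atomics, which postdate this thread. This is only a minimal sketch: `struct node`, `first_node`, `publish()` and `read_dependent()` are hypothetical names added for illustration, and `memory_order_consume` is assumed as the closest standard spelling of "ordered by data dependency only". Current compilers generally implement consume as acquire, so the generated code may be stronger than what is described here.

```c
#include <stdatomic.h>
#include <stddef.h>

struct node { int a; };                      /* hypothetical node type */

static struct node first_node;               /* hypothetical storage for one node */
static _Atomic(struct node *) head;          /* starts out NULL */
static atomic_int some_global_variable;

/* Writer: initialize the node, then publish it through head. */
void publish(int value)
{
        first_node.a = value;
        atomic_store_explicit(&head, &first_node, memory_order_release);
}

/* Reader: returns head->a, or -1 if nothing has been published yet. */
int read_dependent(int *r2_out)
{
        /* r0 = head; later loads that depend on r0 are ordered after this. */
        struct node *r0 = atomic_load_explicit(&head, memory_order_consume);

        if (r0 == NULL)
                return -1;

        /* r1 = head->a; carries a data dependency on r0. */
        int r1 = r0->a;

        /* r2 = some_global_variable; no dependency on r0, so nothing here
         * asks for it to be ordered after the two loads above. */
        *r2_out = atomic_load_explicit(&some_global_variable,
                                       memory_order_relaxed);
        return r1;
}
```

Calling `publish(42)` and then `read_dependent(&r2)` returns 42. Only the r0/r1 pair is ordered, which is the limited effect described in the quoted text.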
> >
> >Yes. Well, it's "kind of" similar to this...
> >
> >Given The Scenario:
> [...]
> >A1
> >B1
> >A2
> >B2
> >
> >Yikes! B1 deals with a different location than A2 does, so, B1 can be
> >hoisted up above it and slammed directly into its critical-section.
>
> >>Yep -- even in SC, there is nothing constraining the order of A1 and B1.
> >>SC only guarantees that, for a given execution, everyone will agree
> >>on the arbitrarily chosen order.
>
> SC is ridiculously expensive. You have to use full memory barriers instead
> of a classic release barrier... IMHO, it's basically impossible to scale
> when you are forced to use something like SC. Reminds me of some of the
> overheads in software transactional memory. The price it pays for
> "ease-of-use" is tremendous. Any memory model that demands SC is way too
> expensive...
TSO is about as far as I can see going for normal loads and stores.
And, as noted earlier, we really do need to allow common idioms to
be used.
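To make the cost difference concrete, here is a minimal sketch of one such idiom (publishing a payload behind a flag) using C11 atomics, which postdate this thread. The names `payload`, `ready`, `publish_release()`, `publish_seq_cst()` and `try_consume()` are hypothetical. On x86, compilers typically emit a plain store for the release version and a locked exchange (or a store followed by mfence) for the sequentially consistent one.

```c
#include <stdatomic.h>

static int payload;              /* ordinary data being handed off */
static atomic_int ready;         /* flag that publishes the payload */

/* Release store: earlier writes are visible to whoever acquires the flag. */
void publish_release(int value)
{
        payload = value;
        atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Same hand-off with a sequentially consistent store, which additionally
 * takes part in a single total order of all seq_cst operations. */
void publish_seq_cst(int value)
{
        payload = value;
        atomic_store_explicit(&ready, 1, memory_order_seq_cst);
}

/* Acquire load: returns 1 and fills *out once the flag has been set. */
int try_consume(int *out)
{
        if (atomic_load_explicit(&ready, memory_order_acquire)) {
                *out = payload;
                return 1;
        }
        return 0;
}
```

For this hand-off pattern alone, the release/acquire pair is sufficient; the seq_cst variant is shown only for comparison.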
> >>In contrast, an explicit barrier would
> >>affect the load into r2 as well as the load into r1.
> >
> >Yup... BTW, I think I heard of so-called "tagged" memory barriers from
> >Alex Terekhov a while back in comp.programming.threads... Not sure if it
> >was something to help compilers, or if an architecture did it... Anyway,
> >you could attach memory barrier functionality directly to a specific
> >location.
>
> [...]
>
> >>I have come across these. ;-) Interesting to play with,
>
> Indeed.
>
> >>but not clear to me that they will catch on.
>
> :^) Well, they would scale nicely...
Although expressing them in the face of arbitrary control-flow changes
could be quite "interesting". For example, what do you do in the
following situation when the then-clause is not executed?

        a = 1; l1:
        tagged_mb(l1, l2);
        if (random() & 0xff) {
                l2: b = 1;
        }

OK, how about this example?

        l1: a = 1;
        directed_mb(l1, l2);
        if (random() & 0xff) {
                l2: b = 1;
        }
        if (random() & 0xff) {
                do_lots_of_stuff();
                goto l2;
        }

OK, so that goto is ugly anyway. So, how about this one?

        for (i = 0; i < MAX_COUNT; i++) {
                if (random() & 0xff) {
                        a = entry[i]; l1:
                }
                tagged_mb(l1, l2);
                if (random() & 0xff) {
                        l2: b = entry[i];
                }
        }

Or am I misunderstanding what you mean by tagged memory barriers?
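For contrast with the labeled forms above, here is a minimal sketch of the first example using a conventional, unlabeled fence from C11 (which postdates this thread). It does not model tagged barriers; `writer()` and its `take_branch` parameter are hypothetical, with the parameter standing in for `random() & 0xff` so that both paths can be exercised. Because the fence names no target, it orders the store to `a` before any later store on whichever path runs, at the cost of not being limited to one pair of accesses.

```c
#include <stdatomic.h>

static atomic_int a, b;

/* take_branch stands in for (random() & 0xff) in the example above. */
void writer(int take_branch)
{
        atomic_store_explicit(&a, 1, memory_order_relaxed);

        /* Unlabeled release fence: the store to a above is ordered before
         * every store that follows, whether or not the branch is taken. */
        atomic_thread_fence(memory_order_release);

        if (take_branch)
                atomic_store_explicit(&b, 1, memory_order_relaxed);
}
```

When the branch is skipped, the fence still executes and nothing is left undefined, which is exactly the case that is awkward to express with a barrier tied to `l2`.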
Thanx, Paul