[cpp-threads] Alternatives to SC
Paul E. McKenney
paulmck at linux.vnet.ibm.com
Sat Jan 20 19:52:47 GMT 2007
On Fri, Jan 19, 2007 at 11:53:47PM -0800, Chris Thomasson wrote:
>
> >On Wed, Jan 17, 2007 at 10:46:33PM -0800, Chris Thomasson wrote:
> >>
> >>>>On Wed, Jan 17, 2007 at 12:57:29PM -0800, Chris Thomasson wrote:
> >>>>>
> >>>>>>On Tue, Jan 16, 2007 at 05:05:16PM -0800, Chris Thomasson wrote:
> >>>>>>>From: "Paul E. McKenney" <paulmck at linux.vnet.ibm.com>
> >>>>>>>To: "C++ threads standardisation" <cpp-threads at decadentplace.org.uk>
> >>>>>>>Sent: Tuesday, January 16, 2007 9:46 AM
> >>>>>>>Subject: Re: [cpp-threads] Alternatives to SC
> >>>>>>
> >>>>>>Hello, Chris,
> [...]
> >>>>>>Out-of-order reads are allowed. Out-of-order reads can occur
> [...]
> >>>>>>
> >>>>>>There are also a number of Intel manuals containing the words
> >>>>>>"Reads can be carried out speculatively and in any order".
> [...]
> >>>>>Well, does that mean that RCU should have lfence on x86? That would
> >>>>>tank
> [...]
> >>>>Absolutely not. x86 respects data dependencies.
> >>
> >>Right, however, isn't this achieved by an "implied" #LoadLoad barrier for
> >>every atomic load?
> [...]
> >That said, the implied barrier need only apply to the pair of loads
> >involved in the data dependency. So there is indeed an implied barrier,
> >but its effect can be extremely limited.
>
> Yes; I agree. Hmm, I am wondering about the granularity of the memory
> barrier that effectively has to be attached to basically any so-called
> 'naked' atomic load on a current x86... I could imagine a situation where
> that granularity might be 'too' coarse. IMO, TSO seems to imply that a
> full #LoadStore | #LoadLoad is indeed attached to 'every' atomic load,
> and I wonder whether implementations actually do that! So, I am very
> interested in reading what the Intel and AMD gurus have to teach me.
> IMHO, when the documentation is not crystal clear in an area like
> 'memory barrier functionality', people tend to form their own models and
> live happily in fairly blissful ignorance... Please, Intel, AMD? Set me
> straight! ...
As noted earlier, they had relatively little say. But I agree that
only they are now in a position to provide a definitive answer.
> >My understanding of x86 microarchitecture is a bit dated, so I need to
> >defer to the Intel and AMD people on this list for a definitive answer.
>
> "Please describe the total functionality of anything that might be
> associated with an atomic load on a 'current' x86 that is analogous to a
> 'memory barrier'."
>
> ;^)
Careful what you wish for -- you just might get it. ;-)
> >For example:
> >
> >r0 = head;
> >r1 = head->a;
> >r2 = some_global_variable;
> >
> >Here, there has to be an implied LoadLoad between the load into r0 and
> >the load into r1, but the load into r2 could potentially be hoisted
> >above both preceding loads.
>
> Yes. Well, it's "kind of" similar to this...
>
> Given The Scenario:
>
> Thread 'A'
> -------------------
> 1: loc1.acquire();
> 2: loc1.release();
>
> Thread 'B'
> -------------------
> 1: loc2.acquire();
> 2: loc2.release();
>
>
> The following execution-order could legitimately be realized wrt the rules
> of acquire/release barriers themselves and POSIX for that matter:
>
> A1
> B1
> A2
> B2
>
> Yikes! B1 deals with a different location than A2 does, so B1 can be
> hoisted up above it and slammed directly into its critical-section.
Yep -- even in SC, there is nothing constraining the order of A1 and B1.
SC only guarantees that, for a given execution, everyone will agree
on the arbitrarily chosen order.
> >In contrast, an explicit barrier would
> >affect the load into r2 as well as the load into r1.
>
> Yup... BTW, I think I heard of so-called "tagged" memory barriers from Alex
> Terekhov a while back in comp.programming.threads... Not sure if it was
> something to help compilers, or if an architecture did it... Anyway, you
> could attach memory barrier functionality directly to a specific
> location. For instance, if you wanted to affect r2, and leave r1 alone
> because its barrier is already implied, well, you can do that... Something
> like:
>
> <pseudo at&t assembler>
>
> # sequence 1
> 1 - MOVE (%head), %r0
> 1a - mb_head_0: membar #LoadLoad tagged to (1[%head]->%r0)
>
> 1b - MOVE (%r0), %r1
> 1c - mb_head_1: membar #Naked tagged to (1a->%r1)
>
> # sequence 2
> 2 - MOVE (%some_global_variable), %r2
> 2a - mb_some_global_variable_0: # membar #LoadLoad tagged to
> (2[%some_global_variable]->%r2)
>
> Sequence 1 (e.g., instructions 1, 1a, 1b, and 1c) has nothing to do with
> sequence 2 (e.g., instructions 2 and/or 2a...). Therefore, the granularity
> is quite fine in this case...
I have come across these. ;-) Interesting to play with, but not
clear to me that they will catch on.
Thanx, Paul