[cpp-threads] Alternatives to SC
Chris Thomasson
cristom at comcast.net
Sat Jan 20 07:53:47 GMT 2007
> On Wed, Jan 17, 2007 at 10:46:33PM -0800, Chris Thomasson wrote:
>>
>> >>On Wed, Jan 17, 2007 at 12:57:29PM -0800, Chris Thomasson wrote:
>> >>>
>> >>>>On Tue, Jan 16, 2007 at 05:05:16PM -0800, Chris Thomasson wrote:
>> >>>>>From: "Paul E. McKenney" <paulmck at linux.vnet.ibm.com>
>> >>>>>To: "C++ threads standardisation" <cpp-threads at decadentplace.org.uk>
>> >>>>>Sent: Tuesday, January 16, 2007 9:46 AM
>> >>>>>Subject: Re: [cpp-threads] Alternatives to SC
>> >>>>
>> >>>>Hello, Chris,
[...]
>> >>>>Out-of-order reads are allowed. Out-of-order reads can occur
[...]
>> >>>>
>> >>>>There are also a number of Intel manuals containing the words
>> >>>>"Reads can be carried out speculatively and in any order".
[...]
>> >>>Well, does that mean that RCU should have lfence on x86? That would
>> >>>tank
[...]
>> >>Absolutely not. x86 respects data dependencies.
>>
>> Right, however, isn't this achieved by an "implied" #LoadLoad barrier for
>> every atomic load?
[...]
> That said, the implied barrier need only apply to the pair of loads
> involved in the data dependency. So there is indeed an implied barrier,
> but its effect can be extremely limited.
Yes; I agree. Hmm, I am wondering about the granularity of the memory barrier
that effectively has to be attached to basically any so-called 'naked' atomic
load on a current x86... I could imagine situations where that granularity
might be 'too' coarse. IMO, TSO seems to imply that a full
#LoadStore | #LoadLoad is indeed attached to 'every' atomic load... Well, I
wonder if they actually do that! So, I am very interested in reading what the
Intel and AMD gurus have to teach me. IMHO, when the documentation is not
crystal clear in an area like 'memory barrier functionality', people tend to
form their own worlds and live happily in fairly blissful ignorance...
Please, Intel, AMD: set me straight!
> My understanding of x86 microarchitecture is a bit dated, so I need to
> defer to the Intel and AMD people on this list for a definitive answer.
"Please describe the total functionality of anything analogous to a 'memory
barrier' that might be associated with an atomic load on a current x86."
;^)
> For example:
>
> r0 = head;
> r1 = head->a;
> r2 = some_global_variable;
>
> Here, there has to be an implied LoadLoad between the load into r0 and
> the load into r1, but the load into r2 could potentially be hoisted
> above both preceding loads.
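Paul's example above can be sketched in C++ atomics; the names `head`,
`some_global_variable`, and the `Node` layout are my assumptions, not anything
from the proposal itself:

```cpp
#include <atomic>

struct Node { int a; };

std::atomic<Node*> head{nullptr};
std::atomic<int> some_global_variable{0};

int reader() {
    // The load of head carries a data dependency into the load of r0->a,
    // so x86 keeps these two loads ordered with no explicit fence.
    Node* r0 = head.load(std::memory_order_consume);
    int r1 = r0 ? r0->a : 0;  // dependent load: ordered after the load of head
    // No dependency on head here, so in principle this load could be
    // hoisted above both of the preceding loads.
    int r2 = some_global_variable.load(std::memory_order_relaxed);
    return r1 + r2;
}
```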
Yes. Well, it's "kind of" similar to this...
Given The Scenario:
Thread 'A'
-------------------
1: loc1.acquire();
2: loc1.release();
Thread 'B'
-------------------
1: loc2.acquire();
2: loc2.release();
The following execution-order could legitimately be realized wrt the rules
of acquire/release barriers themselves and POSIX for that matter:
A1
B1
A2
B2
Yikes! B1 operates on a different location than A2 does, so B1 can be hoisted
up above it and land directly inside A's critical section.
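The scenario above can be sketched with a minimal spin-lock built on
acquire/release atomics; `SpinLock` is a hypothetical name for illustration,
not an API from this discussion:

```cpp
#include <atomic>

struct SpinLock {
    std::atomic<bool> locked{false};
    void acquire() {
        // Acquire: later memory accesses cannot be hoisted above this RMW.
        while (locked.exchange(true, std::memory_order_acquire)) { /* spin */ }
    }
    void release() {
        // Release: earlier memory accesses cannot sink below this store.
        locked.store(false, std::memory_order_release);
    }
};

SpinLock loc1, loc2;

// Thread A: loc1.acquire(); loc1.release();
// Thread B: loc2.acquire(); loc2.release();
//
// Nothing orders B's acquire of loc2 against A's release of loc1 (they are
// different locations), so A1, B1, A2, B2 is a legal execution order.
```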
> In contrast, an explicit barrier would
> affect the load into r2 as well as the load into r1.
Yup... BTW, I think I heard of so-called "tagged" memory barriers from Alex
Terekhov a while back in comp.programming.threads... I'm not sure whether it
was something to help compilers, or whether an architecture actually did it...
Anyway, you could attach memory-barrier functionality directly to a specific
location. For instance, if you wanted to affect r2 but leave r1 alone, because
r1's barrier is already implied, you could do that... Something like:
<pseudo at&t assembler>
# sequence 1
1  - MOVE (%head), %r0
1a - mb_head_0: membar #LoadLoad tagged to (1[%head]->%r0)
1b - MOVE (%r0), %r1
1c - mb_head_1: membar #Naked tagged to (1a->%r1)
# sequence 2
2  - MOVE (%some_global_variable), %r2
2a - mb_some_global_variable_0: membar #LoadLoad tagged to (2[%some_global_variable]->%r2)
Sequence 1 (i.e., instructions 1, 1a, 1b, and 1c) has nothing to do with
sequence 2 (i.e., instructions 2 and 2a). Therefore, the granularity is fairly
excellent in this case...
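The coarse alternative, for contrast: a standalone acquire fence in C++ atomics
orders every later load after every earlier load, so the unrelated load of r2
is affected too. A sketch, reusing the same assumed names as Paul's example:

```cpp
#include <atomic>

struct Node { int a; };

std::atomic<Node*> head{nullptr};
std::atomic<int> some_global_variable{0};

int reader_with_fence() {
    Node* r0 = head.load(std::memory_order_relaxed);
    // Explicit #LoadLoad-style fence: every load below is ordered after
    // every load above, including the load of some_global_variable, which
    // has no dependency on head at all.
    std::atomic_thread_fence(std::memory_order_acquire);
    int r1 = r0 ? r0->a : 0;
    int r2 = some_global_variable.load(std::memory_order_relaxed);
    return r1 + r2;
}
```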