[cpp-threads] Alternatives to SC

Chris Thomasson cristom at comcast.net
Sat Jan 20 07:53:47 GMT 2007


> On Wed, Jan 17, 2007 at 10:46:33PM -0800, Chris Thomasson wrote:
>>
>> >>On Wed, Jan 17, 2007 at 12:57:29PM -0800, Chris Thomasson wrote:
>> >>>
>> >>>>On Tue, Jan 16, 2007 at 05:05:16PM -0800, Chris Thomasson wrote:
>> >>>>>From: "Paul E. McKenney" <paulmck at linux.vnet.ibm.com>
>> >>>>>To: "C++ threads standardisation" <cpp-threads at decadentplace.org.uk>
>> >>>>>Sent: Tuesday, January 16, 2007 9:46 AM
>> >>>>>Subject: Re: [cpp-threads] Alternatives to SC
>> >>>>
>> >>>>Hello, Chris,
[...]
>> >>>>Out-of-order reads are allowed.  Out-of-order reads can occur
[...]
>> >>>>
>> >>>>There are also a number of Intel manuals containing the words
>> >>>>"Reads can be carried out speculatively and in any order".
[...]
>> >>>Well, does that mean that RCU should have lfence on x86? That would 
>> >>>tank
[...]
>> >>Absolutely not.  x86 respects data dependencies.
>>
>> Right, however, isn't this achieved by an "implied" #LoadLoad barrier for
>> every atomic load?
[...]
> That said, the implied barrier need only apply to the pair of loads
> involved in the data dependency.  So there is indeed an implied barrier,
> but its effect can be extremely limited.

Yes; I agree. Hmm, I am wondering about the granularity of the memory barrier 
that virtually has to be attached to basically any so-called 'naked' atomic 
load on a current x86... I could imagine situations where that granularity is 
'too' coarse. IMO, TSO seems to imply that a full #LoadStore | #LoadLoad is 
indeed attached to 'every' atomic load... well, I wonder if the hardware 
actually does that! So, I am very interested in reading what the Intel and 
AMD gurus have to teach me. IMHO, when the documentation is not crystal clear 
in an area like memory-barrier functionality, people tend to form their own 
models and live happily in fairly blissful ignorance... Please, Intel, AMD? 
Set me straight!

> My understanding of x86 microarchitecture is a bit dated, so I need to
> defer to the Intel and AMD people on this list for a definitive answer.

"Please describe the total functionality of anything resembling a 'memory 
barrier' that might be associated with an atomic load on a current x86."

;^)

> For example:
>
> r0 = head;
> r1 = head->a;
> r2 = some_global_variable;
>
> Here, there has to be an implied LoadLoad between the load into r0 and
> the load into r1, but the load into r2 could potentially be hoisted
> above both preceding loads.

Yes. Well, it's "kind of" similar to this...

Given The Scenario:

Thread 'A'
-------------------
 1: loc1.acquire();
 2: loc1.release();

Thread 'B'
-------------------
 1: loc2.acquire();
 2: loc2.release();


The following execution order could legitimately be realized under the rules 
of acquire/release barriers themselves, and under POSIX for that matter:

A1
B1
A2
B2


Yikes! B1 deals with a different location than A2 does, so B1 can be hoisted 
up above A2 and land squarely inside A's critical section.


> In contrast, an explicit barrier would
> affect the load into r2 as well as the load into r1.

Yup... BTW, I think I heard of so-called "tagged" memory barriers from Alex 
Terekhov a while back in comp.programming.threads... I'm not sure if it was 
something to help compilers, or if an architecture actually did it... Anyway, 
you could attach memory-barrier functionality directly to a specific 
location. For instance, if you wanted to affect r2 but leave r1 alone, 
because its barrier is already implied, you could do that... Something like:


<pseudo AT&T assembler>

# sequence 1
1  - MOVE (%head), %r0
1a - mb_head_0: membar #LoadLoad tagged to (1[%head]->%r0)

1b - MOVE (%r0), %r1
1c - mb_head_1: membar #Naked tagged to (1a->%r1)

# sequence 2
2  - MOVE (%some_global_variable), %r2
2a - mb_some_global_variable_0: membar #LoadLoad tagged to (2[%some_global_variable]->%r2)

Sequence 1 (instructions 1, 1a, 1b, and 1c) has nothing to do with sequence 2 
(instructions 2 and 2a). Therefore, the granularity is fairly excellent in 
this case.



