[cpp-threads] Alternatives to SC
Paul E. McKenney
paulmck at linux.vnet.ibm.com
Sat Jan 20 19:52:47 GMT 2007
On Fri, Jan 19, 2007 at 11:53:47PM -0800, Chris Thomasson wrote:
>
> >On Wed, Jan 17, 2007 at 10:46:33PM -0800, Chris Thomasson wrote:
> >>
> >>>>On Wed, Jan 17, 2007 at 12:57:29PM -0800, Chris Thomasson wrote:
> >>>>>
> >>>>>>On Tue, Jan 16, 2007 at 05:05:16PM -0800, Chris Thomasson wrote:
> >>>>>>>From: "Paul E. McKenney" <paulmck at linux.vnet.ibm.com>
> >>>>>>>To: "C++ threads standardisation" <cpp-threads at decadentplace.org.uk>
> >>>>>>>Sent: Tuesday, January 16, 2007 9:46 AM
> >>>>>>>Subject: Re: [cpp-threads] Alternatives to SC
> >>>>>>
> >>>>>>Hello, Chris,
> [...]
> >>>>>>Out-of-order reads are allowed. Out-of-order reads can occur
> [...]
> >>>>>>
> >>>>>>There are also a number of Intel manuals containing the words
> >>>>>>"Reads can be carried out speculatively and in any order".
> [...]
> >>>>>Well, does that mean that RCU should have lfence on x86? That would
> >>>>>tank
> [...]
> >>>>Absolutely not. x86 respects data dependencies.
> >>
> >>Right, however, isn't this achieved by an "implied" #LoadLoad barrier for
> >>every atomic load?
> [...]
> >That said, the implied barrier need only apply to the pair of loads
> >involved in the data dependency. So there is indeed an implied barrier,
> >but its effect can be extremely limited.
>
> Yes; I agree. Hmm, I am wondering about the granularity of the memory
> barrier that effectively has to be attached to basically any so-called
> 'naked' atomic load on a current x86... I could imagine a situation where
> that granularity might be 'too' coarse. IMO, TSO seems to imply that a
> full #LoadStore | #LoadLoad is indeed attached to 'every' atomic load,
> and I wonder whether implementations actually do that! So, I am very
> interested in reading what the Intel and AMD gurus have to teach me.
> IMHO, when the documentation is not crystal clear in an area like
> 'memory barrier functionality', people tend to form their own models and
> live happily in fairly blissful ignorance... Please, Intel, AMD? Set me
> straight! ...
As noted earlier, they had relatively little say. But I agree that
only they are now in a position to provide a definitive answer.
> >My understanding of x86 microarchitecture is a bit dated, so I need to
> >defer to the Intel and AMD people on this list for a definitive answer.
>
> "Please describe the total functionality of anything that might be
> associated with an atomic load on a 'current' x86 that is analogous to a
> 'memory barrier'."
>
> ;^)
Careful what you wish for -- you just might get it. ;-)
> >For example:
> >
> >r0 = head;
> >r1 = head->a;
> >r2 = some_global_variable;
> >
> >Here, there has to be an implied LoadLoad between the load into r0 and
> >the load into r1, but the load into r2 could potentially be hoisted
> >above both preceding loads.
>
> Yes. Well, it's "kind of" similar to this...
>
> Given The Scenario:
>
> Thread 'A'
> -------------------
> 1: loc1.acquire();
> 2: loc1.release();
>
> Thread 'B'
> -------------------
> 1: loc2.acquire();
> 2: loc2.release();
>
>
> The following execution-order could legitimately be realized wrt the rules
> of acquire/release barriers themselves and POSIX for that matter:
>
> A1
> B1
> A2
> B2
>
> Yikes! B1 deals with a different location than A2 does, so B1 can be
> hoisted up above it and slammed directly into its critical-section.
Yep -- even in SC, there is nothing constraining the order of A1 and B1.
SC only guarantees that, for a given execution, everyone will agree
on the arbitrarily chosen order.
> >In contrast, an explicit barrier would
> >affect the load into r2 as well as the load into r1.
>
> Yup... BTW, I think I heard of so-called "tagged" memory barriers from Alex
> Terekhov a while back in comp.programming.threads... Not sure if it was
> something to help compilers, or if an architecture did it... Anyway, you
> could attach memory barrier functionality directly to a specific
> location. For instance, if you wanted to affect r2, and leave r1 alone
> because its barrier is already implied, well, you can do that... Something
> like:
>
> <pseudo at&t assembler>
>
> # sequence 1
> 1 - MOVE (%head), %r0
> 1a - mb_head_0: membar #LoadLoad tagged to (1[%head]->%r0)
>
> 1b - MOVE (%r0), %r1
> 1c - mb_head_1: membar #Naked tagged to (1a->%r1)
>
> # sequence 2
> 2 - MOVE (%some_global_variable), %r2
> 2a - mb_some_global_variable_0: # membar #LoadLoad tagged to
> (2[%some_global_variable]->%r2)
>
> Sequence 1 (e.g., instructions 1, 1a, 1b, and 1c) has nothing to do with
> sequence 2 (e.g., instructions 2 and/or 2a...). Therefore, the granularity
> is quite fine in this case...
I have come across these. ;-) Interesting to play with, but not
clear to me that they will catch on.
Thanx, Paul