Subject: Re: [cpp-threads] Alternatives to SC

Sat Jan 27 02:19:30 GMT 2007

On Fri, Jan 26, 2007 at 04:36:29PM -0800, Chris Thomasson wrote:
> On Fri, Jan 19, 2007 at 11:53:47PM -0800, Chris Thomasson wrote:
> >
> >>On Wed, Jan 17, 2007 at 10:46:33PM -0800, Chris Thomasson wrote:
> >>>
> >>>>>On Wed, Jan 17, 2007 at 12:57:29PM -0800, Chris Thomasson wrote:
> >>>>>>
> >>>>>>>On Tue, Jan 16, 2007 at 05:05:16PM -0800, Chris Thomasson wrote:
> >>>>>>>>From: "Paul E. McKenney" <paulmck at linux.vnet.ibm.com>
> >>>>>>>>To: "C++ threads standardisation" <cpp-threads at decadentplace.org.uk>
> >>>>>>>>Sent: Tuesday, January 16, 2007 9:46 AM
> >>>>>>>>Subject: Re: [cpp-threads] Alternatives to SC
> >>>>>>>
> >>>>>>>Hello, Chris,
> >[...]
> >>>>>>>Out-of-order reads are allowed.  Out-of-order reads can occur
> >[...]
> >>>>>>>
> >>>>>>>There are also a number of Intel manuals containing the words
> >>>>>>>"Reads can be carried out speculatively and in any order".
> >[...]
> >>>>>>Well, does that mean that RCU should have lfence on x86? That would tank
> >[...]
> >>>>>Absolutely not.  x86 respects data dependencies.
> >>>
> >>>Right, however, isn't this achieved by an "implied" #LoadLoad barrier
> >>>for every atomic load?
> >[...]
> >>That said, the implied barrier need only apply to the pair of loads
> >>involved in the data dependency.  So there is indeed an implied barrier,
> >>but its effect can be extremely limited.
> >
> >Yes; I agree. Humm, I am wondering what the granularity-level is wrt the
> >memory barrier that virtually has to be attached to basically any
> >so-called 'naked' atomic load on a current x86... [...]
> >Please, Intel, AMD? Set me straight! ...
> >>As noted earlier, they had relatively little say.  But I agree that
> >>only they are now in a position to provide definition.
> 
> Yup. It would be nice to read some explicit guarantees. I would model the 
> documentation after some of the Sun documents for the UltraSPARC T1. It's 
> TSO, and the docs are very clear about that fact...

I will let you work that out with the Intel/AMD folks.  Me, I would be
happy with them coming to agreement regardless of documentation format.  ;-)

> >>My understanding of x86 microarchitecture is a bit dated, so I need to
> >>defer to the Intel and AMD people on this list for a definitive answer.
> >
> >"Please, describe the total functionality of 'anything' which might be
> >associated with an atomic load on the 'current' x86 that is analogous to
> >a 'memory barrier'?"
> >
> >;^)
> >>Careful what you wish for -- you just might get it.  ;-)
> 
> :O
> 
> 
> >>For example:
> >>
> >>r0 = head;
> >>r1 = head->a;
> >>r2 = some_global_variable;
> >>
> >>Here, there has to be an implied LoadLoad between the load into r0 and
> >>the load into r1, but the load into r2 could potentially be hoisted
> >>above both preceding loads.
> >
> >Yes. Well, it's "kind of" similar to this...
> >
> >Given The Scenario:
> [...]
> >A1
> >B1
> >A2
> >B2
> >
> >Yikes! B1 deals with a different location than A2 does, so, B1 can be
> >hoisted up above it and slammed directly into its critical-section.
> 
> >>Yep -- even in SC, there is nothing constraining the order of A1 and B1.
> >>SC only guarantees that, for a given execution, everyone will agree
> >>on the arbitrarily chosen order.
> 
> SC is ridiculously expensive. You have to use full memory barriers instead 
> of classic release barriers... IMHO, it's basically impossible to scale when 
> you are forced to use something like SC. Reminds me of some of the 
> overheads in software transactional memory. The price it pays for 
> "ease-of-use" is tremendous. Any memory model that demands SC is way too 
> expensive...

TSO is about as far as I can see going for normal loads and stores.
And, as noted earlier, we really do need to allow common idioms to
be used.
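
One such common idiom is publishing data behind a flag, which needs only
release/acquire ordering rather than SC. The following is a minimal sketch,
assuming the std::atomic interface that C++11 later standardised; the names
payload, ready, and run_message_passing are hypothetical and chosen only for
illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, for illustration only.
static int payload = 0;                 // ordinary (non-atomic) data
static std::atomic<bool> ready{false};  // flag that publishes the payload

// Publish with a release store, consume with an acquire load.
// Returns the payload value the consumer observed.
int run_message_passing() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;

    std::thread producer([] {
        payload = 42;                                  // plain store
        ready.store(true, std::memory_order_release);  // payload store is ordered before the flag
    });
    std::thread consumer([&seen] {
        // Spin until the flag is set; the acquire load pairs with the
        // release store above.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;  // the release/acquire pairing makes 42 visible here
    });

    producer.join();
    consumer.join();
    assert(seen == 42);
    return seen;
}
```

On a TSO machine such as x86, compilers typically implement the release
store and the acquire load as ordinary moves, whereas a sequentially
consistent store typically requires a locked instruction or a full fence.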

> >>In contrast, an explicit barrier would
> >>affect the load into r2 as well as the load into r1.
> >
> >Yup... BTW, I think I heard of so-called "tagged" memory barriers from
> >Alex Terekhov a while back in comp.programming.threads... Not sure if it
> >was something to help compilers, or if an architecture did it... Anyway,
> >you could attach memory barrier functionality directly to a specific
> >location.
> 
> [...]
> 
> >>I have come across these.  ;-)  Interesting to play with,
> 
> Indeed.
> 
> >>but not clear to me that they will catch on.
> 
> :^) Well, they would scale nicely... 

Although expressing them in the face of arbitrary control-flow changes
could be quite "interesting".  For example, what do you do in the
following situation when the then-clause is not executed?

	a = 1; l1:
	tagged_mb(l1, l2);
	if (random() & 0xff) {
		l2: b = 1;
	}

OK, how about this example?

	l1: a = 1;
	directed_mb(l1, l2);
	if (random() & 0xff) {
		l2: b = 1;
	}
	if (random() & 0xff) {
		do_lots_of_stuff();
		goto l2;
	}

OK, so that goto is ugly anyway.  So, how about this one?

	for (i = 0; i < MAX_COUNT; i++) {
		if (random() & 0xff) {
			a = entry[i]; l1:
		}
		tagged_mb(l1, l2);
		if (random() & 0xff) {
			l2: b = entry[i];
		}
	}

Or am I misunderstanding what you mean by tagged memory barriers?

							Thanx, Paul