[cpp-threads] Alternatives to SC

Paul E. McKenney paulmck at linux.vnet.ibm.com
Fri Jan 19 01:02:24 GMT 2007


On Wed, Jan 17, 2007 at 10:46:33PM -0800, Chris Thomasson wrote:
> 
> >>On Wed, Jan 17, 2007 at 12:57:29PM -0800, Chris Thomasson wrote:
> >>>
> >>>>On Tue, Jan 16, 2007 at 05:05:16PM -0800, Chris Thomasson wrote:
> >>>>>From: "Paul E. McKenney" <paulmck at linux.vnet.ibm.com>
> >>>>>To: "C++ threads standardisation" <cpp-threads at decadentplace.org.uk>
> >>>>>Sent: Tuesday, January 16, 2007 9:46 AM
> >>>>>Subject: Re: [cpp-threads] Alternatives to SC
> >>>>
> >>>>Hello, Chris,
> [...]
> >>>>Out-of-order reads are allowed.  Out-of-order reads can occur
> [...]
> >>>>
> >>>>There are also a number of Intel manuals containing the words
> >>>>"Reads can be carried out speculatively and in any order".
> >>>>
> >>>>So there is no guarantee of #LoadLoad across all x86 implementations,
> >>>>even if a number of very popular x86 implementations do in fact provide
> >>>>this guarantee.
> >>>
> >>>Well, does that mean that RCU should have lfence on x86? That would tank
> >>>its performance, unless you used that "batching" algorithm I mentioned
> >>>in a previous post to this group. I hate it when the architecture manual
> >>>does not "explicitly and clearly" detail its memory model... SPARC
> >>>documents are really good in this respect, Intel is really, well, not so
> >>>good here...
> >>
> >>Absolutely not.  x86 respects data dependencies.
> 
> Right, however, isn't this achieved by an "implied" #LoadLoad barrier for
> every atomic load?  If the barrier was not implied, then RCU would almost
> have to use an explicit barrier call for readers.

My understanding of x86 microarchitecture is a bit dated, so I need to
defer to the Intel and AMD people on this list for a definitive answer.
That said, the implied barrier need only apply to the pair of loads
involved in the data dependency.  So there is indeed an implied barrier,
but its effect can be extremely limited.

For example:

	r0 = head;
	r1 = head->a;
	r2 = some_global_variable;

Here, there has to be an implied #LoadLoad between the load into r0 and
the load into r1, because the address used by the second load depends
on the value returned by the first.  The load into r2 has no such
dependency, so it could potentially be hoisted above both preceding
loads.  In contrast, an explicit barrier would order the load into r2
as well as the load into r1.

							Thanx, Paul


