[cpp-threads] Alternatives to SC

Wed Jan 17 23:50:10 GMT 2007

On Wed, Jan 17, 2007 at 12:57:29PM -0800, Chris Thomasson wrote:
> 
> >On Tue, Jan 16, 2007 at 05:05:16PM -0800, Chris Thomasson wrote:
> >>From: "Paul E. McKenney" <paulmck at linux.vnet.ibm.com>
> >>To: "C++ threads standardisation" <cpp-threads at decadentplace.org.uk>
> >>Sent: Tuesday, January 16, 2007 9:46 AM
> >>Subject: Re: [cpp-threads] Alternatives to SC
> >
> >Hello, Chris,
> >
> >>[...]
> [...]
> >>
> >>Here is a trick you can do on the x86:
> [...]
> 
> >The above is indeed true on a number of x86 implementations, but is
> >-not- reflected in the architecture.  See for example the AMD x86-64
> >Architecture Programmer's Manual Volume 2 (System Programming),
> >24593-Rev.3.07-Sep-2002, page 195, first bullet:
> >
> >Out-of-order reads are allowed.  Out-of-order reads can occur
> >as a result of out-of-order execution or speculative execution.
> >The processor can read memory out-of-order to allow out-of-order
> >execution to proceed.
> >
> >There are also a number of Intel manuals containing the words
> >"Reads can be carried out speculatively and in any order".
> >
> >So there is no guarantee of #LoadLoad across all x86 implementations,
> >even if a number of very popular x86 implementations do in fact provide
> >this guarantee.
> 
> Well, does that mean that RCU should have lfence on x86? That would tank its
> performance, unless you used that "batching" algorithm I mentioned in a
> previous post to this group. I hate it when the architecture manual does not
> "explicitly and clearly" detail its memory model... SPARC documents are
> really good in this respect, Intel is really, well, not so good here...

Absolutely not.  x86 respects data dependencies.  As noted in my previous
email, RCU readers on x86 need zero added instructions compared to
single-threaded code.  RCU list insertions do require StoreStore ordering,
but readers require nothing.
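For illustration only, here is a minimal C++11 sketch of that publish/read split. It assumes a single updater (or updaters serialized by a lock, as RCU updaters normally are), and it leaves out grace periods and memory reclamation entirely, so nodes are never freed. The names Node, head, insert_front, and lookup are hypothetical. The release store supplies the StoreStore ordering on the insertion side. The reader's loads each depend on the pointer fetched by the previous load. memory_order_consume was intended to express exactly that dependency ordering, but compilers treat it as acquire. On x86, GCC and Clang typically compile both the release store and the acquire load to ordinary MOV instructions.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical list node; fields are never modified after publication.
struct Node {
    int key;
    int value;
    Node* next;
};

// Head of the list, traversed concurrently by lock-free readers.
std::atomic<Node*> head{nullptr};

// Updater side (assumes a single updater or updaters holding a lock).
// Fully initialize the node, then publish it with a release store so that
// neither the compiler nor the CPU can make the pointer visible before the
// node's contents (the StoreStore requirement for list insertion).
void insert_front(Node* n, int key, int value) {
    assert(n != nullptr);
    n->key = key;
    n->value = value;
    n->next = head.load(std::memory_order_relaxed);
    head.store(n, std::memory_order_release);
}

// Reader side: no locks and no explicit fences.  Every field access is
// dependent on the pointer just loaded.  Acquire stands in for consume here;
// on x86 it is a plain load either way.
int lookup(int key) {
    for (Node* p = head.load(std::memory_order_acquire); p != nullptr;
         p = p->next) {
        if (p->key == key) {
            return p->value;
        }
    }
    return -1;  // not found
}
```

A real RCU implementation would also need the updater to wait for a grace period before freeing any node removed from the list; that part is deliberately omitted.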

> ;^)

Before you beat Intel up too badly on this topic, you might want to
consider the history.

In the very late 1980s, Sequent decided to switch from National
Semiconductor CPUs (remember those?) to Intel 80386 CPUs (remember
those?).  Now, Intel never intended the 80386 to be used in multiprocessor
configurations (and Linux in fact does not support 80386 SMP systems),
so Sequent (and no doubt a few others at about that same time) had the
privilege of defining the 80386 memory-ordering model.  Given that there
was an extremely large Intel contingent within easy bicycle distance of
Sequent headquarters, I would guess that Intel might have been consulted.
But the fact remains that Intel had no real control, and probably did
not care all that much about a system vendor that sold a vanishingly
small fraction of the extremely popular 80386 CPUs.

(And you cannot blame me for this situation, as I did not join Sequent
until somewhat later.  So even if I did help a little bit, I wasn't
there when it actually happened.  So there!!!)

Sequent chose a process-order model, where each CPU's stores are seen in
order from other CPUs, with no guarantees on ordering of loads.  This was
refined over the years.  For example, did you ever try working on a system
where the I/O completion interrupt can be received by a CPU before the
DMA reaches the coherence domain?  Or where an interprocessor interrupt
can be received before the sending CPU's prior writes have reached the
coherence domain?  Such charming idiosyncrasies were repaired over time.

Intel came out with dual-Pentium systems some years later, and the
4-CPU "glueless MP" Pentium Pro in the mid-1990s.  At this point,
Intel did control memory ordering, but only for smaller systems with a
single front-side bus.  Sequent, Data General, Coherent, and no doubt
others defined the semantics for larger systems (up to 64 CPUs in the
late 1990s).

At this point, Intel -still- did not have full control over the memory
model for x86.

And you know what?  They still do not.  For one thing, there is AMD.
For another, vendors still create NUMA systems out of x86 building
blocks, and therefore control the inter-node memory ordering.

So I am hoping that Intel, AMD, and the various x86/NUMA vendors will
be able to agree on something like Doug Lea's CCCC proposal.

							Thanx, Paul