[cpp-threads] Alternatives to SC

Sat Jan 13 20:47:06 GMT 2007

On Sat, Jan 13, 2007 at 09:07:35AM -0500, Doug Lea wrote:
> Boehm, Hans wrote:
> >But I think that's only part of the
> >issue.  If we want more programmers to be able to write reasonably
> >correct multithreaded code, we need a consistent story that's easy to
> >teach.
> 
> As you know, we had exactly the same sentiment when
> standardizing the Java Memory Model. Exactly.
> I now think forcing SC-for-volatiles into this approach was
> an error. Here's why:
> 
> 1. It is possible that some perfectly fine existing Java VMs do not
> actually meet the SC-for-volatiles spec because of the IRIW
> requirements. This is an uncomfortable position: We don't actually
> know of any violations because the case is difficult to empirically
> test for, and no failures along these lines have ever been
> reported. And you can't go by published processor specs to determine
> conformance because processor implementations are allowed to be (and
> usually are) stronger than their own specs require. At the time
> of standardizing the JMM, vendors we contacted did not object to the
> spec, and the JCP EC (which includes many vendors) voted for approval.
> 
> But it remains the case that the majority of published
> processor specs do not appear to guarantee SC for volatiles when
> implemented in the recommended way. (All I know of do appear
> to guarantee CCCC, although it would be great if this were more
> explicit.)

Since I bear some guilt by association here, perhaps I should fill in
why x86's memory model is so vague.  This story began before I joined
Sequent, so even if I did help a little bit, I wasn't there when it
actually happened.  ;-)

Sequent (and no doubt others) started building 80386-based multiprocessors
in the late 1980s.  At this time, there was no reason for Intel to
specify any memory ordering constraints -- after all, they were selling
strictly single-CPU configurations, right?  And if these crazies at
Sequent (and no doubt elsewhere) were doing strange things with a
microscopically small fraction of Intel's CPUs in a totally irrelevant
niche market, so what?

Sequent chose a subset of process order -- all CPUs saw all writes from
a given CPU as having been performed in order.  In addition, all atomic
operations (initially, IIRC, all instructions that caused the CPU to
assert the LOCK signal) caused all preceding instructions from that CPU
to be globally visible before all subsequent instructions from that CPU.
Oh, and inter-processor interrupts could not be delivered to the target
CPU until all preceding writes from the source CPU were visible -- we had
all sorts of good clean fun with a prototype of hardware that violated
this constraint!  There were eventually constraints on DMA completion and
the associated device interrupts, but the earlier versions of hardware
did not provide these -- drivers had to do explicit DMA-flush operations.

There may have been other constraints, but that was all that we software
guys relied on.
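
As a rough sketch of what relying on the "atomic operations order
everything" rule looks like in code, consider the classic store-buffering
test below.  It assumes C++11-style std::atomic, which did not exist on the
hardware described above, and it uses std::atomic_thread_fence purely as a
stand-in for a LOCK-prefixed instruction.  The names x, y, and
store_buffer_once are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical shared flags; both are reset to zero before each trial.
std::atomic<int> x{0}, y{0};

// Runs one store-buffering trial.  Each thread writes its own flag,
// executes a full fence, then reads the other thread's flag.
// Returns true if BOTH threads read zero, which the fences forbid.
bool store_buffer_once() {
    x.store(0);
    y.store(0);
    int r1 = -1, r2 = -1;
    std::thread t1([&] {
        x.store(1, std::memory_order_relaxed);
        // Stand-in (assumption) for a LOCK-prefixed instruction.
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_relaxed);
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_relaxed);
    });
    t1.join();
    t2.join();
    assert(r1 != -1 && r2 != -1);  // sanity check: both threads ran
    return r1 == 0 && r2 == 0;
}
```

With the fences in place, at least one thread must see the other's write.
Take the fences out and both threads may read zero, even on hardware that
keeps each CPU's writes in order.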

Intel came out with dual Pentium and "glueless MP" Pentium Pro in the
mid-90s, and thus finally could dictate memory ordering -- but only for a
single local bus.  Vendors like Sequent, Data General, and Corollary all
had absolute control of what sort of memory-ordering constraints were needed.
I would not be surprised to learn that these implementations differed in
their memory ordering -- the only requirement was to keep Windows happy
(keep in mind that Sequent could adjust DYNIX/ptx as needed).

Had Intel mandated a spec, any existing high-end vendor would have
been able to tell Intel where to stick it.  Intel's only possible
leverage would have been to get Microsoft to force the issue, which
apparently was not in the cards at the time.

So, here we are!  ;-)

Perhaps the time is right to force Intel and AMD to agree on something.
Maybe after their lawsuit is resolved?  ;-)

						Thanx, Paul

PS:  Doug does an excellent job of outlining many of my concerns with
     SC below.

> This is a time-bomb: As large multiprocessors get larger, eventually
> there will be one in which IRIW violations are easy to demonstrate. In
> which case we will need to either admit the spec was overly
> constraining, or decertify Java on such platforms. (Or, I suppose,
> require that they use CAS or locks for volatile-reads, which would
> make them unusable.) I'm recommending we revise the spec before this
> happens. It's not an urgent matter though, so we might as well take
> the time to do it right.  With the notable exceptions of those
> designing IA64- and Sparc-based systems, at least some architects
> involved with large systems (hundreds or thousands of processors) are
> opposed to requirements that the IRIW case be SC because of the kinds of
> scalability concerns Paul McKenney has described. So we will surely hit
> this problem sooner or later.
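
For readers who have not seen it written down, the IRIW (independent
reads of independent writes) case can be sketched roughly as follows.
This assumes C++11-style std::atomic, with memory_order_seq_cst playing
the role of SC-for-volatiles.  The names x, y, and iriw_once are
hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical shared variables; both are reset to zero before each trial.
std::atomic<int> x{0}, y{0};

// One IRIW trial: two independent writers, two independent readers.
// Returns true if the readers disagree about which write came first.
bool iriw_once(std::memory_order store_order, std::memory_order load_order) {
    x.store(0);
    y.store(0);
    int r1x = -1, r1y = -1, r2y = -1, r2x = -1;
    std::thread w1([&] { x.store(1, store_order); });
    std::thread w2([&] { y.store(1, store_order); });
    std::thread rd1([&] {          // reads x, then y
        r1x = x.load(load_order);
        r1y = y.load(load_order);
    });
    std::thread rd2([&] {          // reads y, then x
        r2y = y.load(load_order);
        r2x = x.load(load_order);
    });
    w1.join();
    w2.join();
    rd1.join();
    rd2.join();
    assert(r1x >= 0 && r1y >= 0 && r2y >= 0 && r2x >= 0);  // all readers ran
    // Reader 1 saw x written but not y; reader 2 saw y written but not x.
    return r1x == 1 && r1y == 0 && r2y == 1 && r2x == 0;
}
```

With memory_order_seq_cst for both stores and loads, a true result is
forbidden.  With memory_order_release stores and memory_order_acquire
loads, it is permitted by the language rules, although a given machine
might never actually produce it, which is exactly why the case is so hard
to test for empirically.
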
> 
> 2. In retrospect, aiming to completely equate lock-based consistency
> properties with those for volatiles (C++ strong-atomics) seems a bit
> misguided.  It ignores the most notable difference between lock-based
> and volatile-based constructions: Volatiles support concurrent
> readability. That's the main reason you use them! So it is not too
> surprising that when ordering rules are equated in all other respects,
> this difference still shows through in some innocuous
> way (i.e., the IRIW case).
> 
> But because we didn't think the SC-for-volatiles issue was a big
> deal at the time, we didn't even try fleshing out alternative
> sub-models that would reveal the nature of these differences.
> (Doing so led to CCCC.)
> 
> 3. As good as the SC-for-volatiles story sounds to memory model
> specifiers, I think it has had essentially no impact on Java
> programmers.  The most important concept for people mixing volatiles
> and non-volatiles to understand is the happens-before relation.
> In fact, the chapter on JMM in our "Java Concurrency in Practice" book
> (http://jcip.net/) is almost completely about happens-before, and doesn't
> even explain sequential consistency. (Aside: Brian Goetz wrote that
> chapter, in part to ensure that I didn't fill it with irrelevant and
> impenetrable memory-model-ese :-) (Aside#2: The "-->" relation in
> CCCC might as well be pronounced "happens-before".)
> 
> And for those programmers who otherwise stay away from constructions
> using volatiles/atomics, it suffices to illustrate the small number of
> common recipes employing them (double-checked initialization, polling
> loops, atomic increments), without saying much beyond the fact that
> not using volatiles/atomics in these cases is wrong.
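
A minimal sketch of those three recipes, assuming C++11-style std::atomic
with its default (sequentially consistent) operations standing in for
Java volatiles.  All names here (ready, payload, poll_once, count_hits,
Config, get_config) are hypothetical and chosen only for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Recipe 1: polling loop.  'payload' is ordinary data published via 'ready'.
std::atomic<bool> ready{false};
int payload = 0;

int poll_once() {
    ready.store(false);
    payload = 0;
    std::thread producer([] {
        payload = 42;       // plain write...
        ready.store(true);  // ...published by the atomic store
    });
    while (!ready.load()) { // spin until the flag is observed
        std::this_thread::yield();
    }
    int seen = payload;     // ordered after the producer's plain write
    producer.join();
    return seen;
}

// Recipe 2: atomic increment of a shared counter.
std::atomic<long> hits{0};

long count_hits(int threads, int per_thread) {
    assert(threads > 0 && per_thread >= 0);
    hits.store(0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                hits.fetch_add(1);  // no lost updates
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return hits.load();
}

// Recipe 3: double-checked initialization.
struct Config { int value; };
std::atomic<Config*> instance{nullptr};
std::mutex init_lock;

Config* get_config() {
    Config* p = instance.load();          // first check, no lock
    if (p == nullptr) {
        std::lock_guard<std::mutex> guard(init_lock);
        p = instance.load();              // second check, under the lock
        if (p == nullptr) {
            p = new Config{7};            // intentionally never freed here
            instance.store(p);
        }
    }
    return p;
}
```

In each recipe, replacing the atomic with a plain variable gives a data
race: the polling loop may never see the flag or may read a stale payload,
increments may be lost, and a reader may see a non-null pointer to a
partially constructed object.
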
> 
> (Across all this, I suppose you can still even keep the SC slogan,
> if you are willing to add a footnote explaining the IRIW loophole. :-)
> 
> My main point here is that, even though I've been trying to stay out of
> policy judgement calls for the C++ memory model, I think it is worth looking
> at our experience in Java. I do believe that if you adopt SC for
> strong atomics, you will someday be in the same position we are in now,
> of contemplating spec revisions.
> 
> -Doug