[cpp-threads] Alternatives to SC

Doug Lea dl at cs.oswego.edu
Sun Jan 14 14:20:40 GMT 2007


Herb Sutter wrote:
> 
> 
> So now let me cast the question this way, which I think finally puts the finger on what's been bothering me:
> 
>   Is an atomic variable a monitor?
> 

No!

To explain, I first need an academic-sounding digression (sorry!) ...

Programming practices surrounding synchronization have historical
quirks because it took a while to figure out what the primitives
were (both for shared-memory and message-based constructs; here we
deal only with the former). In the early days of operating systems,
people used all sorts of ad-hoc constructions (most of which, luckily,
have since died off). In the 1970s and 1980s, dissatisfaction with
this state of affairs led Hoare, Brinch Hansen, and others to devise
various lock, semaphore, and monitor abstractions, which were
considered magically primitive. This led to the now-classic
"conservative" lock-oriented programming methodologies based on them.
It wasn't until the late 1980s and early 1990s that these were put
on better footing. Herlihy et al. (see papers around 1990 at
http://www.cs.brown.edu/~mph/publications.html) proved that
compareAndSet (or LL/SC, or even fetchAndAdd) serves as a "universal
primitive" on cache-coherent machines (*), from which you can create
locks, as well as various other non-lock-based constructions. (For an
under-appreciated paper by an engineer who pioneered some of these
techniques during this period, see Treiber's "Systems Programming:
Coping with Parallelism" 1986 IBM tech report available at
http://domino.research.ibm.com/library/cyberdig.nsf/index.html)

So, "atomic" variables (**) providing (ordered) reads, writes, and CAS
can be used as bases to support a number of programming styles. You
can create locks and monitors by using one as a lock-status variable;
spinning until you can CAS from 0 to 1, then updating, then writing 0.
You can tie these to OS-level scheduling primitives to create more
well-behaved blocking locks and monitors.  Or, you can make custom
optimistic controls in which, rather than first spinning to get a lock,
you use the atomic as data value, and spin in a (read; compute update;
try to CAS update) loop.  And all sorts of further variations like
speculative locks and transactions (***).
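
To make the first two styles concrete, here is a minimal Java sketch
(Java because footnote (**) below names the java.util.concurrent
classes; class and method names are just illustrative): a CAS-based
spinlock, and an optimistic construct that updates in a
read/compute/CAS-retry loop without ever holding a lock.

import java.util.concurrent.atomic.AtomicInteger;

class SpinLock {
    // 0 == unlocked, 1 == locked
    private final AtomicInteger state = new AtomicInteger(0);

    void lock() {
        // Spin until we can CAS the lock-status variable from 0 to 1.
        while (!state.compareAndSet(0, 1)) {
            // busy-wait; a real lock would back off or park the thread
        }
    }

    void unlock() {
        state.set(0);  // write 0 to release
    }
}

class OptimisticCounter {
    // Here the atomic itself is the data value; no lock is ever held.
    private final AtomicInteger value = new AtomicInteger(0);

    int incrementUpTo(int max) {
        for (;;) {
            int current = value.get();                  // read
            int next = Math.min(current + 1, max);      // compute update
            if (value.compareAndSet(current, next))     // try to CAS update
                return next;
            // CAS failed: another thread interfered; retry
        }
    }
}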

Footnotes:
   (*) You don't strictly need cache coherence, but if not, you are
       probably dealing with distributed systems, in which case you
       want to use other kinds of consensus mechanisms based on
       message passing. And cache coherence alone isn't enough to make
       CAS useful in practice. If platforms didn't also support causal
       consistency (luckily, all do), then you'd need to manage some
       transitivity effects manually, which breaks encapsulation and
       would be extremely not-fun.

  (**) In Java, volatiles are "crippled" atomics in that they don't
       intrinsically support CAS. A language flaw. But you can get
       ones that do via the java.util.concurrent.atomic classes
       (AtomicInteger and friends), or even add CAS support to
       existing volatile fields by attaching AtomicXFieldUpdaters to
       them (a small sketch of this follows these footnotes). Too bad
       this is so ugly and introduces a bit of overhead.

(***) One of the attractions of "transactional" synchronization
       constructs is that they permit any of these kinds of
       implementations, modulo not allowing deadlock. One reason to be
       skeptical, though, is that it is challenging to pick the best
       (or even the non-worst!) solution for any given situation.
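
As a small sketch of the field-updater approach mentioned in (**)
(the class and field names here are made up for illustration), you can
attach an updater to an ordinary volatile field after the fact:

import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

class Node {
    // An ordinary volatile field...
    volatile int refCount;

    // ...given CAS support by attaching a reflection-based updater.
    private static final AtomicIntegerFieldUpdater<Node> REF_COUNT =
        AtomicIntegerFieldUpdater.newUpdater(Node.class, "refCount");

    // Increment the count only if it is still positive.
    boolean tryIncrement() {
        for (;;) {
            int c = refCount;                            // volatile read
            if (c <= 0)
                return false;                            // already released
            if (REF_COUNT.compareAndSet(this, c, c + 1))
                return true;
            // lost a race; re-read and retry
        }
    }
}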

(End of lecture :-)

In this light, it was a serious group mis-think to wish to claim that
atomics are intrinsically SC. (I was part of this mis-think; sorry!)

Atomics can be used to create SC constructions like locks.
But atomics can also be used in other ways that are not necessarily SC.
As it turns out, all usages on cache-coherent causally consistent
machines are at least nearly SC. On some platforms (like those
supporting TSO), "nearly" happens to be "exactly", but
there's no compelling reason to require this correspondence
to be exact, and on some machines it isn't. However, trying to use
atomics on non-cache-coherent-causally-consistent machines
is at best unproductive, so it is clear where the sweet spot for
a spec lies.

Finally back to the atomics vs monitors issue:
If you want monitors/locks, use them!
But they are not the same as atomics. The main practical
difference is that atomics are concurrently readable.
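
To illustrate the "concurrently readable" point, compare the two
(hypothetical) cells below: with the monitor version, even pure
readers must acquire the lock, serializing with each other and with
writers; with the atomic version, reads never block.

import java.util.concurrent.atomic.AtomicLong;

class MonitorCell {
    private long value;
    synchronized long get()       { return value; }   // readers take the lock
    synchronized void set(long v) { value = v; }
}

class AtomicCell {
    private final AtomicLong value = new AtomicLong();
    long get()       { return value.get(); }           // reads never block
    void set(long v) { value.set(v); }
    boolean cas(long expected, long update) {
        return value.compareAndSet(expected, update);
    }
}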

One reason to want locks is to ensure statically verifiable SC,
at the typical cost of not being able to statically ensure lack
of deadlock. SC programs can additionally employ various
atomic-based constructions that have been proven to maintain SC.

But C++ and Java are broad-spectrum languages, in which some
developers need to use the base atomic primitives to provide more
scalable or efficient constructions (that, when designed well,
typically preserve SC when used by application-level programmers).
If the language doesn't provide them, then these systems and
library developers can't do their jobs. And if the language
cripples these constructs by requiring unnecessary properties,
then these developers are not much better off.

As always, I don't want to sound like I'm against simplicity.
I think locking is a great thing to teach programmers.
It just doesn't suffice for all needs. (I sometimes
introduce my concurrency course saying it has two parts.
Part 1: Always use locks; Part 2: Never use locks. :-)

-Doug


