[cpp-threads] RE: volatile, memory models, threads
Boehm, Hans
hans.boehm at hp.com
Wed Mar 1 18:49:43 GMT 2006
> -----Original Message-----
> From: Nick Maclaren
> > On Alpha, those don't ensure ordering, which I think was
> the issue here.
> > But Alpha has always had an MB (memory barrier) instruction
> to do so.
> > Lock implementations need to use both. To simply ensure memory
> > ordering, MB suffices.
>
> For atomic objects, it needs at least one (perhaps two) per
> access, and can handle only a single 8-byte object (in later
> Alphas, 1, 2, 4 and 8-byte).
Let's separate atomicity and ordering here. I believe on Alpha an
8-byte write is always atomic, i.e. you can never see it half written.
An 8-byte atomic load with acquire semantics (effectively a Java
volatile read, or a C++/IA64 volatile read) takes a load followed by an
MB instruction. An atomic counter increment takes an LL/SC pair, but no
MB instructions, if you don't care about ordering.
> ...
> > > That doesn't work, I am afraid. Random users deadlocking the
> > > system, if you are lucky.
> >
> > I oversimplified a bit. You use restartable critical
> sections, i.e.
> > you let the user process specify that if you are preempted
> at certain
> > locations, you should be restarted at a slightly different
> instruction.
>
> Sorry, but that only nibbles at the edge of the issue. It
> doesn't help in general. Remember that the wrong scheduling
> strategy can cause a correct program to slow down by a factor
> exponential in the number of processors.
We're talking about uniprocessors here. Exponential in the number of
processors is fine :-).
More seriously, this is a fairly well-known technique, both for
uniprocessors, and for accessing per-processor state on multiprocessors.
Instructions are executed more than twice only if the same processor is
rescheduled twice within the same short (typically fewer than 10
instructions) restartable sequence. If that happens regularly, you have
other problems.
> ...
> ALL that is needed to allow an implementation to support any
> POD type as atomic is to allow it to add a lock flag to that
> object, but that means allowing an atomic type to be of
> different size to the unqualified type. This is already
> allowed in one section, except that the ability to lose
> qualifiers makes it unimplementable! Now, is that tradeoff
> worth while?
>
Leaving all the discussion of the current standard aside, it's unclear
to me why we want this, or even precisely what we want.
The "emulated_atomic" class template in
http://www.hpl.hp.com/personal/Hans_Boehm/c++mm/atomics.h.txt does
support this behavior. But I have to admit that I only envisioned its
use in cases in which there was strong hope of hardware implementation
on most platforms, i.e. basically for 1, 2, 4, and 8 byte data. And the
atomic operations are limited to load, store, CAS, etc.
What kind of atomic operations do you want to support? Assume we have
atomic foo x;
where foo is a POD struct with lots of fields. Clearly
x = bar(x);
in general can't behave atomically, since bar may involve other
operations, and I can get deadlocks on the "hidden" lock. Not to
mention strange and mysterious performance issues, since the
implementation is not in any sense lock-free.
Furthermore, this is almost never the kind of atomicity I want anyway.
If I'm manipulating composite objects, I generally want atomic
operations that involve more than one object. Nick's doubly linked list
example is a good one. Even a linked stack can't be implemented this
way without additional cleverness.
In general, I think we're getting dangerously close here to building in
objects that implicitly acquire a lock on the object on all individual
accesses. The original Java library did that. I claim it was a mistake
that has now mostly been corrected.
In my view, single-object atomic operations are inherently a low-level
facility to be used by experts in constructing high performance
concurrent libraries, which can then be used by merely clever
programmers in constructing real concurrent applications.
Unfortunately, I think all of this matters greatly to the resulting
applications. Otherwise we shouldn't bother.
Leaving issues of compatibility and standardese aside, I don't know how
to construct a useful facility that really does more than that.
Hans