[cpp-threads] RE: volatile, memory models, threads

Wed Mar 1 20:43:13 GMT 2006

"Boehm, Hans" <hans.boehm at hp.com> wrote:
> 
> Let's separate atomicity and ordering here.  I believe on Alpha an
> 8-byte write is always atomic, i.e. you can never see it half written.
> An 8 byte atomic load with acquire semantics (effectively a Java
> volatile read, or a C++/IA64 volatile read) takes a load followed by an
> MB instruction.  An atomic counter increment takes an LL/SC pair, but no
> MB instructions, if you don't care about ordering.

If we leave niggles of detail aside, yes.

> We're talking about uniprocessors here.  Exponential in the number of
> processors is fine :-).

!!!!!

> More seriously, this is a fairly well-known technique, both for
> uniprocessors, and for accessing per-processor state on multiprocessors.
> Instructions are executed more than twice only if you reschedule the
> same processor twice within the same few (typically < 10) instruction
> sequence.  If you do that regularly, you have other problems. 

Eh?  There are several variations, but the forms that have a
deterministic bound on the loop count tend to be very restrictive
and expensive compared to the ones that don't.  The simple CAS or
LL/STC loops have a negative exponential distribution, with the
parameter going up asymptotically with the load.  And, for gang
scheduled threads, that can be BAD news.

Alternatively, if you use mechanisms optimised for gang scheduling,
they tend to run very badly for random interactions.  I don't know
of any simple, portable techniques that are robust against both
access patterns and loading rates.

Lastly, you typically have two or more schedulers involved, each
with different (undocumented) characteristics, and have ghastly
possibilities for interaction.  For example, memory access at the
hardware level is controlled by quite complex logic, and the
operating system scheduling by a different set.  How is the poor
library developer to write robust, efficient code for a SINGLE
combination, let alone add portability to the mix?

> Leaving all the discussion of the current standard aside, it's unclear
> to me why we want this, or even precisely what we want.

That is, indeed, true - and is at the core of my point.  I am saying
that the decision on what we want should be taken first.

> Furthermore, this is almost never the kind of atomicity I want anyway.
> In general, I think we're getting dangerously close here to building in
> objects that implicitly acquire a lock on the object on all individual
> accesses.  The original Java library did that.  I claim it was a mistake
> that has now mostly been corrected.

I am not sure that it is necessarily a mistake, but I do agree that
such accesses are not lightweight.

> In my view, single-object atomic operations are inherently a low-level
> facility to be used by experts in constructing high performance
> concurrent libraries, which can then be used by merely clever
> programmers in constructing real concurrent applications.

What I am not sure is how useful they are for that, given some of
the other points.  Choosing a set that is efficient on only some 
system designs isn't really a good thing for a standard.

> Leaving issues of compatibility and standardese aside, I don't know how
> to construct a useful facility that really does more than that. 

I am not sure how to do even THAT, portably.

Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  nmm1 at cam.ac.uk
Tel.:  +44 1223 334761    Fax:  +44 1223 334679