[cpp-threads] RE: volatile, memory models, threads
Boehm, Hans
hans.boehm at hp.com
Wed Mar 1 18:49:43 GMT 2006
> -----Original Message-----
> From: Nick Maclaren
> > On Alpha, those don't ensure ordering, which I think was
> the issue here.
> > But Alpha has always had an MB (memory barrier) instruction
> to do so.
> > Lock implementations need to use both. To simply ensure memory
> > ordering, MB suffices.
>
> For atomic objects, it needs at least one (perhaps two) per
> access, and can handle only a single 8-byte object (in later
> Alphas, 1, 2, 4 and 8-byte).
Let's separate atomicity and ordering here. I believe on Alpha an
8-byte write is always atomic, i.e. you can never see it half written.
An 8-byte atomic load with acquire semantics (effectively a Java
volatile read, or a C++/IA64 volatile read) takes a load followed by an
MB instruction. An atomic counter increment takes an LL/SC pair, but no
MB instructions, if you don't care about ordering.
> ...
> > > That doesn't work, I am afraid. Random users deadlocking the
> > > system, if you are lucky.
> >
> > I oversimplified a bit. You use restartable critical
> sections, i.e.
> > you let the user process specify that if you are preempted
> at certain
> > locations, you should be restarted at a slightly different
> instruction.
>
> Sorry, but that only nibbles at the edge of the issue. It
> doesn't help in general. Remember that the wrong scheduling
> strategy can cause a correct program to slow down by a factor
> exponential in the number of processors.
We're talking about uniprocessors here. Exponential in the number of
processors is fine :-).
More seriously, this is a fairly well-known technique, both for
uniprocessors, and for accessing per-processor state on multiprocessors.
Instructions are executed more than twice only if the same processor is
rescheduled twice within the same short (typically fewer than 10
instructions) restartable sequence. If that happens regularly, you have
other problems.
> ...
> ALL that is needed to allow an implementation to support any
> POD type as atomic is to allow it to add a lock flag to that
> object, but that means allowing an atomic type to be of
> different size to the unqualified type. This is already
> allowed in one section, except that the ability to lose
> qualifiers makes it unimplementable! Now, is that tradeoff
> worth while?
>
Leaving all the discussion of the current standard aside, it's unclear
to me why we want this, or even precisely what we want.
The "emulated_atomic" class template in
http://www.hpl.hp.com/personal/Hans_Boehm/c++mm/atomics.h.txt does
support this behavior. But I have to admit that I only envisioned its
use in cases in which there was strong hope of hardware implementation
on most platforms, i.e. basically for 1, 2, 4, and 8 byte data. And the
atomic operations are limited to load, store, CAS, etc.
What kind of atomic operations do you want to support? Assume we have
atomic foo x;
where foo is a POD struct with lots of fields. Clearly
x = bar(x);
in general can't behave atomically, since bar may involve other
operations, and I can get deadlocks on the "hidden" lock. Not to
mention strange and mysterious performance issues, since the
implementation is not in any sense lock-free.
Furthermore, this is almost never the kind of atomicity I want anyway.
If I'm manipulating composite objects, I generally want atomic
operations that involve more than one object. Nick's doubly linked list
example is a good one. Even a linked stack can't be implemented this
way without additional cleverness.
In general, I think we're getting dangerously close here to building in
objects that implicitly acquire a lock on the object on all individual
accesses. The original Java library did that. I claim it was a mistake
that has now mostly been corrected.
In my view, single-object atomic operations are inherently a low-level
facility to be used by experts in constructing high performance
concurrent libraries, which can then be used by merely clever
programmers in constructing real concurrent applications.
Unfortunately, I think all of this matters greatly to the resulting
applications. Otherwise we shouldn't bother.
Leaving issues of compatibility and standardese aside, I don't know how
to construct a useful facility that really does more than that.
Hans