[cpp-threads] RE: volatile, memory models, threads

Nick Maclaren nmm1 at cus.cam.ac.uk
Wed Mar 1 09:26:04 GMT 2006


I have separated my specific points off at the end, so that my response
can be snipped if just that is relevant.

"Boehm, Hans" <hans.boehm at hp.com> wrote:
>
> > I feel that a decent language specification (C++ in this 
> > case) should not make difficulties for implementations 
> > wanting to do better.  But that decision has consequences.
> 
> I don't think we know how to do better.  We could try to provide general
> support for software transactions.  But I think that's an open research
> problem.  And the performance you could get for a general implementation
> on existing hardware probably would not make this a popular alternative.

No, we DO know how to do better, in this context.  I agree that it is
not MUCH better, but we have the choice of forbidding an
implementation from doing better or permitting it.  See at the end.

> And there's probably nothing beyond ugly syntax to prevent it from being
> implemented in a higher layer.

Absolutely NOT!  That is my point.  By taking certain decisions, we are
preventing certain implementation and usage strategies.

> In my mind all we can hope to do is provide access to low-level hardware
> primitives, and lay the groundwork for higher level libraries that
> provide standard implementations of lock-free data structures.

Don't forget that there isn't a dichotomy, but a gradation from the
'truest' of lock-free up to the most exposed and heavyweight of locks.
Most of the most successful approaches have been intermediate.

> I personally think there are serious limits as to how many low level
> differences you can hide in the low level interface.  ...

Agreed, but that is not what I am talking about.  I am referring to
what the specification should permit.

> On Alpha, those don't ensure ordering, which I think was the issue here.
> But Alpha has always had an MB (memory barrier) instruction to do so.
> Lock implementations need to use both.  To simply ensure memory
> ordering, MB suffices.

For atomic objects, it needs at least one (perhaps two) per access,
and can handle only a single 8-byte object (in later Alphas, 1, 2, 4
and 8-byte).

> > I really DO mean that it is common for an unprivileged 
> > parallel application to REQUIRE a system to be configured to 
> > meet its assumptions.  Been there - done that :-(
> 
> I can certainly believe that, but I also think this is unacceptable in
> the future, especially if we are aiming for mass market acceptance of
> multicore processors.

Therefore it is important that the specification doesn't require such
a thing, by indirect constraints on the implementation.  Far too many
standards get that one wrong - POSIX does, for example.

> I agree that we cannot require processors to run in strange modes, or
> the like.  But I don't see where we're doing that.

Grrk.  I can, but I agree that what you have said so far is where the
current consensus is taking us - I can't say the same about POSIX and
other threading models.  However, what we specify today will impact
the designs of the future.  Remember C++ and linkers, back in the 1980s?

> > Now consider thread A atomically updating bytes N:M and 
> > thread B doing so to P:Q, where N:M and P:Q overlap.  Should 
> > that be forbidden?
> 
> To first order approximation, yes, unless N==P and M==Q.  There is an
> issue with "double wide" compare-and-swap operations.

That is going to need a LOT of wording.  See at the end.

> > That doesn't work, I am afraid.  Random users deadlocking the 
> > system, if you are lucky.
> 
> I oversimplified a bit.  You use restartable critical sections, i.e. you
> let the user process specify that if you are preempted at certain
> locations, you should be restarted at a slightly different instruction.

Sorry, but that only nibbles at the edge of the issue.  It doesn't help
in general.  Remember that the wrong scheduling strategy can cause a
correct program to slow down by a factor exponential in the number of
processors.

> > Dare I mention the Alpha in this context? :-)
> 
> Sure.  But Alpha fixed this close to ten years ago, I believe, ...

That's what I meant :-)

> > All I am arguing is that we need to agree 
> > on what the aim of the atomic model should be before 
> > designing a specification.
> 
> Agreed.  But to be honest, I'm not sure what we could do that goes
> significantly beyond the current proposal in scope.  My intuition is
> that anything much more ambitious would lead to unacceptable performance
> in some simple cases.  I'd be happy to be proved wrong.

OK.  So here goes.


If the atomic qualifier is permitted to be lost by casting, then the
atomic objects have to be the same size as their non-atomic equivalents,
and an implementation does not have the option of putting a lock there
if it can't solve the problem another way.  This means that atomic
types have to be limited to ones that can be handled by the hardware.

Worse, that also applies to operations.  Many architectures do not
have an increment operation, which is therefore implemented by
multiple ones.  Some, like the Alpha, have "memory-free", cheap,
locking; others do not.

ALL that is needed to allow an implementation to support any POD type
as atomic is to allow it to add a lock flag to that object, but that
means allowing an atomic type to be of different size to the
unqualified type.  This is already allowed in one section, except
that the ability to lose qualifiers makes it unimplementable!  Now,
is that tradeoff worth while?

A second point is that, whether or not this is done, work will be
needed on deciding what is permitted on aggregates and subobjects of
basic atomic types.  This is a major can of worms.  Now, while
forbidding the atomic qualifier to be losable adds a little
complexity, it removes a HUGE amount by automatically forbidding all
of the nasty, messy byte access possible in the library and
elsewhere.

So my assertion that allowing atomic types to be of a different size
from their non-atomic equivalents looks harder to define and
implement, it may in fact be both more flexible and easier.


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  nmm1 at cam.ac.uk
Tel.:  +44 1223 334761    Fax:  +44 1223 334679



More information about the cpp-threads mailing list