[cpp-threads] Re: C++ memory model - my comments

Nick Maclaren nmm1 at cus.cam.ac.uk
Wed Nov 2 16:32:56 GMT 2005


> I think that in general, we need to be careful to keep this manageable.
> And I'd really like to confine our work to the C++ standard for now,
> though clearly we have to do that in a way that makes it as easy as
> possible to say the right thing in any related standards.  I'm most
> worried about threads, in that I think the situation is currently worst
> there; it's currently hard to make solid guarantees about even the most
> basic multi-threaded programs.

Yes.  But I feel that it is critical (a) to ensure that the other
standards can be specified compatibly and (b) that other important C++
features are covered.

> In the short term, I'm less concerned about Posix signals.  We cannot
> really address them in the C++ standard, except possibly by cleaning up
> some of the groundwork.  I'm not opposed to that at all, especially
> since I don't particularly like the treatment in the current C++
> standard, which I gather has been significantly revised in C99 anyway.

C99 hasn't changed the handling of signals at all with respect to
C89.  The wording is different, but it can be summarised in both
standards as "You may THINK that this is a language feature, but it
isn't.  Go away."  Seriously :-(

My only real concern about POSIX signals is that it be possible to do
something sensible.  At present, in C99/POSIX, it isn't.

> I'm not sure what you mean by a local memory barrier.  You're not
> talking about signals here, presumably?  You want to make sure that if I
> write
>
> x = y/z; ... fetestexcept(...) ...
>
> I actually see the exception(s) generated by the division?

No.  C99 almost specifies that, but I am talking about the interaction
between memory and exceptions.  Consider the following code sequences:

    x = y/z;
    feclearexcept(...);
    < invoke another thread, with a memory barrier >
    p = q/r;
    if (fetestexcept(...)) ...

and (heaven help us):

    x = y/z;
    if (fetestexcept(...)) < invoke another thread, with a memory barrier >
    p = q/r;

[ Add atomic/volatile to taste, to confuse the issue ]

It is obviously essential that the exceptions don't get attached to the
wrong operations but, at the implementation level, floating-point
exceptions may be attached to either the operation or the store.  Inter
alia, getting this wrong means that completing floating-point operations
in software may require the locking of the whole application, and not
just the active thread.  I don't think that the IA64 architecture
exposes this, but would need to study it to be quite sure.
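As a rough illustration of the first sequence, here is a minimal C++ sketch.
It assumes the later C++11 <cfenv> and <thread> facilities; the helper name
second_division_flagged is hypothetical, and the volatile qualifiers are only
there to discourage the compiler from folding or moving the divisions.  It
shows the intended outcome (the flag raised by x = y/z must not resurface at
p = q/r), not a guarantee that any particular implementation provides it.

```cpp
#include <cfenv>
#include <thread>

// Hypothetical helper: returns true if FE_DIVBYZERO is set after the
// second division (p = q/r), the flags having been cleared after the first.
bool second_division_flagged(double y_in, double z_in,
                             double q_in, double r_in) {
    // volatile operands and results keep the divisions at run time and
    // ordered with respect to the library calls (an assumption of this sketch).
    volatile double y = y_in, z = z_in, q = q_in, r = r_in;
    volatile double x, p;

    x = y / z;                          // may raise a flag
    std::feclearexcept(FE_ALL_EXCEPT);  // discard whatever x = y/z raised

    std::thread other([] {});           // "invoke another thread"
    other.join();                       // join synchronises with its completion

    p = q / r;                          // only this division should be visible
    return std::fetestexcept(FE_DIVBYZERO) != 0;
}
```

If the exception from x = y/z were attached to the store rather than the
operation, and the store were delayed past the barrier, the test after
p = q/r could report the wrong division.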

[ Important note:  SIGFPE is not a signal.  Please ask me to explain
if you disagree or don't understand my point. ]

> My main concern here is that I'm not sure any of the other participants
> (I'm including myself in the "other") here are sufficient experts on
> this issue.

That may well be true - it is an extremely arcane aspect of computing,
which is why I tried to stop C99 from standardising what it didn't
understand :-(

> >     b) There is a similar issue with access 'violations'
> > (things that cause SIGSEGV).  ...
>
> Is this really critical?  I do use SIGSEGV handling on Linux to track
> writes, but there seem to be issues here that go beyond both C++ and
> Posix.

Yes, it's critical, but it would be covered by the same wording needed
for floating-point.  If it is got wrong, no memory management system
(including garbage collectors) can use SIGSEGV, nor can application
TLB miss handlers, and any attempt to even INSPECT the IA64 RSE or ALAT
registers in application code (as needed for diagnostics etc.) will
go wrong.

[ Important note:  SIGSEGV is not a signal.  Please ask me to explain
if you disagree or don't understand my point. ]

> >     c) C, C++ and POSIX have a huge amount of hidden state
> > that can be set by library functions ...  Note that pure memory
> > synchronisations (as in POSIX) are not specified to
> > synchronise such state.

> I agree that's a mistake in most cases.  But I think that argues that
> the C++ spec should provide a hook for API standards to include other
> "kinds of memory".  I'm not sure it needs to have a big impact.

Right.  But that is all I am saying needs to be done for the previous
examples.

Also, anything that is in the C++ language should use that hook!

> > Example 2
> >
> > Thread A:  X = 1
> >            set flag object P with release
> >            set atomic object Z (no memory serialisation)
> > Thread C:  read atomic object Z (no memory serialisation)
> >            wait for flag object Q with acquire
> >            read X
>
> I'm not sure whether the second example came out wrong.  In my view, it
> clearly contains a race, since the accesses to Z can be reordered wrt
> the accesses to P and Q.

I think that is a serious mistake, unless there are both synchronising
and unsynchronised atomic operations, and assignment to and reading
from volatile are synchronising.  Users WILL assume that :-(
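To make the distinction concrete, here is a minimal sketch of what users
are likely to assume, written with the std::atomic and memory_order names
that C++11 later adopted (an assumption; they are not part of this
discussion).  P and Q are left out, since they are different objects and
order nothing between the two threads.  The function name run_example is
hypothetical.

```cpp
#include <atomic>
#include <thread>

// Thread A publishes X through Z; thread C waits on Z and then reads X.
int run_example() {
    int X = 0;                 // plain, non-atomic data
    std::atomic<int> Z{0};
    int seen = 0;

    std::thread a([&] {
        X = 1;
        // Synchronising store.  With memory_order_relaxed here and on the
        // load below, the read of X in thread C would be a data race.
        Z.store(1, std::memory_order_release);
    });

    std::thread c([&] {
        while (Z.load(std::memory_order_acquire) == 0) {
            std::this_thread::yield();
        }
        seen = X;              // ordered after X = 1 by release/acquire on Z
    });

    a.join();
    c.join();
    return seen;
}
```

With the release/acquire pair on Z, the function returns 1.  The "no memory
serialisation" form in the quoted example corresponds to relaxed ordering,
where that conclusion no longer holds.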

> > There are a lot of minor issues here, such as what types
> > should be required and permitted to be updatable atomically.
> > Floating-point? Complex numbers?
>
> Doug Lea did point out on another list that compare-and-swap on floats
> has "weird" semantics, in that it uses bit-wise equality, not fp
> equality.  I think that in a C++ template context, we have to accept
> that, since it will never use user-defined == comparison.  And this
> doesn't make it useless.

Actually, what is REALLY wanted is Alpha-style LL/SC.  Bitwise
comparison is closer to that than numeric is.
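The "weird" semantics can be seen with signed zeros.  The sketch below
assumes the std::atomic<double> interface that C++11 later standardised;
the function names are hypothetical.  +0.0 and -0.0 compare equal
numerically but have different bit patterns, so the compare-and-swap fails.

```cpp
#include <atomic>

// Numeric comparison: the two zeros are equal.
bool zeros_compare_equal() {
    return 0.0 == -0.0;
}

// Bitwise comparison: a compare-and-swap expecting -0.0 does not match
// an atomic holding +0.0, so no exchange takes place.
bool cas_matches_negative_zero() {
    std::atomic<double> a{0.0};
    double expected = -0.0;
    return a.compare_exchange_strong(expected, 1.0);
}
```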

> > Similarly, I don't like addition being special.  Why are the
> > logical operations different?  Or negation?
>
> The current proposal provides add, add1, sub1, and, and or for atomic
> integers.  I admit the first three also make sense for fp.  And it might
> buy you something over the obvious compare-and-swap-based code on
> machines with LL/SC.  Thus I agree it should probably be there.
>
> We have to draw the line somewhere, though.  We can add atomic negation
> if someone has an application in mind.

Well, the minimal approach would be operations for solely a single
type of integer, though I would add exclusive or.  Actually, I would
NOT include floating-point in the required set.

Load and store often get forgotten - they are critical for both
integers and pointers.

> In general, if you want an arbitrary function, you're stuck with
> compare-and-swap.  That may mean you end up with a non-wait-free
> implementation, but we don't guarantee that anyway.  Usually the
> overhead of using compare-and-swap is fairly small

Yes.
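For reference, the usual compare-and-swap loop for an arbitrary function
looks roughly like this.  It is a sketch assuming the std::atomic interface
that C++11 later adopted; atomic_apply and negate_example are hypothetical
names.  The loop is lock-free where the underlying compare-and-swap is, but
not wait-free, since a thread can in principle retry indefinitely.

```cpp
#include <atomic>

// Atomically replace obj with f(obj) and return the previous value.
template <typename T, typename F>
T atomic_apply(std::atomic<T>& obj, F f) {
    T old_value = obj.load();
    // On failure, compare_exchange_weak reloads old_value with the current
    // contents, so f is re-applied to fresh data on the next attempt.
    while (!obj.compare_exchange_weak(old_value, f(old_value))) {
    }
    return old_value;
}

// Atomic negation, which has no dedicated operation in the proposal.
long negate_example() {
    std::atomic<long> v{5};
    atomic_apply(v, [](long n) { return -n; });
    return v.load();
}
```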

> That seems problematic anyway, since I can dynamically allocate memory
> with malloc, and then use it to hold an atomic type.  Are there any such
> systems that support a thread API like pthreads?

I think that is a serious mistake.  While I don't know any systems
that do this sort of thing (coherence only in some areas) currently,
it is a technology that is likely to return.  Whatever the spec. is,
I feel that the implementation should know that it might be used in
that way at allocation time.

Note that it is there in modern interface specifications, like
InfiniBand, so in theory it could be exposed to the application.
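One way to picture "knowing at allocation time" is a separate allocation
entry point for memory that will hold atomics.  The sketch below is purely
illustrative: alloc_for_atomics is a hypothetical hook, not an existing or
proposed interface, and here it simply forwards to malloc.  On a system
with coherence only in some areas, it is where the implementation could
choose suitable memory.

```cpp
#include <atomic>
#include <cstdlib>
#include <new>

// Hypothetical hook: the caller declares at allocation time that the
// memory will hold atomic objects.  This sketch just forwards to malloc.
void* alloc_for_atomics(std::size_t bytes) {
    return std::malloc(bytes);
}

// Place an atomic counter in dynamically allocated memory and use it.
// Returns the counter value after one increment, or -1 if allocation fails.
int bump_counter_in_heap() {
    using AtomicInt = std::atomic<int>;

    void* raw = alloc_for_atomics(sizeof(AtomicInt));
    if (raw == nullptr) {
        return -1;
    }

    AtomicInt* counter = new (raw) AtomicInt(0);  // construct in place
    counter->fetch_add(1);
    int value = counter->load();

    counter->~AtomicInt();                        // destroy before freeing
    std::free(raw);
    return value;
}
```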


Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  nmm1 at cam.ac.uk
Tel.:  +44 1223 334761    Fax:  +44 1223 334679


