[cpp-threads] RE: C++ memory model - my comments

Boehm, Hans hans.boehm at hp.com
Tue Nov 1 23:05:00 GMT 2005


Nick -

I think that in general, we need to be careful to keep this manageable.
And I'd really like to confine our work to the C++ standard for now,
though clearly we have to do that in a way that makes it as easy as
possible to say the right thing in any related standards.  I'm most
worried about threads, in that I think the situation is currently worst
there; it's currently hard to make solid guarantees about even the most
basic multi-threaded programs.

In the short term, I'm less concerned about Posix signals.  We cannot
really address them in the C++ standard, except possibly by cleaning up
some of the groundwork.  I'm not opposed to that at all, especially
since I don't particularly like the treatment in the current C++
standard, which I gather has been significantly revised in C99 anyway.

Some more detailed comments/questions are below, mostly where I don't
understand things.

>     a) Floating-point exception flags (IEEE and similar) are 
> often set when a value is stored - especially on 
> architectures (like x86) which have more general register 
> than store formats.  But they are even more often set when 
> the operation is performed.  To handle both of these cases 
> even half sanely, the standard must not allow a conforming 
> program to distinguish sequence points from (local) memory barriers.
I'm not sure what you mean by a local memory barrier.  You're not
talking about signals here, presumably?  You want to make sure that if I
write

x = y/z; ... fetestexcept(...) ...

I actually see the exception(s) generated by the division?

My main concern here is that I'm not sure any of the participants
(myself included) are sufficiently expert on this issue.
> 
>     b) There is a similar issue with access 'violations' 
> (things that cause SIGSEGV).  I won't go into details, but 
> anyone who remembers the difference between 0C4 and 0C5 on a 
> System/370 or has used comparable PC/embedded systems with 
> 'remote' memory will know what I mean.  Most modern operating 
> systems and compilers do not allow the handling of SIGSEGV, 
> but older ones did and embedded ones may still do.
Is this really critical?  I do use SIGSEGV handling on Linux to track
writes, but there seem to be issues here that go beyond both C++ and
Posix.
> 
>     c) C, C++ and POSIX have a huge amount of hidden state 
> that can be set by library functions (from the FP modes to 
> the locale to the file descriptors), some of which can affect 
> memory accesses, and some of which is global.  Any potential 
> to get this out of step over a synchronisation point is at 
> best a recipe for chaos.  Note that pure memory 
> synchronisations (as in POSIX) are not specified to 
> synchronise such state.
I agree that's a mistake in most cases.  But I think that argues that
the C++ spec should provide a hook for API standards to include other
"kinds of memory".  I'm not sure it needs to have a big impact.
> 
>     d) I/O, signals, messages, scheduling and so on are other 
> forms of communication.  POSIX specifies that file operations 
> are atomic with respect to other threads, but they are not 
> synchronisable by pthreads operations (not being memory 
> operations).  Even 'time warps' are allowed!  That sort of 
> semantics is a recipe for chaos.
I personally agree that this is wrong.  (Or perhaps just ambiguous, I
haven't studied the standard enough.)  But it seems to be a pure Posix
issue.  We clearly intend to provide terminology for library APIs to
properly define their synchronization operations.
> 
> And now for one example of what must NOT be integrated!
> 
>     e) If an operation or memory access causes the hardware 
> or kernel to send a signal or message to another thread, it 
> is very unlikely that the invoking thread will be given an 
> opportunity to synchronise memory.  This should be left as 
> unspecified.
Yes.  Agreed.
> 
> ...
> 
> 3. Topologies
> -------------
> 
> This is a nice one - if you are a pure mathematician :-)
> 
> Consider:
> 
> Example 1
> 
> Thread A:  X = 1
>            set flag object P with release
> Thread B:  wait for flag object P with acquire
>            set flag object Q with release
> Thread C:  wait for flag object Q with acquire
>            read X
> 
> Example 2
> 
> Thread A:  X = 1
>            set flag object P with release
>            set atomic object Z (no memory serialisation)
> Thread C:  read atomic object Z (no memory serialisation)
>            wait for flag object Q with acquire
>            read X
> 
> The question is whether neither of these is correct, only 
> example 1 is or both are.  I call these pairwise, transitive 
> and global topologies. There are, of course, all of the 
> variations due to the different types of serialisation event 
> described in 2 above, but let's ignore that complication.
> 
> While there are good implementation reasons to say that 
> neither is, that would NOT be expected by most programmers.  
> Equally, the current POSIX description implies that both are 
> allowed, but does not say so explicitly, and it is a 
> performance nightmare on large SMP systems; I suspect 
> strongly that it is not what they actually do.  Which should 
> C++ choose?
> 
If we guarantee at least "happens-before consistency" for such atomic
operations, as is certainly the intent, then example 1 does not involve
a race, and thread C is guaranteed to see 1.  That's also true for Java
volatiles.

I'm not sure whether the second example came out wrong.  In my view, it
clearly contains a race, since the accesses to Z can be reordered wrt
the accesses to P and Q.


> 
> 4. Atomic Operations and Types
> ------------------------------
> 
> There are a lot of minor issues here, such as what types 
> should be required and permitted to be updatable atomically.  
> Floating-point? Complex numbers?
Doug Lea did point out on another list that compare-and-swap on floats
has "weird" semantics, in that it uses bit-wise equality, not fp
equality.  I think that in a C++ template context, we have to accept
that, since it will never use user-defined == comparison.  And this
doesn't make it useless.
> 
> Similarly, I don't like addition being special.  Why are the 
> logical operations different?  Or negation?
The current proposal provides add, add1, sub1, and, and or for atomic
integers.  I admit the first three also make sense for fp.  And it might
buy you something over the obvious compare-and-swap-based code on
machines with LL/SC.  Thus I agree it should probably be there.

We have to draw the line somewhere, though.  We can add atomic negation
if someone has an application in mind.

In general, if you want an arbitrary function, you're stuck with
compare-and-swap.  That may mean you end up with a non-wait-free
implementation, but we don't guarantee that anyway.  Usually the
overhead of using compare-and-swap is fairly small.

> 
> It is also clearly essential to state that any object updated 
> non-atomically must obey the normal rules as far as 
> interactions with atomic updates in that or other threads go. 
>  That could be messy to specify.
Yes.
> 
> Lastly, there are a lot of advantages to saying that atomic 
> operations should be allowed only on atomic types, which must 
> be defined as such. Systems without cache coherence (and even 
> ones with, using some hardware
> features) need to put atomic variables in uncached memory.
That seems problematic anyway, since I can dynamically allocate memory
with malloc, and then use it to hold an atomic type.  Are there any such
systems that support a thread API like pthreads?

I do agree that whether or not something behaves atomically should be
determined by its declaration, at least in the normal case.

Hans


