[cpp-threads] C++ memory model - my comments

Thu Oct 27 21:39:27 BST 2005

This follows on from Hans Boehm's strawman proposal for a memory model.
It doesn't so much disagree with that proposal as try to explain why it
doesn't go far enough, and dissent on some details.  I refer to POSIX
quite a lot, but mainly to explain where it has got things wrong.

1. The Scope
------------

It is imperative that the model be integrated with the execution model
(mainly but not entirely sequence points), and not just be potentially
compatible with it.  Why?  I shall give some examples of what must be
integrated - sorry, but 'must' is correct.

    a) Floating-point exception flags (IEEE and similar) are often set
when a value is stored - especially on architectures (like x86) whose
register formats are more general than their store formats.  But they
are even more often set when the operation is performed.  To handle
both of these
cases even half sanely, the standard must not allow a conforming program
to distinguish sequence points from (local) memory barriers.

    b) There is a similar issue with access 'violations' (things that
cause SIGSEGV).  I won't go into details, but anyone who remembers the
difference between 0C4 and 0C5 on a System/370 or has used comparable
PC/embedded systems with 'remote' memory will know what I mean.  Most
modern operating systems and compilers do not allow the handling of
SIGSEGV, but older ones did and embedded ones may still do.

    c) C, C++ and POSIX have a huge amount of hidden state that can be
set by library functions (from the FP modes to the locale to the file
descriptors), some of which can affect memory accesses, and some of
which is global.  Any potential to get this out of step over a
synchronisation point is at best a recipe for chaos.  Note that pure
memory synchronisations (as in POSIX) are not specified to synchronise
such state.

    d) I/O, signals, messages, scheduling and so on are other forms of
communication.  POSIX specifies that file operations are atomic with
respect to other threads, but they are not synchronisable by pthreads
operations (not being memory operations).  Even 'time warps' are
allowed!  That sort of semantics is a recipe for chaos.

And now for one example of what must NOT be integrated!

    e) If an operation or memory access causes the hardware or kernel to
send a signal or message to another thread, it is very unlikely that the
invoking thread will be given an opportunity to synchronise memory.  This
should be left as unspecified.
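
To make point (a) above concrete, here is a minimal sketch of a flag
being raised by a narrowing conversion rather than by an arithmetic
operation.  It assumes an IEEE-style implementation that supports
FE_INEXACT through <cfenv>; the function name is hypothetical, and
whether the flag is raised at the conversion or at the store is exactly
the kind of detail that varies between architectures.

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical helper: returns true if narrowing a double to a float
// raised the FE_INEXACT flag.  'volatile' stops the compiler from
// folding the conversion away at compile time.
bool narrowing_sets_inexact() {
    volatile double wide = 0.1;  // not exactly representable as a float

    std::feclearexcept(FE_ALL_EXCEPT);
    assert(std::fetestexcept(FE_ALL_EXCEPT) == 0);  // flags start clear

    // The value is rounded somewhere between this load and this store.
    volatile float narrow = static_cast<float>(wide);
    (void)narrow;

    return std::fetestexcept(FE_INEXACT) != 0;
}
```

A program that inspects the flags between the conversion and the store
could, in principle, tell the two implementation strategies apart.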

2. Serialisation Events
-----------------------

The POSIX rules bear little relationship to reality, either in what
implementations do or in what programs assume.  It is
completely unreasonable to assume that only some pthreads calls cause
synchronisation (see 1.d above).  There are at least the following
types of serialisation event:

    1) Explicit serialisation with a barrier attribute.
    2) Explicit serialisation without any barrier attribute (e.g.
one of the options for atomic operations).
    3) External serialisation (usually causal).  A good example is
thread A writing on a FIFO and thread B reading it.
    4) Implicit causal serialisation (as in Hans's paper).
    5) Temporal serialisation (the mere passage of time).

POSIX says that only (1) matters, but almost every programmer will
assume at least (1) and (3).  (2) is murky but at least affects only
programs that use it explicitly.  Allowing (4) and (5) would severely
impact optimisation.

This may seem irrelevant, but consider the case of process A with
threads Aa and Ab that has a duplex FIFO open to process B (a common
scenario).  No normal programmer will assume that Aa can write to B and
Ab get the response, and yet Ab not see memory updates performed by Aa
before the write to the FIFO.  Yet that is what POSIX allows.
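
A simplified sketch of the same situation, reduced to type (3) above:
one thread writes to a pipe and another reads from it, with no pthreads
synchronisation in between.  It assumes a POSIX system (pipe, read,
write) and uses std::thread and std::atomic from later C++ standards,
which are an assumption here and not part of the strawman.  The relaxed
atomic keeps the program free of data races while providing no
language-level ordering between the threads; the function name is
hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <unistd.h>  // pipe, read, write, close (POSIX)

// Hypothetical helper: returns the value of 'data' that the reader
// observes after receiving a byte through the pipe, or -1 on failure.
int observe_after_pipe() {
    int fds[2];
    if (pipe(fds) != 0) return -1;

    std::atomic<int> data(0);
    int seen = -1;

    std::thread writer([&] {
        data.store(1, std::memory_order_relaxed);  // the memory update
        const char token = 'x';
        ssize_t n = write(fds[1], &token, 1);      // external serialisation
        (void)n;
        close(fds[1]);  // reader sees EOF if the write failed
    });
    std::thread reader([&] {
        char token = 0;
        if (read(fds[0], &token, 1) == 1)  // blocks until the byte arrives
            seen = data.load(std::memory_order_relaxed);
    });

    writer.join();
    reader.join();
    close(fds[0]);
    return seen;  // programmers expect 1; nothing here guarantees it
}
```

Almost every programmer would expect the result to be 1, and typical
systems are likely to deliver that, but neither the pthreads rules nor
the relaxed accesses themselves promise it.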

My belief is that it is essential to make at least the main process
control and I/O operations into synchronisation points, but I am not at
all sure exactly how to word this and exactly what to include.  And
should signalling be included in this?  All this needs thought.

3. Topologies
-------------

This is a nice one - if you are a pure mathematician :-)

Consider:

Example 1

Thread A:  X = 1
           set flag object P with release
Thread B:  wait for flag object P with acquire
           set flag object Q with release
Thread C:  wait for flag object Q with acquire
           read X

Example 2

Thread A:  X = 1
           set flag object P with release
           set atomic object Z (no memory serialisation)
Thread C:  read atomic object Z (no memory serialisation)
           wait for flag object Q with acquire
           read X

The question is whether neither of these is correct, only example 1 is,
or both are.  I call these the pairwise, transitive and global
topologies respectively.
There are, of course, all of the variations due to the different types
of serialisation event described in 2 above, but let's ignore that
complication.

While there are good implementation reasons to say that neither is
correct, that would NOT be expected by most programmers.  Equally, the
current POSIX description implies that both are, but does not say so
explicitly; that would be a performance nightmare on large SMP systems,
and I suspect strongly that it is not what implementations actually do.
Which should C++ choose?
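
A minimal sketch of Example 1, with the flag objects modelled as
std::atomic<bool> using release stores and acquire loads.  These
facilities come from later C++ standards and are an assumption here,
as is the function name.  Under the rules those standards adopted, a
chain of release/acquire pairs like this one is transitive, so the
read of X is ordered after the write; under a purely pairwise topology
it would not be.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs Example 1 and returns the value of X that
// thread C reads.
int run_example1() {
    int X = 0;  // ordinary, non-atomic object
    std::atomic<bool> P(false), Q(false);
    int seen = -1;

    std::thread a([&] {
        X = 1;
        P.store(true, std::memory_order_release);
    });
    std::thread b([&] {
        while (!P.load(std::memory_order_acquire)) std::this_thread::yield();
        Q.store(true, std::memory_order_release);
    });
    std::thread c([&] {
        while (!Q.load(std::memory_order_acquire)) std::this_thread::yield();
        seen = X;  // safe only if ordering carries from A through B to C
    });

    a.join();
    b.join();
    c.join();
    return seen;
}
```

Example 2 has no such chain between the write and the read of X, which
is why only a global topology would make it correct.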

4. Atomic Operations and Types
------------------------------

There are a lot of minor issues here, such as what types should be
required and permitted to be updatable atomically.  Floating-point?
Complex numbers?

Similarly, I don't like addition being special.  Why are the logical
operations different?  Or negation?
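
One way to see that addition need not be special: given an atomic
compare-and-swap on a type, any pure function of the old value can be
applied atomically with a retry loop.  The sketch below assumes
std::atomic from later C++ standards; atomic_apply, add_halves and
negated_atomically are hypothetical names.  It says nothing about
which operations a given machine can perform natively.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: atomically replace obj with f(obj) and return
// the value it held before the update.
template <typename T, typename F>
T atomic_apply(std::atomic<T>& obj, F f) {
    T expected = obj.load();
    // On failure compare_exchange_weak reloads 'expected', so retry.
    while (!obj.compare_exchange_weak(expected, f(expected))) {
    }
    return expected;
}

// Floating-point accumulation from several threads.
double add_halves(int threads, int per_thread) {
    std::atomic<double> total(0.0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&total, per_thread] {
            for (int i = 0; i < per_thread; ++i)
                atomic_apply(total, [](double v) { return v + 0.5; });
        });
    }
    for (std::thread& th : pool) th.join();
    return total.load();
}

// Negation of an integer, built on the same helper.
int negated_atomically(int start) {
    std::atomic<int> n(start);
    atomic_apply(n, [](int v) { return -v; });
    return n.load();
}
```

For instance, add_halves(4, 1000) should give exactly 2000.0, since
every intermediate sum is representable in a double.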

It is also clearly essential to state that any object updated
non-atomically must obey the normal rules as far as interactions
with atomic updates in that or other threads go.  That could be messy
to specify.

Lastly, there are a lot of advantages to saying that atomic operations
should be allowed only on atomic types, which must be defined as such.
Systems without cache coherence (and even some with it, when using
certain hardware features) need to put atomic variables in uncached
memory.

Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  nmm1 at cam.ac.uk
Tel.:  +44 1223 334761    Fax:  +44 1223 334679