[cpp-threads] Comments on Hans's latest strawman proposal

Nick Maclaren nmm1 at cus.cam.ac.uk
Fri Jan 13 20:01:19 GMT 2006


I don't think that followups need the CCs.  I am including the UK C++
panel, because some people may like to see the second half.

> The occurs-before relation
> 
> If a memory update or side-effect a occurs before another memory operation
> or side-effect b, then informally a must appear to be completely evaluated
> before b in the sequential execution of a single thread, e.g. all side
> effects of a must occur before those of b. This notion does not directly
> imply anything about the order in which memory updates become visible to
> other threads.

This is incomplete: "e.g. all side effects of a must occur before
those of b" should be replaced by "e.g. all accesses and all side
effects of a must occur before those of b".

> Wherever the current standard states that there is a sequence point between
> A and B, we instead propose to state that A occurs-before B. This will
> constitute the precise definition of occurs-before on subexpressions, and
> hence on memory actions and side effects.

And punt on exactly when sequence points occur.  Agreed :-(

> The depends-on relation
> 
> Consider a given execution of a particular thread, i.e. the sequence of
> actions that may be performed by a particular thread as part of the
> execution of the containing program. If, as a result of changing only the
> value read by an atomic load L, a subsequent atomic store S either can no
> longer occur, or must store a different value, then S depends on L.

{A}: Do you mean to restrict this to L-then-S and rely on the
occurs-before relation for S-then-L?  If so, see later for some changes
needed.

> The synchronizes-with relation
> 
> Informally, a lock release synchronizes-with the next acquisition of the
> same lock. A barrier (in the pthread_barrier_wait sense or OpenMP sense)
> synchronizes-with all corresponding executions of the barrier in other
> threads. A memory fence (or barrier in the other sense) synchronizes-with
> the next execution of the barrier, usually by another thread. An atomic
> store synchronizes-with all atomic loads that read the value saved by the
> store.

Be careful to ensure that a lock acquisition synchronizes-with a
consequential failure to acquire the lock because it is in use (but not
for any other reason).  POSIX has got this one wrong.

> The happens-before relation
> 
>    * If A and B are ordinary (not atomic!) memory references, and A
>      occurs-before B, then A happens-before B.

{A}: You can't mean this!  The "(not atomic!)" needs to be scrapped.
Without doing so, there is nothing in this draft that provides any
happens-before relation between atomic and ordinary references.

>    * If an atomic store B depends-on an earlier atomic load A in the same
>      thread, then A happens-before B. In particular, assuming all
>      assignments are atomic operations, there is no happens-before ordering
>      between the load and the store in r1 = x; y = 1, but there is such an
>      ordering between the two halves of r1 = x; y = r1.

{A}: See above about the wording of depends-on.  This is S-then-L on r1,
not L-then-S.

> Consistent executions
> 
>   1. The actions of any particular thread (excluding values read from
>      potentially shared locations), and the corresponding occurs-before
>      relation, are consistent with the normal sequential semantics as given
>      in the rest of the standard.

Gug.  That will offer plenty of scope for perverse interpretations.

>   2. The synchronizes-with relation is consistent with the constraints
>      imposed by the definitions of the synchronization primitives. For
>      example, if S is an atomic store which synchronizes-with an atomic load
>      L, then the loaded and stored values must be the same.

I don't understand what you mean by this, and whether it is constraining
the program, implementation or what.

>   3. (intra-thread visibility) If a load L sees a store S from the same
>      thread, then L must not occur-before S, and there must be no
>      intervening store S' such that S occurs-before S' and S' occurs-before
>      L.

Don't you mean "If a load L sees a store S from the same thread, then S
must occur-before L"?

> Data races
> 
> We define a memory location to be a variable, (non-bitfield) data member,
> array element, or contiguous sequence of bitfields. We define two actions to
> be conflicting if they access the same memory location, and one of them is a
> store access.

Thus bringing in my Objects diatribe.  While that is a semi-orthogonal
aspect, it does need attention.

> [ Bill Pugh points out that the notion of "input" here isn't well
> defined for a program that interacts with its environment. And we don't want
> to give undefined semantics to a program just because there is some other
> sequence of interactions with the environment that results in a data race.
> We probably want something more along the lines of stating that every
> program behavior either
> 
>   1. corresponds to a consistent execution in which loads see stores that
>      happen-before them, or
>   2. there is a consistent execution triple with a data race, such that
>      calls to library IO functions before the data race are consistent with
>      observed behavior.

Yes.  I think that it can be done without too much pain.  See at the end.

> This gets us a bit closer to the Java causality model. But I'm not sure we
> need much of its complexity. I think the notion of "before" in the second
> clause is easily definable, since we can insist that IO operations be
> included in the effectively total order of ordinary variable accesses. ]

It would be extremely harmful to be so restrictive, and would cripple
performance.  POSIX is bad enough, but this is an order of magnitude
worse.  Please ask me for an expansion of the issue if this is unclear.

> Consequences
> 
>    * Programs using no synchronization operations other than simple locks,
>      and which allow no data races, behave sequentially consistently.

Yes, that is probably so.

>      [ This still needs a proof, but I basically believe it. For each
>      consistent execution triple, we have a total order on the ordinary
>      operations, and a total order on the locking operations for each lock,
>      each of which must be consistent with intra-thread ordering of actions.
>      We can construct a single total order of all the actions that is
>      consistent with all of them. Since there is a happens-before relation
>      between any load and the corresponding store, no matter which such
>      ordering we pick, every load in the resulting ordering will see the
>      preceding store in the total order.

Oh, no, we don't - and, oh, no, we can't!  I am afraid that explanation
is completely flawed.  What we have is a partial order where every total
order consistent with the partial order is equivalent, but that is a
long way from there being a total order.

> Here we list some more detailed implications of the last statement:
> 
>    * Structure or class data member assignments may not be implemented in a
>      way that overwrites data members not assigned to in the source, unless
>      the assigned and overwritten members are adjacent bit-fields.

Gug.  How clear is C++ on whether padding bytes may be written to?  There
was a long and inconclusive debate about whether C allows them to be.
This may need clarification.

There is a MUCH nastier problem with constructions like the following
(again, using C syntax):

static double d;
char *p = &((char *)&d)[1];
*(char *)&d = 0;

> Volatile variables and data members
> 
> Since the value of a __async volatile can affect control flow and thus
> determine whether or not other shared variables are accessed, this implies
> that ordinary memory operations cannot in general be reordered with respect
> to a __async volatile access. It also means that __async volatile accesses
> are sequentially consistent.

{A}: see above.  I agree that this is desirable, just not that the current
wording states it.

> [ This section appears very controversial, and may be withdrawn. The
> alternative is to always rely on the atomic operations library. It seems to
> make sense to put this on hold until we have a better handle on the atomic
> operations library, so that we can tell whether that would be a major
> inconvenience.

I don't see that it helps.  The real issue is the interaction between
atomic and ordinary accesses, and not the minor one of the syntax used
for the former.

> [We can't talk about async-signal-safety here. We might suggest that __async
> volatile int and __async volatile pointers be async-signal-safe where that's
> possible and meaningful. My concern here is with uniprocessor embedded
> platforms, which might have to use restartable critical sections to
implement atomicity, and might misalign things. ]

I really don't understand this.  Yes, I know that people nowadays rarely
understand asynchronous signals, but there AREN'T any extra
problems as far as simple memory updates go!  If the implementation has
to use critical sections to implement atomicity, it can use exactly the
same logic to implement async-signal-safety.  Been there - done that.





Here is a draft of what I feel could (and should) be added for the
non-memory aspects.  I have tried to specify things in such a way that
they map to the equivalent memory concepts, so that it will add few
extra problems and will adapt to changes in the main memory model.  Its
terminology is all wrong, because of my lack of knowledge of how to use
C++, but I am concentrating on the concepts.



Intra-Process Non-Memory Actions (Including Hidden State, Exceptions etc.)
--------------------------------------------------------------------------

Unless qualified in subsequent paragraphs, all non-memory actions have
occurs-before constraints and relations as if they were memory actions
of a similar class.  Examples:

    set_terminate behaves as if it updates a hidden memory location, and
    a call to that handler behaves as if it reads that location

    setlocale behaves as if it updates a hidden memory location, and a
    use of the default locale behaves as if it reads that location

    set_new_handler behaves as if it updates a hidden memory
    location, and a call to that handler behaves as if it reads that
    location

    throw behaves as if it updates a hidden memory location, and
    catching that exception behaves as if it reads that location

Where it is not otherwise specified, such an action or access behaves as
if it is an access to a shared object solely within the calling thread,
and there is no synchronizes-with action implied by the action.  The
normal rules apply when deciding whether a program execution is
consistent.

C++ does not currently define any intra-process out-of-band exceptions
(in the normal computer science sense, not necessarily restricted to C++
exceptions).  However, it permits some optional exceptions to be thrown
out-of-band in some implementations (especially arithmetic ones).  Where
an implementation supports the catching of such exceptions, it should
attempt to do the following for intra-thread out-of-band exceptions and
shall document whether it does so.

[ This is not exactly an onerous requirement, but is the minimum that is
useful and the maximum that can always be implemented.  I give two
forms, either of which could be used; the former uses an explicit
synchronization function and the latter hijacks the inter-thread ones. ]

EITHER:

If a thread executes two synchronizes-locally actions A and B, an action
C throws an intra-thread out-of-band exception at 'time' D, A occurs
before C and C occurs before B, then A shall occur before D and D shall
occur before B.  As far as this constraint is concerned, there shall be
a synchronizes-locally action immediately following program
initialization and one immediately preceding normal program termination
(but not necessarily that caused by calls to the abort or terminate
functions).  The synchronizes-locally action shall be caused by a call
to the barrier function in class exception.

OR:

If a thread executes either side of two synchronizes-with actions A and
B with any other thread, an action C throws an intra-thread out-of-band
exception at 'time' D, A occurs before C and C occurs before B, then A
shall occur before D and D shall occur before B.  As far as this
constraint is concerned, there shall be an anonymous synchronizes-with
action immediately following program initialization and one immediately
preceding normal program termination (but not necessarily that caused by
calls to the abort or terminate functions).

[END OF ALTERNATIVE]

[ PROBLEM: God help me, POSIX permits things like SIGFPE to be raised in
a thread other than the one where the arithmetic exception occurred.
That being so, there will be considerable political pressure to permit
C++ exception handling to permit that, because otherwise there will be
systems on which C++ cannot implement arithmetic exception handling in
conjunction with threads.  The following should do the trick. ]

Where an exception is caused by an action in one thread but thrown in
another, it shall be treated as if it occurs in two stages, one in
each thread: the action shall cause a hidden memory location to be
updated, and the throwing of the exception shall result from an
asynchronous read of that location.  Both stages shall have ordering
semantics as described above, with the resulting constraints on
consistent execution.  An implementation shall document when this may
occur.

RECOMMENDATION: an implementation should avoid throwing an exception in
a different thread from the action that causes it, where possible.

RECOMMENDATION: an implementation should attempt to follow these
guidelines for other forms of handling of intra-process out-of-band
exceptions (in the general sense), such as the use of POSIX signal
handling for signals that are both generated and trapped within the
process.

Actions on streams and other I/O are described separately.



External Non-Memory Actions (Including Streams and Other I/O)
-------------------------------------------------------------

Streams consist of an intra-process component and an optional external
component.  All actions on streams that affect only the intra-process
component have occurs-before constraints and relations as if they were
memory actions on the stream state.  Any exceptions thrown as a result
of them will be raised synchronously in the thread that performs the
action.

[ This is always possible, because the cases that can't be done that way
can be called external.  We could allow the ghastly POSIX signal mess,
but I can't see the merit in being perverse for the sake of it. ]

Actions that affect the external component may have failures that occur
out of band, and this may cause an action that apparently succeeded to
produce subsequent undefined behaviour, possibly when another thread
accesses the stream.  An implementation should attempt to diagnose such
failures not later than when the failure of the action would become
visible by causing another action to behave inconsistently.

RECOMMENDATION: an implementation should avoid throwing an exception in
a different thread from the action that causes it, where possible.  If
it is necessary to throw an asynchronous exception, there should be a
mechanism by which a particular thread can register that it should
receive such exceptions (either all such ones, or those raised regarding
the use of a particular stream).

[ Again, these are not exactly onerous requirements, but are the minimum
that is useful and the maximum that can always be implemented. ]

[ There needs to be a clear statement of which actions must be solely
intra-process and which may be external.  I don't know C++ well enough
to say whether the distinction could be applied to the classes (i.e. with
iostream, fstream and cstdio being external), or whether it would have to
be applied to any streams derived from streams in those classes. ]

Where an action in thread A changes the state of a file and the changed
state is then seen by a separate thread B, A synchronizes-with B.

RECOMMENDATION: Where an action in thread A changes the state of a
standard stream or file, that changed state is used by an external agent
to change the state of that or another standard stream or file, and that
changed state is then seen by a separate thread B, A synchronizes-with
B.

[ This could be made a requirement rather than a recommendation, by
adding "The existence and behaviour of any such external agents is
implementation dependent." ]

RECOMMENDATION: C++ does not currently define any external message
passing (including signalling), but an implementation or library should
treat any message-passing facility whose messages are received by an
explicit action in the receiving thread (such as POSIX signals received
by polling, socket operations or MPI blocking communication) as if it
were another form of I/O.

RECOMMENDATION: There are serious consistency problems with facilities
that permit one thread to cause an action to take place implicitly
(often called asynchronously) in another thread (such as POSIX signals
or MPI non-blocking communication), and the specification of the
semantics of such facilities is a future direction.

[ Translation: don't go there until given the all-clear. ]




Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  nmm1 at cam.ac.uk
Tel.:  +44 1223 334761    Fax:  +44 1223 334679
