[cpp-threads] RE: Comments on Hans's latest strawman proposal

Boehm, Hans hans.boehm at hp.com
Thu Jan 19 01:37:48 GMT 2006


[Apologies that this is overly long.  I suggest that further responses
break this up into separate threads.  In general, I'd appreciate small
patches to the strawman proposal, so that we can discuss/incorporate
them, especially once I finish the current revision pass.]

Nick-

Thanks for the detailed response.

> -----Original Message-----
> From: Nick Maclaren [mailto:nmm1 at cus.cam.ac.uk] 
> Sent: Friday, January 13, 2006 12:01 PM
> To: C++ threads standardisation
> Cc: Boehm, Hans; cxxpanel at yahoogroups.co.uk
> Subject: Comments on Hans's latest strawman proposal
> 
> 
> I don't think that followups need the CCs.  I am including 
> the UK C++ panel, because some people may like to see the second half.
> 
> > The occurs-before relation
> > 
> > If a memory update or side-effect a occurs before another
> > memory operation or side-effect b, then informally a must
> > appear to be completely evaluated before b in the sequential
> > execution of a single thread, e.g. all side effects of a must
> > occur before those of b. This notion does not directly imply
> > anything about the order in which memory updates become
> > visible to other threads.
> 
> This is incomplete: "e.g. all side effects of a must occur 
> before those of b" should be replaced by "e.g. all accesses 
> and all side effects of a must occur before those of b".
Thanks.  Fixed.

> > The depends-on relation
> > 
> > Consider a given execution of a particular thread, i.e. the
> > sequence of actions that may be performed by a particular
> > thread as part of the execution of the containing program. If,
> > as a result of changing only the value read by an atomic load
> > L, a subsequent atomic store S either can no longer occur, or
> > must store a different value, then S depends on L.
> 
> {A}: Do you mean to restrict this to L-then-S and rely on the 
> occurs-before relation for S-then-L?  If so, see later for 
> some changes needed.
Yes.  This basically captures intra-thread dependencies that wouldn't
otherwise be included in happens-before.
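
For example (a sketch, treating the int accesses as atomic for
illustration):

    int flag, y;   /* assume both are accessed atomically */

    void reader() {
        int r1 = flag;   /* atomic load L */
        if (r1)
            y = 1;       /* atomic store S: changing the value L reads
                            changes whether S occurs, so S depends-on
                            L */
    }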

> 
> > The synchronizes-with relation
> > 
> > Informally, a lock release synchronizes-with the next
> > acquisition of the same lock. A barrier (in the
> > pthread_barrier_wait sense or OpenMP sense) synchronizes-with
> > all corresponding executions of the barrier in other threads. A
> > memory fence (or barrier in the other sense) synchronizes-with
> > the next execution of the barrier, usually by another thread.
> > An atomic store synchronizes-with all atomic loads that read
> > the value saved by the store.
> 
> Be careful to ensure that a lock acquisition 
> synchronizes-with a consequential failure to acquire the lock 
> because it is in use (but not for any other reason).  POSIX 
> has got this one wrong.
In this model, I agree that's how things should work.  I'm not at all
sure that the intra-thread ordering constraint for lock acquisition
should imply release semantics, as it currently appears to.  But that's
a different question.
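
The case I assume you have in mind is the trylock idiom (a sketch in
pthreads syntax; x is an ordinary variable):

    #include <errno.h>
    #include <pthread.h>

    int x = 0;
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

    void thread1() {
        x = 42;
        pthread_mutex_lock(&m);
        /* ... */
    }

    void thread2() {
        if (pthread_mutex_trylock(&m) == EBUSY) {
            int r1 = x;  /* Concluding r1 == 42 here is sound only if
                            thread 1's acquisition synchronizes-with
                            the failed trylock, which in turn is what
                            would give lock acquisition release-like
                            semantics. */
        }
    }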

> 
> > The happens-before relation
> > 
> >    * If A and B are ordinary (not atomic!) memory references, and A
> >      occurs-before B, then A happens-before B.
> 
> {A}: You can't mean this!  The "(not atomic!)" needs to be 
> scrapped. Without doing so, there is nothing in this draft 
> that provides any happens-before relation between atomic and 
> ordinary references.
The second clause "If A is an atomic reference ..." gives those
constraints.  You are correct that there is not necessarily a
happens-before ordering between atomic and ordinary references from the
same thread; it depends on the ordering constraints associated with the
atomic reference.
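
To illustrate (a sketch; atomic_int, store_release, and store_raw are
placeholder names, since the atomics library is still TBD):

    int data;                      /* ordinary */
    atomic_int ready;              /* hypothetical atomic type */

    void writer() {
        data = 1;                  /* ordinary store A */
        store_release(&ready, 1);  /* atomic store B with a release
                                      constraint: A happens-before B */
        /* With an unordered store, say store_raw(&ready, 1), there
           would be no happens-before edge from A to the atomic
           store. */
    }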

> 
> >    * If an atomic store B depends-on an earlier atomic load A in the
> >      same thread, then A happens-before B. In particular, assuming
> >      all assignments are atomic operations, there is no
> >      happens-before ordering between the load and the store in
> >      r1 = x; y = 1, but there is such an ordering between the two
> >      halves of r1 = x; y = r1.
> 
> {A}: See above about the wording of depends-on.  This is 
> S-then-L on r1, not L-then-S.
Sorry.  I'm referring to the load of x and the store of y.  I made that
more explicit.

> 
> > Consistent executions
> > 
> >   1. The actions of any particular thread (excluding values read
> >      from potentially shared locations), and the corresponding
> >      occurs-before relation, are consistent with the normal
> >      sequential semantics as given in the rest of the standard.
> 
> Gug.  That will offer plenty of scope for perverse interpretations.
As we converge on the standardese, we should try to improve that. I'm
not sure we can eliminate the problem completely, but I'm also not sure
that it's a major practical problem.

> 
> >   2. The synchronizes-with relation is consistent with the
> >      constraints imposed by the definitions of the synchronization
> >      primitives. For example, if S is an atomic store which
> >      synchronizes-with an atomic load L, then the loaded and stored
> >      values must be the same.
> 
> I don't understand what you mean by this, and whether it is 
> constraining the program, implementation or what.
It effectively constrains a consistent execution to be consistent with
the definition of the synchronization primitives (which is unfortunately
still TBD).  Effectively that constrains the implementation of the
synchronization primitives.

> 
> >   3. (intra-thread visibility) If a load L sees a store S from the
> >      same thread, then L must not occur-before S, and there must be
> >      no intervening store S' such that S occurs-before S' and S'
> >      occurs-before L.
> 
> Don't you mean "If a load L sees a store S from the same 
> thread, then S must occur-before L"?
No, because that would make it harder to define an "intra-thread race".
Executions are defined such that if I write f(++i, ++i), each load of i
is allowed to see several possible stores, and hence there is an
"intra-thread race", and hence the program has undefined semantics.

> 
> > Data races
> > 
> > We define a memory location to be a variable, (non-bitfield)
> > data member, array element, or contiguous sequence of
> > bitfields. We define two actions to be conflicting if they
> > access the same memory location, and one of them is a store
> > access.
> 
> Thus bringing in my Objects diatribe.  While that is a 
> semi-orthogonal aspect, it does need attention.
> 
> > [ Bill Pugh points out that the notion of "input" here isn't
> > well defined for a program that interacts with its environment.
> > And we don't want to give undefined semantics to a program just
> > because there is some other sequence of interactions with the
> > environment that results in a data race. We probably want
> > something more along the lines of stating that every program
> > behavior either
> > 
> >   1. corresponds to a consistent execution in which loads see
> >      stores that happen-before them, or
> >   2. there is a consistent execution triple with a data race, such
> >      that calls to library IO functions before the data race are
> >      consistent with observed behavior.
> 
> Yes.  I think that it can be done without too much pain.  See 
> at the end.
> 
> > This gets us a bit closer to the Java causality model. But I'm
> > not sure we need much of its complexity. I think the notion of
> > "before" in the second clause is easily definable, since we can
> > insist that IO operations be included in the effectively total
> > order of ordinary variable accesses. ]
> 
> It would be extremely harmful to be so restrictive, and would 
> cripple performance.  POSIX is bad enough, but this is an 
> order of magnitude worse.  Please ask me for an expansion of 
> the issue if this is unclear.
I'm not quite sure what you mean here.  Remember that consistent
executions exist primarily to define when something is data-race-free.
We're not really insisting on totally ordering ordinary variable
accesses, since we require that no conforming program can tell the
difference.

Let's postpone some of the later discussion, since those sections of the
proposal really still need to be rewritten.

> > Here we list some more detailed implications of the last statement:
> > 
> >    * Structure or class data member assignments may not be
> >      implemented in a way that overwrites data members not assigned
> >      to in the source, unless the assigned and overwritten members
> >      are adjacent bit-fields.
> 
> Gug.  How clear is C++ that padding bytes may not be written 
> to?  There was a long and inconclusive debate about whether C 
> allows them to be. This may need clarification.
Why does it matter?
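
(The bullet itself is aimed at cases like the following, if it helps
ground the discussion:

    struct S { char a; char b; } s;

    void thread1() { s.a = 1; }
    void thread2() { s.b = 2; }

    /* If the implementation compiles "s.a = 1" as load the word
       containing s, update the a byte, and store the word back,
       thread 2's assignment to s.b can be silently lost, although
       the source contains no race. */
)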

> 
> There is a MUCH nastier problem with constructions like the 
> following (again, using C syntax):
> 
> static double d;
> char *p = &((char *)&d)[1];
> *(char *)&d = 0;
Agreed, though I suspect this isn't really controversial.  Anybody want
to propose a way to handle this?
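
If I read the example correctly, the nastiness is that character-typed
accesses can address individual bytes of a double, so "memory location"
granularity is no longer tied to the declared object (a sketch):

    static double d;

    void thread1() { ((char *)&d)[0] = 0; }
    void thread2() { ((char *)&d)[1] = 0; }

    /* With the memory location defined as the variable d, these
       stores conflict although they touch disjoint bytes.  Making
       each byte its own location instead would make them race-free,
       which would forbid implementing a char store as a
       read-modify-write of a wider unit. */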

> 
> > Volatile variables and data members
> > 
> > Since the value of a __async volatile can affect control flow
> > and thus determine whether or not other shared variables are
> > accessed, this implies that ordinary memory operations cannot
> > in general be reordered with respect to a __async volatile
> > access. It also means that __async volatile accesses are
> > sequentially consistent.
> 
> {A}: see above.  I agree that this is desirable, just not 
> that the current wording states it.
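
For the record, the guarantee at stake is the usual Dekker-style one
(a sketch; __async is of course the proposed qualifier, not existing
syntax):

    __async volatile int x = 0, y = 0;
    int r1, r2;

    void thread1() { x = 1; r1 = y; }
    void thread2() { y = 1; r2 = x; }

    /* Sequential consistency is exactly what rules out the outcome
       r1 == 0 && r2 == 0 after both threads finish. */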
> 
> > [ This section appears very controversial, and may be
> > withdrawn. The alternative is to always rely on the atomic
> > operations library. It seems to make sense to put this on hold
> > until we have a better handle on the atomic operations library,
> > so that we can tell whether that would be a major
> > inconvenience.
> 
> I don't see that it helps.  The real issue is the interaction 
> between atomic and ordinary accesses, and not the minor one 
> of the syntax used for the former.
> 
> > [We can't talk about async-signal-safety here. We might suggest
> > that __async volatile int and __async volatile pointers be
> > async-signal-safe where that's possible and meaningful. My
> > concern here is with uniprocessor embedded platforms, which
> > might have to use restartable critical sections to implement
> > atomicity, and might misalign things. ]
> 
> I really don't understand this.  Yes, I know that people 
> nowadays rarely understand asynchronous signals, but 
> there AREN'T any extra problems as far as simple memory 
> updates go!  If the implementation has to use critical 
> sections to implement atomicity, it can use exactly the same 
> logic to implement async-signal-safety.  Been there - done that.
> 
> 
> 
> 
> 
> Here is a draft of what I feel could (and should) be added 
> for the non-memory aspects.  I have tried to specify things 
> in such a way that they map to the equivalent memory 
> concepts, so that it will add few extra problems and will 
> adapt to changes in the main memory model.  Its terminology 
> is all wrong, because of my lack of knowledge of how to use
> C++, but I am concentrating on the concepts.
> 
> 
> 
> Intra-Process Non-Memory Actions (Including Hidden State, 
> Exceptions etc.)
> --------------------------------------------------------------------------
> 
> Unless qualified in subsequent paragraphs, all non-memory 
> actions have occurs-before constraints and relations as if 
> they were memory actions of a similar class.  Examples:
> 
>     set_terminate behaves as if it updates a hidden memory location,
>     and a call to that handler behaves as if it reads that location
> 
>     setlocale behaves as if it updates a hidden memory location, and a
>     use of the default locale behaves as if it reads that location
> 
>     set_new_handler behaves as if it updates a hidden memory
>     location, and a call to that handler behaves as if it reads that
>     location
> 
>     throw behaves as if it updates a hidden memory location, and
>     catching that exception behaves as if it reads that location
> 
> Where it is not otherwise specified, such an action or access 
> behaves as if it is an access to a shared object solely 
> within the calling thread, and there is no synchronizes-with 
> action implied by the action.  The normal rules apply when 
> deciding whether a program execution is consistent.
Anyone object to this?  That sounds reasonable to me.
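
In particular (a sketch of the consequence, using std::set_terminate):

    #include <exception>

    void handler1();   /* arbitrary terminate handlers */
    void handler2();

    void thread1() { std::set_terminate(handler1); }
    void thread2() { std::set_terminate(handler2); }

    /* Under the rule above the two calls conflict on the hidden
       location; unless they are ordered by synchronization this is a
       data race, exactly as for two plain stores to a shared int. */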

> 
> C++ does not currently define any intra-process out-of-band exceptions
> (in the normal computer science sense, not necessarily 
> restricted to C++ exceptions).  However, it permits some 
> optional exceptions to be thrown out-of-band in some 
> implementations (especially arithmetic ones).  Where an 
> implementation supports the catching of such exceptions, it 
> should attempt to do the following for intra-thread 
> out-of-band exceptions and shall document whether it does so.
> 
> [ This is not exactly an onerous requirement, but is the 
> minimum that is useful and the maximum that can always be 
> implemented.  I give two forms, either of which could be 
> used; the former uses an explicit synchronization function 
> and the latter hijacks the inter-thread ones. ]
> 
> EITHER:
> 
> If a thread executes two synchronizes-locally actions A and 
> B, an action C throws an intra-thread out-of-band exception at 
> 'time' D, A occurs before C and C occurs before B, then A 
> shall occur before D and D shall occur before B.  As far as 
> this constraint is concerned, there shall be a 
> synchronizes-locally action immediately following program 
> initialization and one immediately preceding normal program 
> termination (but not necessarily that caused by calls to the 
> abort or terminate functions).  The synchronizes-locally 
> action shall be caused by a call to the barrier function in 
> class exception.
> 
> OR:
> 
> If a thread executes either side of two synchronizes-with 
> actions A and B with any other thread, an action C throws an 
> intra-thread out-of-band exception at 'time' D, A occurs 
> before C and C occurs before B, then A shall occur before D 
> and D shall occur before B.  As far as this constraint is 
> concerned, there shall be an anonymous synchronizes-with 
> action immediately following program initialization and one 
> immediately preceding normal program termination (but not 
> necessarily that caused by calls to the abort or terminate functions).
> 
> [END OF ALTERNATIVE]
Doesn't this force exceptions to be more precise than they usually are,
e.g. prevent many kinds of loop pipelining?  Unless I'm misunderstanding
things here, this is going to generate controversy for two reasons:

1) I suspect it's a tradeoff between optimization and usability of fp
exceptions, and

2) I'm not sure it makes sense in the context of the C++ standard
without also defining some of these exceptions.

Since it also doesn't seem to be directly related to threads, I'm not
sure we really want to go there as part of this proposal.
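
The pipelining point, concretely (a sketch):

    void divide(double *a, const double *b, const double *c, int n) {
        for (int i = 0; i < n; ++i)
            a[i] = b[i] / c[i];   /* may raise an arithmetic
                                     exception on any iteration */
    }

    /* A compiler that software-pipelines or vectorizes this loop
       evaluates divisions several iterations ahead of the
       corresponding stores.  Pinning each exception between the
       surrounding synchronization points, in order, invalidates such
       schedules or forces recovery code. */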
> 
> [ PROBLEM: God help me, POSIX permits things like SIGFPE to 
> be raised in a thread other than the one where the arithmetic 
> exception occurred. That being so, there will be considerable 
> political pressure to permit
> C++ exception handling to permit that, because otherwise there will be
> systems on which C++ cannot implement arithmetic exception 
> handling in conjunction with threads.  The following should 
> do the trick. ]
> 
> Where an exception is caused by an action in one thread but 
> thrown in another, it shall be treated as if it occurs in 
> two stages, one in each thread: the action shall cause a 
> hidden memory location to be updated, and the throwing of the 
> exception shall result from an asynchronous read of that 
> location.  Both stages shall have ordering semantics as 
> described above, with the resulting constraints on consistent 
> execution.  An implementation shall document when this may occur.
> 
> RECOMMENDATION: an implementation should avoid throwing an 
> exception in a different thread from the action that causes 
> it, where possible.
> 
> RECOMMENDATION: an implementation should attempt to follow 
> these guidelines for other forms of handling of intra-process 
> out-of-band exceptions (in the general sense), such as the 
> use of POSIX signal handling for signals that are both 
> generated and trapped within the process.
> 
> Actions on streams and other I/O are described separately.
I think that if we wanted to say anything about this, it would be to
forbid this behavior altogether.  This kind of synchronous exception
should be handled by the thread that generated it.  My guess is that was
also the Posix intent, though it may not be clear.  I'm not sure that
it's our job to fix their bugs.


Aside from the signal handling issues, which are tricky, I agree with
the deleted text that followed, in that I/O actions between threads
should ensure visibility.  I'm not sure whether that's controversial; I
expect it's more of an oversight in some of the other specifications.


