[cpp-threads] Slightly revised memory model proposal (D2300)

Boehm, Hans hans.boehm at hp.com
Fri Jun 22 21:38:51 BST 2007


> -----Original Message-----
> From:  Paul E. McKenney
> 
> ...
> >       This avoids some concerns about synchronization elimination.
> >       In particular, the old formulation allowed (everything
> >       initially zero, atomic):
> > 
> >       Thread1:
> >         r1 = x.load_relaxed();  // yields 1
> >         y.store_relaxed(1);
> >
> >       Thread2:
> >         r1 = y.load_relaxed();  // yields 1
> >         x.store_relaxed(1);
> > 
> >       but disallowed the corresponding test case if the two
> >       statements in each thread were separated by acq_rel
> >       read-modify-write operations to dead variables, or a locked
> >       region. That interferes with the elimination of locks if the
> >       compiler decides to statically combine threads, which is
> >       likely to become important for certain programming styles.
> 
> But doesn't the above commentary go with the previous point 
> ("Switched to a more conventional...")?
Correct.  Thanks.  I fixed it.
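For concreteness, the "separated" test case in question looks roughly
like the sketch below.  It uses the std::atomic spelling that was
eventually standardized (memory_order_relaxed, memory_order_acq_rel)
rather than the load_relaxed()/store_relaxed() names above, and `dead`
stands for an atomic variable that nothing else touches:

    #include <atomic>
    #include <thread>

    std::atomic<int> x(0), y(0), dead(0);
    int r1, r2;

    void thread1() {
        r1 = x.load(std::memory_order_relaxed);
        dead.fetch_add(1, std::memory_order_acq_rel);  // RMW on a dead variable
        y.store(1, std::memory_order_relaxed);
    }

    void thread2() {
        r2 = y.load(std::memory_order_relaxed);
        dead.fetch_add(1, std::memory_order_acq_rel);  // RMW on a dead variable
        x.store(1, std::memory_order_relaxed);
    }

    int main() {
        std::thread t1(thread1), t2(thread2);
        t1.join();
        t2.join();
        // The question: is r1 == 1 && r2 == 1 still an allowed outcome
        // once the acq_rel RMWs (or a locked region) sit between the two
        // statements?  If not, a compiler that statically combines
        // threads cannot simply delete the dead RMWs or the lock.
    }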
> [ . . . ]
> 
> >     An evaluation A that performs a release operation on an object
> >     M synchronizes with an evaluation B that performs an acquire
> >     operation on M and reads either the value written by A or, if
> >     the following (in modification order) sequence of updates to M
> >     are atomic read-modify-write operations or ordered atomic
> >     stores,
> 
> Again, I believe that this should instead read "non-relaxed 
> read-modify-write operations" in order to allow some common 
> data-element initialization optimizations.
Could you explain in a bit more detail?  It seems weird to me that an
intervening fetch_add_relaxed(0) would break "synchronizes with"
relationships, but fetch_add_acquire(0) would not.  Acq_rel and seq_cst
inherently do not have this issue, but the others all seem to me like
they should be treated identically.
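A sketch of the scenario behind this question, again in the eventually
standardized std::atomic spelling (the names here are illustrative, not
the proposal's API).  The debated point is whether the intervening RMW
may be relaxed without breaking the release/acquire chain from thread_a
to thread_c:

    #include <atomic>
    #include <cassert>
    #include <thread>

    int data = 0;
    std::atomic<int> flag(0);

    void thread_a() {
        data = 42;
        flag.store(1, std::memory_order_release);      // release operation on flag
    }

    void thread_b() {
        // Intervening RMW in flag's modification order; the question is
        // whether "relaxed" is enough here, or whether it must be
        // non-relaxed to keep the chain intact.
        flag.fetch_add(0, std::memory_order_relaxed);
    }

    void thread_c() {
        if (flag.load(std::memory_order_acquire) == 1) {  // may read b's RMW result
            assert(data == 42);  // holds only if a's release still
                                 // synchronizes with this acquire
                                 // through b's intervening RMW
        }
    }

    int main() {
        std::thread a(thread_a), b(thread_b), c(thread_c);
        a.join(); b.join(); c.join();
    }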

> 
> >     a value written by one of these read-modify-write operations or
> >     ordered stores. [Note: In some cases, particularly for
> >     sequentially consistent atomic operations, the library
> >     specification strengthens this to acquire operations that read
> >     any later (in modification order) value. Except in those cases,
> >     reading a later value does not necessarily ensure visibility as
> >     described below. Such a requirement would sometimes interfere
> >     with efficient implementation. -end note ]

Rereading this, I think this part of the note (the "Except in those
cases ..." sentences) can now go away again ...

> >     [Note: The specifications of the synchronization operations
> >     define when one reads the value written by another. For atomic
> >     variables, the definition is clear. All operations on a given
> >     lock occur in a single total order. Each lock acquisition "reads
> >     the value written" by the last lock release. -end note ]
> 
> And explicit memory fences now fit naturally into the 
> "synchronizes with"
> definition, perhaps something like the following:
> 
> 	An evaluation A synchronizes with an evaluation D in presence
> 	of acquire and release memory fences if (1) the evaluation
> 	A is sequenced before a fence with acquire semantics which
> 	is sequenced before an evaluation B, (2) an evaluation C
> 	is sequenced before a fence with release semantics which
> 	is sequenced before the evaluation D, and (3) evaluation
> 	C reads from an object M either the value written by B or
> 	some following value (in modification order), but only in
> 	the case that all intervening modifications to M have been
> 	carried out by non-relaxed read-modify-write operations.
> 	[Note: using N2262 nomenclature, acquire_memory_fence(),
> 	acq_rel_memory_fence(), and ordered_memory_fence() have acquire
> 	semantics, while release_memory_fence(), acq_rel_memory_fence(),
> 	and ordered_memory_fence() have release semantics. ---end note]
> 
> In other words, manual insertion of acquire and release 
> fences have the same semantics as the equivalent acquire and 
> release operations.
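To make the intent concrete: the acquire/release-fence case is aiming
at the usual fence-based message-passing idiom, sketched below in the
spelling that was eventually standardized (std::atomic_thread_fence)
rather than the N2262 free-function names.  The placement shown
(release fence on the writing side, acquire fence on the reading side)
is the conventional one:

    #include <atomic>
    #include <cassert>
    #include <thread>

    int payload = 0;
    std::atomic<int> ready(0);

    void producer() {
        payload = 42;                                            // ordinary write
        std::atomic_thread_fence(std::memory_order_release);     // release fence
        ready.store(1, std::memory_order_relaxed);               // relaxed store
    }

    void consumer() {
        if (ready.load(std::memory_order_relaxed) == 1) {        // relaxed load
            std::atomic_thread_fence(std::memory_order_acquire); // acquire fence
            assert(payload == 42);  // visible because the fences pair up
        }
    }

    int main() {
        std::thread p(producer), c(consumer);
        p.join(); c.join();
    }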
> 
> However, this does not capture all of the much stronger 
> semantics of ordered fences (e.g., Dekker and IRIW).  One way 
> to capture this might be as follows, though this does need 
> more thought:
> 
> 	An evaluation A synchronizes with an evaluation D in presence of
> 	ordered memory fences if (1) the evaluation A is sequenced before
> 	an ordered fence which is sequenced before an evaluation B, (2)
> 	an evaluation C is sequenced before an ordered fence which is
> 	sequenced before the evaluation D, and (3) evaluation B precedes
> 	evaluation C with respect to some memory object M's modification
> 	order.  Evaluation B can precede C in any of the following
> 	ways, all with respect to M's modification order: (a) C reads a
> 	value from M that is either written by B or some later value,
> 	(b) C writes a value to M that follows that read from M by B,
> 	(c) C writes a value to M that follows that written to M by B,
> 	or (d) C reads a value from M that follows that read from M by B.
> 
> Peter, Raul, thoughts?
Good.
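The stronger ordered-fence property can be illustrated with the
Dekker-style (store-buffering) test, sketched here with the eventually
standardized sequentially consistent fences; the names are illustrative
only:

    #include <atomic>
    #include <thread>

    std::atomic<int> x(0), y(0);
    int r1, r2;

    void t1() {
        x.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);  // "ordered" fence
        r1 = y.load(std::memory_order_relaxed);
    }

    void t2() {
        y.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);  // "ordered" fence
        r2 = x.load(std::memory_order_relaxed);
    }

    int main() {
        std::thread a(t1), b(t2);
        a.join(); b.join();
        // With ordered (totally ordered) fences, the outcome
        // r1 == 0 && r2 == 0 must be forbidden; acquire and release
        // fences alone do not give this.
    }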

I'm increasingly convinced that Lawrence's approach could also work
here, and is also now fairly easy to integrate.  I don't think that we
will resolve this issue before Toronto.  It's subtle enough that it
probably requires us all to be in the same room.  The scenario that I
would like to aim for is:

- We vote the basic memory model text (N2300), possibly with some small
changes, into the working paper in Toronto.  I suspect that's nearly
required to have a chance at propagating the effects into some of the
library text by Kona.

- We agree on exactly what we want in terms of atomics and fences in
Toronto, so that it can be written up as formal standardese, ideally in
the post-Toronto mailing.

> 
> > 1.10p10:
> 
> [ . . . ]
> 
> >     A multi-threaded execution is consistent if each thread observes
> >     values of objects that obey the following constraints:
> >
> >         * No evaluation happens before itself.
> >         * No read access B to a scalar object observes a side effect
> >           A such that B happens before A.
> >         * Each read access B to a scalar object M observes the value
> >           assigned to M by a side effect A only if there is no other
> >           side effect X to M such that
> >               o A happens before X, and
> >               o X happens before B.
> >         * If a side effect A to scalar object M happens before
> >           another side effect B to the same scalar object M, then A
> >           must precede B in M's modification order.
> >         * If read access A to scalar B object M observes a value a,
> >           and A happens before read access B to the same scalar
> >           object M which observes b, then the corresponding
> >           assignment of b to M may not precede the assignment of a
> >           to M in M's modification order.
> 
> My guess is that "scalar B object M" needs to instead be 
> "scalar object M".
Yes.  Fixed.

> This assumes b!=a, so shouldn't this be made explicit?
I think I'm a bit more comfortable with the way it is.  If a == b, and
the assignments mentioned are the same, I think the statement is
vacuously true.  But we may have a == b while the two read accesses in
fact see different assignments, which happen to store the same value.
In that case, I expect we still want the rule to apply.

This gets a bit confusing, since in the mathematical version of this
description, we would have an explicit pairing of reads and writes,
which we are kind of sweeping under the rug here to get something that
sounds more like standardese.
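A small example of the case being described: two distinct side effects
that happen to store the same value, so a == b even though the two
reads may have observed different assignments (spelled with
std::atomic purely for illustration):

    #include <atomic>
    #include <thread>

    std::atomic<int> m(0);

    void writer1() { m.store(1, std::memory_order_relaxed); }  // side effect W1
    void writer2() { m.store(1, std::memory_order_relaxed); }  // side effect W2

    void reader() {
        int a = m.load(std::memory_order_relaxed);
        int b = m.load(std::memory_order_relaxed);
        // If both loads observe a store of 1, then a == b, yet they may
        // still have observed different assignments (W1 vs W2).  The
        // rule is meant to forbid the second load from observing
        // whichever of W1/W2 is earlier in m's modification order once
        // the first load has observed the later one.
        (void)a; (void)b;
    }

    int main() {
        std::thread w1(writer1), w2(writer2), r(reader);
        w1.join(); w2.join(); r.join();
    }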

> 
> On the "scalar" modifier used here, this means that we are 
> not proposing atomic access to arbitrary structures, for 
> example, atomic<struct foo>? 
> (Fine by me, but need to know.)  However, we -are- requiring 
> that implementations provide atomic access to large scalars, 
> correct?  So that a conforming 8-bit system would need to 
> provide proper semantics for "atomic<long long>"?  Not that 
> there are likely to be many parallel 8-bit systems, to be 
> sure, but same question for 32-bit systems and both long long 
> and double.  (Again, fine by me either way, but need to know.)
In my view, most of this discussion really belongs in the atomics
proposal.  It should specify that if a read of an atomic object observes
any of the effects of an atomic write to that object, then it observes
all of them.  And that applies to objects beyond scalars.  But it's just
an added constraint beyond what is specified in the memory model section.

Our atomics proposal still proposes to guarantee atomicity for atomic<T>
for any T.  We just don't promise to do it in a lock-free manner.  Thus
atomic<struct foo> does behave atomically with respect to threads.
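A sketch of that distinction, again in the eventually standardized
std::atomic spelling: atomicity is guaranteed, while lock-freedom is a
separate, queryable property that may be absent (for example for
atomic<struct foo>, or atomic<long long> on a small machine):

    #include <atomic>
    #include <cstdio>

    struct foo { int a; int b; long long c; };

    std::atomic<foo> f{};         // behaves atomically with respect to threads,
                                  // but may be implemented with an internal lock
    std::atomic<long long> big{0};

    int main() {
        std::printf("atomic<foo>       lock-free: %d\n", f.is_lock_free() ? 1 : 0);
        std::printf("atomic<long long> lock-free: %d\n", big.is_lock_free() ? 1 : 0);
    }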

> 
> Although the general thrust of 1.10p8 and 1.10p10 look OK to 
> me, more careful analysis will be required -- which won't 
> happen by your deadline, sorry to say!
Thanks for looking at this quickly.  If we need to make some more tweaks
before Toronto, I think that's fine.

I think at the moment, the real constraint on getting this into the
working paper is that we need to have more people reading it.
 
> 
> 						Thanx, Paul