[cpp-threads] Slightly revised memory model proposal (D2300)

Paul E. McKenney paulmck at linux.vnet.ibm.com
Fri Jun 22 23:34:24 BST 2007


On Fri, Jun 22, 2007 at 08:38:51PM -0000, Boehm, Hans wrote:
> > -----Original Message-----
> > From:  Paul E. McKenney

[ . . . ]

> > >     An evaluation A that performs a release operation on an object
> > >     M synchronizes with an evaluation B that performs an acquire
> > >     operation on M and reads either the value written by A or, if
> > >     the following (in modification order) sequence of updates to M
> > >     are atomic read-modify-write operations or ordered atomic stores,
> > 
> > Again, I believe that this should instead read "non-relaxed 
> > read-modify-write operations" in order to allow some common 
> > data-element initialization optimizations.
>
> Could you explain in a bit more detail?  It seems weird to me that an
> intervening fetch_add_relaxed(0) would break "synchronizes with"
> relationships, but fetch_add_acquire(0) would not.  Acq_rel and seq_cst
> inherently do not have this issue, but the others all seem to me like
> they should be treated identically.

The proper analogy is between fetch_add_acquire() and fetch_add_relaxed()
on the one hand and load_acquire() and load_relaxed() on the other.
In both cases, the _acquire() variant participates in "synchronizes with"
relationships, while the _relaxed() variant does not.

Looking at the argument to fetch_add_relaxed(), we have
fetch_add_relaxed(0) breaking the "synchronizes with" relationship,
but then again, so would fetch_add_relaxed(1), fetch_add_relaxed(2),
and even fetch_add_relaxed(42).

Or am I missing something subtle here?
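
For concreteness, a rough sketch of the sort of pattern in question
(the spellings below -- std::atomic<int>, memory_order_release, and so
on -- are illustrative stand-ins, not the proposal's actual interface):

	#include <atomic>

	std::atomic<int> m{0};
	int payload = 0;

	void writer() {                                    // evaluation A
	    payload = 42;
	    m.store(1, std::memory_order_release);         // release operation on M
	}

	void middle() {
	    m.fetch_add(1, std::memory_order_relaxed);     // intervening relaxed RMW on M
	}

	void reader() {                                    // evaluation B
	    if (m.load(std::memory_order_acquire) == 2) {  // reads middle()'s value, not A's
	        int r = payload;                           // 42 only if A synchronizes with B
	        (void)r;
	    }
	}

The question is whether reader(), having read the value produced by the
relaxed read-modify-write rather than the value written by the release
store itself, still synchronizes with writer().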

> > >     a value written by one of these read-modify-write operations or
> > >     ordered stores. [Note: In some cases, particularly for sequentially
> > >     consistent atomic operations, the library specification strengthens
> > >     this to acquire operations that read any later (in modification
> > >     order) value. Except in those cases, reading a later 
>
> Rereading this, I think this part of the note can now go away again ...

Makes sense to me!

> > >     value does not necessarily ensure visibility as described below.
> > >     Such a requirement would sometimes interfere with efficient
> > >     implementation. -end note ]
> > >     [Note: The specifications of the synchronization operations define
> > >     when one reads the value written by another. For atomic variables,
> > >     the definition is clear. All operations on a given lock occur in a
> > >     single total order. Each lock acquisition "reads the value written"
> > >     by the last lock release. -end note ]
> > 
> > And explicit memory fences now fit naturally into the 
> > "synchronizes with"
> > definition, perhaps something like the following:
> > 
> > 	An evaluation A synchronizes with an evaluation D in presence
> > 	of acquire and release memory fences if (1) the evaluation
> > 	A is sequenced before a fence with acquire semantics which
> > 	is sequenced before an evaluation B, (2) an evaluation C
> > 	is sequenced before a fence with release semantics which
> > 	is sequenced before the evaluation D, and (3) evaluation
> > 	C reads from an object M either the value written by B or
> > 	some following value (in modification order), but only in
> > 	the case that all intervening modifications to M have been
> > 	carried out by non-relaxed read-modify-write operations.
> > 	[Note: using N2262 nomenclature, acquire_memory_fence(),
> > 	acq_rel_memory_fence(), and ordered_memory_fence() have acquire
> > 	semantics, while release_memory_fence(), acq_rel_memory_fence(),
> > 	and ordered_memory_fence() have release semantics. ---end note]
> > 
> > In other words, manual insertion of acquire and release
> > fences has the same semantics as the equivalent acquire and
> > release operations.
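
For illustration, the acquire/release fence case might look like the
following sketch; std::atomic_thread_fence() and the memory_order_*
spellings are illustrative stand-ins for N2262's fence names, with the
release fence placed on the writing side and the acquire fence on the
reading side, the conventional arrangement for this kind of handoff:

	#include <atomic>

	std::atomic<int> m{0};
	int payload = 0;

	void writing_thread() {
	    payload = 42;                                          // data write sequenced before
	    std::atomic_thread_fence(std::memory_order_release);   // ... a release fence ...
	    m.store(1, std::memory_order_relaxed);                 // ... before the store to M
	}

	void reading_thread() {
	    if (m.load(std::memory_order_relaxed) == 1) {          // load of M sequenced before
	        std::atomic_thread_fence(std::memory_order_acquire); // ... an acquire fence ...
	        int r = payload;                                   // ... before the data read
	        (void)r;                                           // intended outcome: r == 42
	    }
	}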
> > 
> > However, this does not capture all of the much stronger 
> > semantics of ordered fences (e.g., Dekker and IRIW).  One way 
> > to capture this might be as follows, though this does need 
> > more thought:
> > 
> > 	An evaluation A synchronizes with an evaluation D in presence of
> > 	ordered memory fences if (1) the evaluation A is sequenced before
> > 	an ordered fence which is sequenced before an evaluation B, (2)
> > 	an evaluation C is sequenced before an ordered fence which is
> > 	sequenced before the evaluation D, and (3) evaluation B precedes
> > 	evaluation C with respect to some memory object M's modification
> > 	order.	Evaluation B can precede C in any of the following
> > 	ways, all with respect to M's modification order: (a) C reads a
> > 	value from M that is either written by B or some later value,
> > 	(b) C writes a value to M that follows that read from M by B,
> > 	(c) C writes a value to M that follows that written to M by B,
> > 	or (d) C reads a value from M that follows that read from M by B.
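
As a concrete instance of the stronger guarantee, a Dekker-style
store/load test might be sketched as follows (again with illustrative
spellings; read the seq_cst fences below as stand-ins for N2262's
ordered_memory_fence()):

	#include <atomic>

	std::atomic<int> x{0}, y{0};
	int r1, r2;

	void thread1() {
	    x.store(1, std::memory_order_relaxed);
	    std::atomic_thread_fence(std::memory_order_seq_cst);   // "ordered" fence
	    r1 = y.load(std::memory_order_relaxed);
	}

	void thread2() {
	    y.store(1, std::memory_order_relaxed);
	    std::atomic_thread_fence(std::memory_order_seq_cst);   // "ordered" fence
	    r2 = x.load(std::memory_order_relaxed);
	}

	// With ordered fences in both threads, the outcome r1 == 0 && r2 == 0
	// is intended to be forbidden; acquire and release fences alone would
	// not rule it out.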
> > 
> > Peter, Raul, thoughts?
>
> Good.
> 
> I'm increasingly convinced that Lawrence's approach could also work
> here, and is also now fairly easy to integrate.  I don't think that we
> will resolve this issue before Toronto.  It's subtle enough that it
> probably requires us all to be in the same room.  The scenario that I
> would like to aim for is:
> 
> - We vote the basic memory model text (N2300), possibly with some small
> changes, into the working paper in Toronto.  I suspect that's nearly
> required to have a chance at propagating the effects into some of the
> library text by Kona.
> 
> - We agree on exactly what we want in terms of atomics and fences in
> Toronto, so that it can be written up as formal standardese ideally in
> the post-Toronto mailing.

I cannot deny that past discussions on this topic have at times been
spirited, but I must defer to Michael and Raul on how best to move this
through the process.

> > > 1.10p10:
> > 
> > [ . . . ]
> > 
> > >     A multi-threaded execution is consistent if each thread observes
> > >     values of objects that obey the following constraints:
> > > 
> > >         * No evaluation happens before itself.
> > >         * No read access B to a scalar object observes a side effect A,
> > > 	  such that B happens before A.
> > >         * Each read access B to a scalar object M observes the value
> > > 	  assigned to M by a side effect A only if there is no other
> > > 	  side effect X to M such that
> > >               o A happens before X, and
> > >               o X happens before B. 
> > >         * If a side effect A to scalar object M happens before another
> > > 	    side effect B to the same scalar object M, then A must precede
> > > 	    B in M's modification order.
> > >         * If read access A to scalar B object M observes a value a, and
> > > 	    A happens before read access B to the same scalar object M which
> > > 	    observes b, then the corresponding assignment of b to M may
> > > 	    not precede the assignment of a to M in M's modification order.
> > 
> > My guess is that "scalar B object M" needs to instead be 
> > "scalar object M".
>
> Yes.  Fixed.
> 
> > This assumes b!=a, so shouldn't this be made explicit?
>
> I think I'm a bit more comfortable with the way it is.  If a == b, and
> the assignments mentioned are the same, I think the statement is
> vacuously true.  But we may have a == b, but both read accesses in fact
> see different assignments, which happen to store the same value.  In
> that case, I expect we still want the rule to apply.

Interesting point, though I cannot see how one would validate that
this rule was actually followed without using in-circuit emulators.

> This gets a bit confusing, since in the mathematical version of this
> description, we would have an explicit pairing of reads and writes,
> which we are kind of sweeping under the rug here to get something that
> sounds more like standardese.

In any case, I do not feel strongly about this, so am willing to go
either way.
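
The case in question might be sketched as follows (illustrative
spellings again; the relaxed orderings are an assumption for the sake
of the example):

	#include <atomic>

	std::atomic<int> m{0};

	void writer() {
	    m.store(1, std::memory_order_relaxed);      // side effect W1
	    m.store(1, std::memory_order_relaxed);      // side effect W2, later in M's
	}                                               // modification order

	void reader() {
	    int a = m.load(std::memory_order_relaxed);  // read access A
	    int b = m.load(std::memory_order_relaxed);  // read access B
	    // Even when a == b == 1, A and B may have observed different
	    // assignments.  The rule as worded still forbids the pairing in
	    // which A observes W2 while B observes W1.
	    (void)a; (void)b;
	}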

> > On the "scalar" modifier used here, this means that we are 
> > not proposing atomic access to arbitrary structures, for 
> > example, atomic<struct foo>? 
> > (Fine by me, but need to know.)  However, we -are- requiring 
> > that implementations provide atomic access to large scalars, 
> > correct?  So that a conforming 8-bit system would need to 
> > provide proper semantics for "atomic<long long>"?  Not that 
> > there are likely to be many parallel 8-bit systems, to be 
> > sure, but same question for 32-bit systems and both long long 
> > and double.  (Again, fine by me either way, but need to know.)
>
> In my view, most of this discussion really belongs in the atomics
> proposal.  It should specify that if a read to an atomic object observes
> any of the effect of an atomic write to an object, then it observes all
> of it.  And that applies to objects beyond scalars.  But it's just an
> added constraint beyond what is specified in the memory model section.
> 
> Our atomics proposal still proposes to guarantee atomicity for atomic<T>
> for any T.  We just don't promise to do it in a lock-free manner.  Thus
> atomic<struct foo> does behave atomically with respect to threads.

In that case, my question becomes "why is the qualifier 'scalar'
required?".

> > Although the general thrust of 1.10p8 and 1.10p10 looks OK to
> > me, more careful analysis will be required -- which won't 
> > happen by your deadline, sorry to say!
>
> Thanks for looking at this quickly.  If we need to make some more tweaks
> before Toronto, I think that's fine.
> 
> I think at the moment, the real constraint on getting this into the
> working paper is that we need to have more people reading it.

More eyes would indeed be a very good thing from my perspective.

						Thanx, Paul


