[cpp-threads] Slightly revised memory model proposal (D2300)
Paul E. McKenney
paulmck at linux.vnet.ibm.com
Fri Jun 22 23:34:24 BST 2007
On Fri, Jun 22, 2007 at 08:38:51PM -0000, Boehm, Hans wrote:
> > -----Original Message-----
> > From: Paul E. McKenney
[ . . . ]
> > > An evaluation A that performs a release operation on an object
> > > M synchronizes with an evaluation B that performs an acquire
> > > operation on M and reads either the value written by A or, if
> > > the following (in modification order) sequence of updates to M
> > > are atomic read-modify-write operations or ordered
> > > atomic stores,
> >
> > Again, I believe that this should instead read "non-relaxed
> > read-modify-write operations" in order to allow some common
> > data-element initialization optimizations.
>
> Could you explain in a bit more detail? It seems weird to me that an
> intervening fetch_add_relaxed(0) would break "synchronizes with"
> relationships, but fetch_add_acquire(0) would not. Acq_rel and seq_cst
> inherently do not have this issue, but the others all seem to me like
> they should be treated identically.
The proper analogy is between fetch_add_acquire() and fetch_add_relaxed()
on the one hand and load_acquire() and load_relaxed() on the other.
In both cases, the _acquire() variant participates in "synchronizes with"
relationships, while the _relaxed() variant does not.
Looking at the argument to fetch_add_relaxed(), we have
fetch_add_relaxed(0) breaking the "synchronizes with" relationship,
but then again, so would fetch_add_relaxed(1), fetch_add_relaxed(2),
and even fetch_add_relaxed(42).
Or am I missing something subtle here?
> > > a value written by one of these read-modify-write operations or
> > > ordered stores. [Note: In some cases, particularly for
> > > sequentially consistent atomic operations, the library
> > > specification strengthens this to acquire operations that read
> > > any later (in modification order) value. Except in those cases,
> > > reading a later
> Rereading this, I think this part of the note can now go away again ...
Makes sense to me!
> > > value does not necessarily ensure visibility as described below.
> > > Such a requirement would sometimes interfere with efficient
> > > implementation. -end note ]
> > > [Note: The specifications of the synchronization operations define
> > > when one reads the value written by another. For atomic variables,
> > > the definition is clear. All operations on a given lock occur in a
> > > single total order. Each lock acquisition "reads the value written"
> > > by the last lock release. -end note ]
> >
> > And explicit memory fences now fit naturally into the
> > "synchronizes with"
> > definition, perhaps something like the following:
> >
> > An evaluation A synchronizes with an evaluation D in the presence
> > of acquire and release memory fences if (1) the evaluation
> > A is sequenced before a fence with acquire semantics which
> > is sequenced before an evaluation B, (2) an evaluation C
> > is sequenced before a fence with release semantics which
> > is sequenced before the evaluation D, and (3) evaluation
> > C reads from an object M either the value written by B or
> > some following value (in modification order), but only in
> > the case that all intervening modifications to M have been
> > carried out by non-relaxed read-modify-write operations.
> > [Note: using N2262 nomenclature, acquire_memory_fence(),
> > acq_rel_memory_fence(), and ordered_memory_fence() have acquire
> > semantics, while release_memory_fence(), acq_rel_memory_fence(),
> > and ordered_memory_fence() have release semantics. ---end note]
> >
> > In other words, manual insertion of acquire and release
> > fences has the same semantics as the equivalent acquire and
> > release operations.
> >
> > However, this does not capture all of the much stronger
> > semantics of ordered fences (e.g., Dekker and IRIW). One way
> > to capture this might be as follows, though this does need
> > more thought:
> >
> > An evaluation A synchronizes with an evaluation D in the presence
> > of ordered memory fences if (1) the evaluation A is sequenced before
> > an ordered fence which is sequenced before an evaluation B, (2)
> > an evaluation C is sequenced before an ordered fence which is
> > sequenced before the evaluation D, and (3) evaluation B precedes
> > evaluation C with respect to some memory object M's modification
> > order. Evaluation B can precede C in any of the following
> > ways, all with respect to M's modification order: (a) C reads a
> > value from M that is either written by B or some later value,
> > (b) C writes a value to M that follows that read from M by B,
> > (c) C writes a value to M that follows that written to M by B,
> > or (d) C reads a value from M that follows that read
> > from M by B.
> >
> > Peter, Raul, thoughts?
>
> Good.
>
> I'm increasingly convinced that Lawrence's approach could also work
> here, and is also now fairly easy to integrate. I don't think that we
> will resolve this issue before Toronto. It's subtle enough that it
> probably requires us all to be in the same room. The scenario that I
> would like to aim for is:
>
> - We vote the basic memory model text (N2300), possibly with some small
> changes, into the working paper in Toronto. I suspect that's nearly
> required to have a chance at propagating the effects into some of the
> library text by Kona.
>
> - We agree on exactly what we want in terms of atomics and fences in
> Toronto, so that it can be written up as formal standardese ideally in
> the post-Toronto mailing.
I cannot deny that past discussions on this topic have at times been
spirited, but I must defer to Michael and Raul on how best to move this
through the process.
> > > 1.10p10:
> >
> > [ . . . ]
> >
> > > A multi-threaded execution is consistent if each thread observes
> > > values of objects that obey the following constraints:
> > >
> > > * No evaluation happens before itself.
> > > * No read access B to a scalar object observes a side effect A,
> > >   such that B happens before A.
> > > * Each read access B to a scalar object M observes the value
> > >   assigned to M by a side effect A only if there is no other
> > >   side effect X to M such that
> > >   o A happens before X, and
> > >   o X happens before B.
> > > * If a side effect A to scalar object M happens before another
> > >   side effect B to the same scalar object M, then A must precede
> > >   B in M's modification order.
> > > * If read access A to scalar B object M observes a value a, and
> > >   A happens before read access B to the same scalar object M which
> > >   observes b, then the corresponding assignment of b to M may
> > >   not precede the assignment of a to M in M's modification order.
> >
> > My guess is that "scalar B object M" needs to instead be
> > "scalar object M".
>
> Yes. Fixed.
>
> > This assumes b!=a, so shouldn't this be made explicit?
>
> I think I'm a bit more comfortable with the way it is. If a == b, and
> the assignments mentioned are the same, I think the statement is
> vacuously true. But we may have a == b, but both read accesses in fact
> see different assignments, which happen to store the same value. In
> that case, I expect we still want the rule to apply.
Interesting point, though I cannot see how one would validate that
this rule was actually followed without using in-circuit emulators.
> This gets a bit confusing, since in the mathematical version of this
> description, we would have an explicit pairing of reads and writes,
> which we are kind of sweeping under the rug here to get something that
> sounds more like standardese.
In any case, I do not feel strongly about this, so am willing to go
either way.
> > On the "scalar" modifier used here, this means that we are
> > not proposing atomic access to arbitrary structures, for
> > example, atomic<struct foo>?
> > (Fine by me, but need to know.) However, we -are- requiring
> > that implementations provide atomic access to large scalars,
> > correct? So that a conforming 8-bit system would need to
> > provide proper semantics for "atomic<long long>"? Not that
> > there are likely to be many parallel 8-bit systems, to be
> > sure, but same question for 32-bit systems and both long long
> > and double. (Again, fine by me either way, but need to know.)
>
> In my view, most of this discussion really belongs in the atomics
> proposal. It should specify that if a read to an atomic object observes
> any of the effect of an atomic write to an object, then it observes all
> of it. And that applies to objects beyond scalars. But it's just an
> added constraint beyond what is specified in the memory model section.
>
> Our atomics proposal still proposes to guarantee atomicity for atomic<T>
> for any T. We just don't promise to do it in a lock-free manner. Thus
> atomic<struct foo> does behave atomically with respect to threads.
In that case, my question becomes "why is the qualifier 'scalar'
required?".
> > Although the general thrust of 1.10p8 and 1.10p10 look OK to
> > me, more careful analysis will be required -- which won't
> > happen by your deadline, sorry to say!
>
> Thanks for looking at this quickly. If we need to make some more tweaks
> before Toronto, I think that's fine.
>
> I think at the moment, the real constraint on getting this into the
> working paper is that we need to have more people reading it.
More eyes would indeed be a very good thing from my perspective.
Thanx, Paul