[cpp-threads] Slightly revised memory model proposal (D2300)
Boehm, Hans
hans.boehm at hp.com
Tue Jun 26 21:44:48 BST 2007
> -----Original Message-----
> From: Paul E. McKenney [mailto:paulmck at linux.vnet.ibm.com]
> Sent: Sunday, June 24, 2007 10:39 AM
> To: Boehm, Hans
> Cc: C++ threads standardisation; Sarita V Adve
> Subject: Re: [cpp-threads] Slightly revised memory model
> proposal (D2300)
>
> On Sun, Jun 24, 2007 at 02:54:53AM -0000, Boehm, Hans wrote:
> > > -----Original Message-----
> > > From: Paul E. McKenney [mailto:paulmck at linux.vnet.ibm.com]
> > > Sent: Friday, June 22, 2007 3:34 PM
> > > To: Boehm, Hans
> > > Cc: C++ threads standardisation; Sarita V Adve
> > > Subject: Re: [cpp-threads] Slightly revised memory model proposal
> > > (D2300)
> > >
> > > On Fri, Jun 22, 2007 at 08:38:51PM -0000, Boehm, Hans wrote:
> > > > > -----Original Message-----
> > > > > From: Paul E. McKenney
> > >
> > > [ . . . ]
> > >
> > > > > > An evaluation A that performs a release operation on an object
> > > > > > M synchronizes with an evaluation B that performs an acquire
> > > > > > operation on M and reads either the value written by A or, if
> > > > > > the following (in modification order) sequence of updates to M
> > > > > > are atomic read-modify-write operations or ordered atomic stores,
> > > > >
> > > > > Again, I believe that this should instead read "non-relaxed
> > > > > read-modify-write operations" in order to allow some common
> > > > > data-element initialization optimizations.
> > > >
> > > > Could you explain in a bit more detail?  It seems weird to me that
> > > > an intervening fetch_add_relaxed(0) would break "synchronizes with"
> > > > relationships, but fetch_add_acquire(0) would not.  Acq_rel and
> > > > seq_cst inherently do not have this issue, but the others all seem
> > > > to me like they should be treated identically.
> > >
> > > The proper analogy is between fetch_add_acquire() and
> > > fetch_add_relaxed() on the one hand and load_acquire() and
> > > load_relaxed() on the other.  In both cases, the _acquire() variant
> > > participates in "synchronizes with" relationships, while the
> > > _relaxed() variant does not.
> > >
> > > Looking at the argument to fetch_add_relaxed(), we have
> > > fetch_add_relaxed(0) breaking the "synchronizes with"
> > > relationship, but then again, so would fetch_add_relaxed(1),
> > > fetch_add_relaxed(2), and even fetch_add_relaxed(42).
> > >
> > > Or am I missing something subtle here?
> >
> > I think we're still misunderstanding each other.  The situation this
> > is addressing is roughly
> >
> >     T1:  x.store_release(17);
> >     T2:  x.fetch_add_relaxed(1);
> >     T3:  x.load_acquire();
> >
> > which we can think of as executing sequentially in that order.  Thus
> > the load_acquire sees a value of 18.  The question is whether the
> > store_release synchronizes with the load_acquire.  We agree that there
> > are no synchronizes-with relationships involving the fetch_add (which
> > would change if the fetch_add were anything other than relaxed).
> >
> > Was this also the scenario you were looking at?
>
> I was considering scenarios of this sort, but where the compiler
> realized that it could safely use weaker fences for the
> x.store_release(), but only in the absence of an intervening
> synchronizes-with-preserving x.fetch_add_relaxed().
Could you be more specific? In my view, this does result in rather
weird semantics, so I would rather not go there without good motivation.
Consider:
    int foo;
    atomic<unsigned> n_foo_accesses(0);

    T1:
        foo = 17;
        n_foo_accesses.store_release(1);

    T2-N:
        if (n_foo_accesses.load_acquire() > 0) {
            r1 = foo;
            n_foo_accesses.fetch_add_relaxed(1);
        }
n_foo_accesses is examined again only after all threads are joined.
It's unclear to me why this should not work as written, but should work
if I said fetch_add_acquire instead.  The difference seems intuitively
irrelevant to me.
>
> > If not, do you want to suggest clearer wording?
>
> Just adding the "non-relaxed" modifier for the RMW operations
> would be fine.
>
> > > > ...would like to aim for is:
> > > >
> > > > - We vote the basic memory model text (N2300), possibly with some
> > > >   small changes, into the working paper in Toronto.  I suspect
> > > >   that's nearly required to have a chance at propagating the
> > > >   effects into some of the library text by Kona.
> > > >
> > > > - We agree on exactly what we want in terms of atomics and fences
> > > >   in Toronto, so that it can be written up as formal standardese,
> > > >   ideally in the post-Toronto mailing.
> > >
> > > I cannot deny that past discussions on this topic have at times
> > > been spirited, but I must defer to Michael and Raul on how best to
> > > move this through the process.
> >
> > Clearly.  And I suspect there will be interesting discussions between
> > all of us in Toronto.
> >
> > > > > On the "scalar" modifier used here, this means that we are not
> > > > > proposing atomic access to arbitrary structures, for example,
> > > > > atomic<struct foo>?  (Fine by me, but need to know.)  However,
> > > > > we -are- requiring that implementations provide atomic access
> > > > > to large scalars, correct?  So that a conforming 8-bit system
> > > > > would need to provide proper semantics for "atomic<long long>"?
> > > > > Not that there are likely to be many parallel 8-bit systems, to
> > > > > be sure, but same question for 32-bit systems and both long
> > > > > long and double.  (Again, fine by me either way, but need to
> > > > > know.)
> > > >
> > > > In my view, most of this discussion really belongs in the atomics
> > > > proposal.  It should specify that if a read of an atomic object
> > > > observes any of the effects of an atomic write to an object, then
> > > > it observes all of them.  And that applies to objects beyond
> > > > scalars.  But it's just an added constraint beyond what is
> > > > specified in the memory model section.
> > > >
> > > > Our atomics proposal still proposes to guarantee atomicity for
> > > > atomic<T> for any T.  We just don't promise to do it in a
> > > > lock-free manner.  Thus atomic<struct foo> does behave atomically
> > > > with respect to threads.
> > >
> > > In that case, my question becomes "why is the qualifier 'scalar'
> > > required?".
> >
> > The intent here is to decompose non-scalar reads into the constituent
> > scalar reads, and similarly for non-scalar assignments, in order to
> > determine what assignments can be seen where.  Otherwise we would
> > have to talk about which combination of field assignments a
> > particular struct read can see.  It just seems easier to define
> > visibility on a component-by-component basis.
>
> OK. So it is then the library's responsibility to make
> operations on structs appear to be atomic, right?
Correct.
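For illustration, one way a library can meet that obligation when no
lock-free hardware support exists is to serialize whole-object reads and
writes through a mutex.  The class below is a hypothetical sketch (the
name locked_atomic and its members are mine, not from the proposal),
written in later C++11 syntax:

```cpp
#include <mutex>

// Hypothetical sketch: a non-lock-free atomic<T> that makes whole-struct
// loads and stores appear atomic by holding a per-object mutex.
template <typename T>
class locked_atomic {
public:
    locked_atomic() : value_() {}
    explicit locked_atomic(T v) : value_(v) {}

    T load() const {
        std::lock_guard<std::mutex> g(lock_);
        return value_;  // a reader sees all of a prior store, never part
    }

    void store(T v) {
        std::lock_guard<std::mutex> g(lock_);
        value_ = v;  // no reader can observe a half-written struct
    }

private:
    mutable std::mutex lock_;
    T value_;
};

struct point { int x; int y; };  // stand-in for "struct foo" above
```

With this scheme, a reader that observes any effect of a store to a
locked_atomic<point> observes all of it, which is the constraint stated
above; the price is that such objects are not lock-free.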
Hans
>
> Thanx, Paul
>
> > Hans
> >
> > >
> > > > > Although the general thrust of 1.10p8 and 1.10p10 looks OK to
> > > > > me, more careful analysis will be required -- which won't
> > > > > happen by your deadline, sorry to say!
> > > >
> > > > Thanks for looking at this quickly.  If we need to make some
> > > > more tweaks before Toronto, I think that's fine.
> > > >
> > > > I think at the moment, the real constraint on getting this into
> > > > the working paper is that we need to have more people reading it.
> > >
> > > More eyes would indeed be a very good thing from my perspective.
> > >
> > > Thanx, Paul
> > >
>