[cpp-threads] Slightly revised memory model proposal (D2300)
Boehm, Hans
hans.boehm at hp.com
Tue Jun 26 21:44:48 BST 2007
> -----Original Message-----
> From: Paul E. McKenney [mailto:paulmck at linux.vnet.ibm.com]
> Sent: Sunday, June 24, 2007 10:39 AM
> To: Boehm, Hans
> Cc: C++ threads standardisation; Sarita V Adve
> Subject: Re: [cpp-threads] Slightly revised memory model
> proposal (D2300)
>
> On Sun, Jun 24, 2007 at 02:54:53AM -0000, Boehm, Hans wrote:
> > > -----Original Message-----
> > > From: Paul E. McKenney [mailto:paulmck at linux.vnet.ibm.com]
> > > Sent: Friday, June 22, 2007 3:34 PM
> > > To: Boehm, Hans
> > > Cc: C++ threads standardisation; Sarita V Adve
> > > Subject: Re: [cpp-threads] Slightly revised memory model proposal
> > > (D2300)
> > >
> > > On Fri, Jun 22, 2007 at 08:38:51PM -0000, Boehm, Hans wrote:
> > > > > -----Original Message-----
> > > > > From: Paul E. McKenney
> > >
> > > [ . . . ]
> > >
> > > > > > An evaluation A that performs a release operation on an object
> > > > > > M synchronizes with an evaluation B that performs an acquire
> > > > > > operation on M and reads either the value written by A or, if
> > > > > > the following (in modification order) sequence of updates to M
> > > > > > are atomic read-modify-write operations or ordered atomic stores,
> > > > >
> > > > > Again, I believe that this should instead read "non-relaxed
> > > > > read-modify-write operations" in order to allow some common
> > > > > data-element initialization optimizations.
> > > >
> > > > Could you explain in a bit more detail?  It seems weird to me that
> > > > an intervening fetch_add_relaxed(0) would break "synchronizes with"
> > > > relationships, but fetch_add_acquire(0) would not.  Acq_rel and
> > > > seq_cst inherently do not have this issue, but the others all seem
> > > > to me like they should be treated identically.
> > >
> > > The proper analogy is between fetch_add_acquire() and
> > > fetch_add_relaxed() on the one hand and load_acquire() and
> > > load_relaxed() on the other.  In both cases, the _acquire() variant
> > > participates in "synchronizes with" relationships, while the
> > > _relaxed() variant does not.
> > >
> > > Looking at the argument to fetch_add_relaxed(), we have
> > > fetch_add_relaxed(0) breaking the "synchronizes with"
> > > relationship, but then again, so would fetch_add_relaxed(1),
> > > fetch_add_relaxed(2), and even fetch_add_relaxed(42).
> > >
> > > Or am I missing something subtle here?
> >
> > I think we're still misunderstanding each other.  The situation this
> > is addressing is roughly
> >
> >     T1:  x.store_release(17);
> >     T2:  x.fetch_add_relaxed(1);
> >     T3:  x.load_acquire();
> >
> > which we can think of as executing sequentially in that order.  Thus
> > the load_acquire sees a value of 18.  The question is whether the
> > store_release synchronizes with the load_acquire.  We agree that there
> > are no synchronizes-with relationships involving the fetch_add (which
> > would change if the fetch_add were anything other than relaxed).
> >
> > Was this also the scenario you were looking at?
>
> I was considering scenarios of this sort, but where the compiler
> realized that it could safely use weaker fences for the
> x.store_release(), but only in the absence of an intervening
> synchronizes-with-preserving x.fetch_add_relaxed().
Could you be more specific? In my view, this does result in rather
weird semantics, so I would rather not go there without good motivation.
Consider:
    int foo;
    atomic<unsigned> n_foo_accesses(0);

    T1:
        foo = 17;
        n_foo_accesses.store_release(1);

    T2-N:
        if (n_foo_accesses.load_acquire() > 0) {
            r1 = foo;
            n_foo_accesses.fetch_add_relaxed(1);
        }
n_foo_accesses is examined again only after all threads are joined.
It's unclear to me why this should not work as written, but should work
if I said fetch_add_acquire instead.  The difference seems intuitively
irrelevant to me.
>
> > If not, do you want to suggest clearer wording?
>
> Just adding the "non-relaxed" modifier for the RMW operations
> would be fine.
>
> > > > ...would like to aim for is:
> > > >
> > > > - We vote the basic memory model text (N2300), possibly with some
> > > >   small changes, into the working paper in Toronto.  I suspect
> > > >   that's nearly required to have a chance at propagating the
> > > >   effects into some of the library text by Kona.
> > > >
> > > > - We agree on exactly what we want in terms of atomics and fences
> > > >   in Toronto, so that it can be written up as formal standardese,
> > > >   ideally in the post-Toronto mailing.
> > >
> > > I cannot deny that past discussions on this topic have at times
> > > been spirited, but I must defer to Michael and Raul on how best to
> > > move this through the process.
> >
> > Clearly.  And I suspect there will be interesting discussions between
> > all of us in Toronto.
> >
> > > > > On the "scalar" modifier used here, this means that we are not
> > > > > proposing atomic access to arbitrary structures, for example,
> > > > > atomic<struct foo>?  (Fine by me, but need to know.)  However,
> > > > > we -are- requiring that implementations provide atomic access
> > > > > to large scalars, correct?  So that a conforming 8-bit system
> > > > > would need to provide proper semantics for "atomic<long long>"?
> > > > > Not that there are likely to be many parallel 8-bit systems, to
> > > > > be sure, but same question for 32-bit systems and both long
> > > > > long and double.  (Again, fine by me either way, but need to
> > > > > know.)
> > > >
> > > > In my view, most of this discussion really belongs in the atomics
> > > > proposal.  It should specify that if a read of an atomic object
> > > > observes any of the effects of an atomic write to an object, then
> > > > it observes all of them.  And that applies to objects beyond
> > > > scalars.  But it's just an added constraint beyond what is
> > > > specified in the memory model section.
> > > >
> > > > Our atomics proposal still proposes to guarantee atomicity for
> > > > atomic<T> for any T.  We just don't promise to do it in a
> > > > lock-free manner.  Thus atomic<struct foo> does behave atomically
> > > > with respect to threads.
> > >
> > > In that case, my question becomes "why is the qualifier 'scalar'
> > > required?".
> >
> > The intent here is to decompose non-scalar reads into the constituent
> > scalar reads, and similarly for non-scalar assignments, in order to
> > determine what assignments can be seen where.  Otherwise we would
> > have to talk about which combination of field assignments a
> > particular struct read can see.  It just seems easier to define
> > visibility on a component-by-component basis.
>
> OK. So it is then the library's responsibility to make
> operations on structs appear to be atomic, right?
Correct.
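For illustration, one way a library can meet that obligation when no
lock-free hardware support exists is to serialize whole-object reads and
writes through a mutex.  The class below is a hypothetical sketch (the
name locked_atomic and its members are mine, not from the proposal),
written in later C++11 syntax:

```cpp
#include <mutex>

// Hypothetical sketch: a non-lock-free atomic<T> that makes whole-struct
// loads and stores appear atomic by holding a per-object mutex.
template <typename T>
class locked_atomic {
public:
    locked_atomic() : value_() {}
    explicit locked_atomic(T v) : value_(v) {}

    T load() const {
        std::lock_guard<std::mutex> g(lock_);
        return value_;  // a reader sees all of a prior store, never part
    }

    void store(T v) {
        std::lock_guard<std::mutex> g(lock_);
        value_ = v;  // no reader can observe a half-written struct
    }

private:
    mutable std::mutex lock_;
    T value_;
};

struct point { int x; int y; };  // stand-in for "struct foo" above
```

With this scheme, a reader that observes any effect of a store to a
locked_atomic<point> observes all of it, which is the constraint stated
above; the price is that such objects are not lock-free.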
Hans
>
> Thanx, Paul
>
> > Hans
> >
> > >
> > > > > Although the general thrust of 1.10p8 and 1.10p10 looks OK to
> > > > > me, more careful analysis will be required -- which won't
> > > > > happen by your deadline, sorry to say!
> > > >
> > > > Thanks for looking at this quickly.  If we need to make some
> > > > more tweaks before Toronto, I think that's fine.
> > > >
> > > > I think at the moment, the real constraint on getting this into
> > > > the working paper is that we need to have more people reading it.
> > >
> > > More eyes would indeed be a very good thing from my perspective.
> > >
> > > Thanx, Paul
> > >
>