[cpp-threads] Slightly revised memory model proposal (D2300)

Paul E. McKenney paulmck at linux.vnet.ibm.com
Wed Jun 27 17:46:16 BST 2007


On Tue, Jun 26, 2007 at 08:44:48PM -0000, Boehm, Hans wrote:
> 
> > -----Original Message-----
> > From: Paul E. McKenney [mailto:paulmck at linux.vnet.ibm.com] 
> > Sent: Sunday, June 24, 2007 10:39 AM
> > To: Boehm, Hans
> > Cc: C++ threads standardisation; Sarita V Adve
> > Subject: Re: [cpp-threads] Slightly revised memory model 
> > proposal (D2300)
> > 
> > On Sun, Jun 24, 2007 at 02:54:53AM -0000, Boehm, Hans wrote:
> > > > -----Original Message-----
> > > > From: Paul E. McKenney [mailto:paulmck at linux.vnet.ibm.com]
> > > > Sent: Friday, June 22, 2007 3:34 PM
> > > > To: Boehm, Hans
> > > > Cc: C++ threads standardisation; Sarita V Adve
> > > > Subject: Re: [cpp-threads] Slightly revised memory model proposal 
> > > > (D2300)
> > > > 
> > > > On Fri, Jun 22, 2007 at 08:38:51PM -0000, Boehm, Hans wrote:
> > > > > > -----Original Message-----
> > > > > > From:  Paul E. McKenney
> > > > 
> > > > [ . . . ]
> > > > 
> > > > > > >     An evaluation A that performs a release operation
> > > > on an object
> > > > > > >     M synchronizes with an evaluation B that 
> > performs an acquire
> > > > > > >     operation on M and reads either the value written
> > > > by A or, if
> > > > > > >     the following (in modification order) sequence of
> > > > updates to M
> > > > > > >     are atomic read-modify-write operations or ordered
> > > > > > atomic stores,
> > > > > > 
> > > > > > Again, I believe that this should instead read "non-relaxed 
> > > > > > read-modify-write operations" in order to allow some common 
> > > > > > data-element initialization optimizations.
> > > > >
> > > > > Could you explain in a bit more detail?  It seems weird to
> > > > me that an
> > > > > intervening fetch_add_relaxed(0) would break "synchronizes with"
> > > > > relationships, but fetch_add_acquire(0) would not.  Acq_rel and 
> > > > > seq_cst inherently do not have this issue, but the others
> > > > all seem to
> > > > > me like they should be treated identically.
> > > > 
> > > > The proper analogy is between fetch_add_acquire() and 
> > > > fetch_add_relaxed() on the one hand, and load_acquire() and
> > > > load_relaxed() on the other.
> > > > In both cases, the _acquire() variant participates in 
> > "synchronizes 
> > > > with"
> > > > relationships, while the _relaxed() variant does not.
> > > > 
> > > > Looking at the argument to fetch_add_relaxed(), we have
> > > > fetch_add_relaxed(0) breaking the "synchronizes with" 
> > > > relationship, but then again, so would fetch_add_relaxed(1), 
> > > > fetch_add_relaxed(2), and even fetch_add_relaxed(42).
> > > > 
> > > > Or am I missing something subtle here?
> > >
> > > I think we're still misunderstanding each other.  The 
> > situation this 
> > > is addressing is roughly
> > > 
> > > T1			T2			T3
> > > x.store_release(17);
> > > 			x.fetch_add_relaxed(1);
> > > 						x.load_acquire();
> > > 
> > > which we can think of as executing sequentially in that 
> > order.  Thus 
> > > the load_acquire sees a value of 18.  The question is whether the 
> > > store_release synchronizes with the load_acquire.  We agree 
> > that there 
> > > are no synchronizes with relationships involving the 
> > fetch_add (which 
> > > would change if the fetch_add were anything other than relaxed).
> > > 
> > > Was this also the scenario you were looking at?
> > 
> > I was considering scenarios of this sort, but where the 
> > compiler realized that it safely could use weaker fences for 
> > the x.store_release(), but only in the absence of an intervening 
> > synchronizes-with-preserving x.fetch_add_relaxed().
> Could you be more specific?  In my view, this does result in rather
> weird semantics, so I would rather not go there without good motivation.
> 
> Consider:
> 
> int foo;
> atomic<unsigned> n_foo_accesses(0);
> 
> T1:
> foo = 17;
> n_foo_accesses.store_release(1);
> 
> T2-N:
> if (n_foo_accesses.load_acquire() > 0) {
>   r1 = foo;
>   n_foo_accesses.fetch_add_relaxed(1);
> }
> 
> n_foo_accesses is examined again only after all threads are joined.
> It's unclear to me why this should not work as written, but should work if I
> said fetch_add_acquire instead.  The difference intuitively seems
> irrelevant to me.

The pthread_join() primitive needs to be defined to include whatever
fences are required to permit subsequent code to reliably see the
newly terminated thread's accesses.  So in this case, it should not
be necessary to use a non-relaxed fetch_add.  Instead, the fact that
the thread has exited and the corresponding pthread_join() has completed
means that the dead thread's accesses have completed and are visible
to the thread that invoked pthread_join().

						Thanx, Paul

> > > If not, do you want to suggest clearer wording?
> > 
> > Just adding the "non-relaxed" modifier for the RMW operations 
> > would be fine.
> > 
> > > > > ...would like to aim for is:
> > > > > 
> > > > > - We vote the basic memory model text (N2300), possibly 
> > with some 
> > > > > small changes, into the working paper in Toronto.  I suspect 
> > > > > that's nearly required to have a chance at propagating 
> > the effects
> > > > into some
> > > > > of the library text by Kona.
> > > > > 
> > > > > - We agree on exactly what we want in terms of atomics and
> > > > fences in
> > > > > Toronto, so that it can be written up as formal standardese
> > > > ideally in
> > > > > the post-Toronto mailing.
> > > > 
> > > > I cannot deny that past discussions on this topic have at 
> > times been 
> > > > spirited, but I must defer to Michael and Raul on how 
> > best to move 
> > > > this through the process.
> > >
> > > Clearly.  And I suspect there will be interesting 
> > discussions between 
> > > all of us in Toronto.
> > >
> > > > > > On the "scalar" modifier used here, this means that 
> > we are not 
> > > > > > proposing atomic access to arbitrary structures, for example, 
> > > > > > atomic<struct foo>?
> > > > > > (Fine by me, but need to know.)  However, we -are- requiring 
> > > > > > that implementations provide atomic access to large scalars,
> > > > correct?  So
> > > > > > that a conforming 8-bit system would need to provide proper 
> > > > > > semantics for "atomic<long long>"?  Not that there are
> > > > likely to be
> > > > > > many parallel 8-bit systems, to be sure, but same 
> > question for 
> > > > > > 32-bit systems and both long long and double.  
> > (Again, fine by 
> > > > > > me either way, but need to know.)
> > > > >
> > > > > In my view, most of this discussion really belongs in 
> > the atomics 
> > > > > proposal.  It should specify that if a read to an atomic object 
> > > > > observes any of the effect of an atomic write to an 
> > object, then 
> > > > > it observes all of it.  And that applies to objects beyond
> > > > scalars.  But
> > > > > it's just an added constraint beyond what is specified in
> > > > the memory model section.
> > > > > 
> > > > > Our atomics proposal still proposes to guarantee atomicity for 
> > > > > atomic<T> for any T.  We just don't promise to do it in a 
> > > > > lock-free manner.  Thus atomic<struct foo> does behave 
> > atomically
> > > > with respect to threads.
> > > > 
> > > > In that case, my question becomes "why is the qualifier 'scalar'
> > > > required?".
> > >
> > > The intent here is to decompose non-scalar reads into the 
> > constituent 
> > > scalar reads, and similarly for non-scalar assignments, in order to 
> > > determine what assignments can be seen where.  Otherwise we 
> > would have 
> > > to talk about which combination of field assignments a particular 
> > > struct read can see.  It just seems easier to define 
> > visibility on a 
> > > component-by-component basis.
> > 
> > OK.  So it is then the library's responsibility to make 
> > operations on structs appear to be atomic, right?
> Correct.
> 
> Hans
> 
> > 
> > 						Thanx, Paul
> > 
> > > Hans
> > > 
> > > > 
> > > > > > Although the general thrust of 1.10p8 and 1.10p10 
> > look OK to me, 
> > > > > > more careful analysis will be required -- which won't
> > > > happen by your
> > > > > > deadline, sorry to say!
> > > > >
> > > > > Thanks for looking at this quickly.  If we need to make 
> > some more 
> > > > > tweaks before Toronto, I think that's fine.
> > > > > 
> > > > > I think at the moment, the real constraint on getting this into 
> > > > > the working paper is that we need to have more people 
> > reading it.
> > > > 
> > > > More eyes would indeed be a very good thing from my perspective.
> > > > 
> > > > 						Thanx, Paul
> > > > 
> > 


