[cpp-threads] Slightly revised memory model proposal (D2300)

Sun Jun 24 03:54:53 BST 2007

> -----Original Message-----
> From: Paul E. McKenney [mailto:paulmck at linux.vnet.ibm.com] 
> Sent: Friday, June 22, 2007 3:34 PM
> To: Boehm, Hans
> Cc: C++ threads standardisation; Sarita V Adve
> Subject: Re: [cpp-threads] Slightly revised memory model proposal (D2300)
> 
> On Fri, Jun 22, 2007 at 08:38:51PM -0000, Boehm, Hans wrote:
> > > -----Original Message-----
> > > From:  Paul E. McKenney
> 
> [ . . . ]
> 
> > > >     An evaluation A that performs a release operation on an object
> > > >     M synchronizes with an evaluation B that performs an acquire
> > > >     operation on M and reads either the value written by A or, if
> > > >     the following (in modification order) sequence of updates to M
> > > >     are atomic read-modify-write operations or ordered atomic stores,
> > > 
> > > Again, I believe that this should instead read "non-relaxed 
> > > read-modify-write operations" in order to allow some common 
> > > data-element initialization optimizations.
> >
> > Could you explain in a bit more detail?  It seems weird to me that an
> > intervening fetch_add_relaxed(0) would break "synchronizes with"
> > relationships, but fetch_add_acquire(0) would not.  Acq_rel and
> > seq_cst inherently do not have this issue, but the others all seem to
> > me like they should be treated identically.
> 
> The proper analogy is between fetch_add_acquire() and
> fetch_add_relaxed() on the one hand and load_acquire() and
> load_relaxed() on the other.
> In both cases, the _acquire() variant participates in
> "synchronizes with" relationships, while the _relaxed() variant
> does not.
> 
> Looking at the argument to fetch_add_relaxed(), we have
> fetch_add_relaxed(0) breaking the "synchronizes with" 
> relationship, but then again, so would fetch_add_relaxed(1), 
> fetch_add_relaxed(2), and even fetch_add_relaxed(42).
> 
> Or am I missing something subtle here?
I think we're still misunderstanding each other.  The situation this is
addressing is roughly

T1			T2			T3
x.store_release(17);
			x.fetch_add_relaxed(1);
						x.load_acquire();

which we can think of as executing sequentially in that order.  Thus the
load_acquire sees a value of 18.  The question is whether the
store_release synchronizes with the load_acquire.  We agree that there
are no "synchronizes with" relationships involving the fetch_add itself
(which would change if the fetch_add were anything other than relaxed).

Was this also the scenario you were looking at?

If not, do you want to suggest clearer wording?
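
For concreteness, the scenario can be sketched as below.  This is only
an illustration under stated assumptions: it uses the std::atomic
spelling (x.store(17, std::memory_order_release) and so on) rather than
the x.store_release(17) names used in this thread, and run_scenario is a
hypothetical helper name.  Each thread is joined before the next starts,
purely to force the "executing sequentially in that order" interleaving.
Those joins impose ordering of their own, so the sketch pins down the
value that the acquire load observes and does not by itself settle the
"synchronizes with" question.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper: runs the T1, T2, T3 steps in the order given in
// the text and returns the value seen by the acquire load.
int run_scenario() {
    std::atomic<int> x{0};
    int seen = 0;

    // T1: release store of 17.
    std::thread t1([&] { x.store(17, std::memory_order_release); });
    t1.join();  // join only to force the sequential interleaving

    // T2: relaxed read-modify-write sitting between the store and the load.
    std::thread t2([&] { x.fetch_add(1, std::memory_order_relaxed); });
    t2.join();

    // T3: acquire load; it reads the value written by T2's fetch_add.
    std::thread t3([&] { seen = x.load(std::memory_order_acquire); });
    t3.join();

    return seen;
}
```

With this interleaving the load reads the result of the relaxed
fetch_add rather than the value stored by T1 directly, which is exactly
the case the wording has to cover.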

> > ...would like to aim for is:
> > 
> > - We vote the basic memory model text (N2300), possibly with some
> > small changes, into the working paper in Toronto.  I suspect that's
> > nearly required to have a chance at propagating the effects into some
> > of the library text by Kona.
> > 
> > - We agree on exactly what we want in terms of atomics and fences in
> > Toronto, so that it can be written up as formal standardese ideally in
> > the post-Toronto mailing.
> 
> I cannot deny that past discussions on this topic have at 
> times been spirited, but I must defer to Michael and Raul on 
> how best to move this through the process.
Clearly.  And I suspect there will be interesting discussions between
all of us in Toronto.
> > > On the "scalar" modifier used here, this means that we are not
> > > proposing atomic access to arbitrary structures, for example,
> > > atomic<struct foo>?
> > > (Fine by me, but need to know.)  However, we -are- requiring that
> > > implementations provide atomic access to large scalars, correct?  So
> > > that a conforming 8-bit system would need to provide proper
> > > semantics for "atomic<long long>"?  Not that there are likely to be
> > > many parallel 8-bit systems, to be sure, but same question for
> > > 32-bit systems and both long long and double.  (Again, fine by me
> > > either way, but need to know.)
> >
> > In my view, most of this discussion really belongs in the atomics
> > proposal.  It should specify that if a read to an atomic object
> > observes any of the effect of an atomic write to an object, then it
> > observes all of it.  And that applies to objects beyond scalars.  But
> > it's just an added constraint beyond what is specified in the memory
> > model section.
> > 
> > Our atomics proposal still proposes to guarantee atomicity for
> > atomic<T> for any T.  We just don't promise to do it in a lock-free
> > manner.  Thus atomic<struct foo> does behave atomically with respect
> > to threads.
> 
> In that case, my question becomes "why is the qualifier 'scalar'
> required?".
The intent here is to decompose non-scalar reads into the constituent
scalar reads, and similarly for nonscalar assignments, in order to
determine what assignments can be seen where.  Otherwise we would have
to talk about which combination of field assignments a particular struct
read can see.  It just seems easier to define visibility on a
component-by-component basis.
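
A rough sketch of the distinction, assuming the std::atomic spelling
and a hypothetical two-field struct Foo (neither the struct nor the
helper names come from the proposal): for visibility purposes a plain
struct read is analysed as one scalar read per member, whereas a load
from atomic<Foo> observes all of one write or none of it.

```cpp
#include <atomic>

// Hypothetical two-field struct used only for illustration.
struct Foo {
    int a;
    int b;
};

// A plain struct read, written out as its constituent scalar reads.
// Visibility is determined separately for each member.
Foo read_componentwise(const Foo& src) {
    Foo dst;
    dst.a = src.a;  // scalar read of member a
    dst.b = src.b;  // scalar read of member b
    return dst;
}

// An atomic<Foo> load returns a whole value from a single write;
// whether it is lock-free is a separate, implementation-specific matter.
Foo read_atomically(const std::atomic<Foo>& src) {
    return src.load();
}
```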

Hans

> 
> > > Although the general thrust of 1.10p8 and 1.10p10 look OK to me,
> > > more careful analysis will be required -- which won't happen by your
> > > deadline, sorry to say!
> >
> > Thanks for looking at this quickly.  If we need to make some more 
> > tweaks before Toronto, I think that's fine.
> > 
> > I think at the moment, the real constraint on getting this into the 
> > working paper is that we need to have more people reading it.
> 
> More eyes would indeed be a very good thing from my perspective.
> 
> 						Thanx, Paul
>