[cpp-threads] Slightly revised memory model proposal (D2300)

Sun Jun 24 18:38:41 BST 2007

On Sun, Jun 24, 2007 at 02:54:53AM -0000, Boehm, Hans wrote:
> > -----Original Message-----
> > From: Paul E. McKenney [mailto:paulmck at linux.vnet.ibm.com] 
> > Sent: Friday, June 22, 2007 3:34 PM
> > To: Boehm, Hans
> > Cc: C++ threads standardisation; Sarita V Adve
> > Subject: Re: [cpp-threads] Slightly revised memory model 
> > proposal (D2300)
> > 
> > On Fri, Jun 22, 2007 at 08:38:51PM -0000, Boehm, Hans wrote:
> > > > -----Original Message-----
> > > > From:  Paul E. McKenney
> > 
> > [ . . . ]
> > 
> > > > >     An evaluation A that performs a release operation on an object
> > > > >     M synchronizes with an evaluation B that performs an acquire
> > > > >     operation on M and reads either the value written by A or, if
> > > > >     the following (in modification order) sequence of updates to M
> > > > >     are atomic read-modify-write operations or ordered atomic stores,
> > > > 
> > > > Again, I believe that this should instead read "non-relaxed 
> > > > read-modify-write operations" in order to allow some common 
> > > > data-element initialization optimizations.
> > >
> > > Could you explain in a bit more detail?  It seems weird to me that an
> > > intervening fetch_add_relaxed(0) would break "synchronizes with"
> > > relationships, but fetch_add_acquire(0) would not.  Acq_rel and
> > > seq_cst inherently do not have this issue, but the others all seem to
> > > me like they should be treated identically.
> > 
> > The proper analogy is between fetch_add_acquire() and
> > fetch_add_relaxed() on the one hand and load_acquire() and
> > load_relaxed() on the other.
> > In both cases, the _acquire() variant participates in
> > "synchronizes with" relationships, while the _relaxed() variant
> > does not.
> > 
> > Looking at the argument to fetch_add_relaxed(), we have
> > fetch_add_relaxed(0) breaking the "synchronizes with" 
> > relationship, but then again, so would fetch_add_relaxed(1), 
> > fetch_add_relaxed(2), and even fetch_add_relaxed(42).
> > 
> > Or am I missing something subtle here?
>
> I think we're still misunderstanding each other.  The situation this is
> addressing is roughly
> 
> T1			T2			T3
> x.store_release(17);
> 			x.fetch_add_relaxed(1);
> 						x.load_acquire();
> 
> which we can think of as executing sequentially in that order.  Thus the
> load_acquire sees a value of 18.  The question is whether the
> store_release synchronizes with the load_acquire.  We agree that there
> are no "synchronizes with" relationships involving the fetch_add (which
> would change if the fetch_add were anything other than relaxed).
> 
> Was this also the scenario you were looking at?

I was considering scenarios of this sort, but where the compiler realized
that it could safely use weaker fences for the x.store_release(),
but only in the absence of an intervening synchronizes-with-preserving
x.fetch_add_relaxed().

> If not, do you want to suggest clearer wording?

Just adding the "non-relaxed" modifier for the RMW operations would
be fine.

> > > ...would like to aim for is:
> > > 
> > > - We vote the basic memory model text (N2300), possibly with some
> > > small changes, into the working paper in Toronto.  I suspect that's
> > > nearly required to have a chance at propagating the effects into some
> > > of the library text by Kona.
> > > 
> > > - We agree on exactly what we want in terms of atomics and fences in
> > > Toronto, so that it can be written up as formal standardese ideally in
> > > the post-Toronto mailing.
> > 
> > I cannot deny that past discussions on this topic have at 
> > times been spirited, but I must defer to Michael and Raul on 
> > how best to move this through the process.
>
> Clearly.  And I suspect there will be interesting discussions between
> all of us in Toronto.
>
> > > > On the "scalar" modifier used here, this means that we are not 
> > > > proposing atomic access to arbitrary structures, for example, 
> > > > atomic<struct foo>?
> > > > (Fine by me, but need to know.)  However, we -are- requiring that 
> > > > implementations provide atomic access to large scalars, correct?  So
> > > > that a conforming 8-bit system would need to provide proper
> > > > semantics for "atomic<long long>"?  Not that there are likely to be
> > > > many parallel 8-bit systems, to be sure, but same question for 
> > > > 32-bit systems and both long long and double.  (Again, fine by me 
> > > > either way, but need to know.)
> > >
> > > In my view, most of this discussion really belongs in the atomics 
> > > proposal.  It should specify that if a read to an atomic object 
> > > observes any of the effect of an atomic write to an object, then it 
> > > observes all of it.  And that applies to objects beyond scalars.  But
> > > it's just an added constraint beyond what is specified in the memory
> > > model section.
> > > 
> > > Our atomics proposal still proposes to guarantee atomicity for
> > > atomic<T> for any T.  We just don't promise to do it in a lock-free
> > > manner.  Thus atomic<struct foo> does behave atomically with respect
> > > to threads.
> > 
> > In that case, my question becomes "why is the qualifier 'scalar'
> > required?".
>
> The intent here is to decompose non-scalar reads into the constituent
> scalar reads, and similarly for nonscalar assignments, in order to
> determine what assignments can be seen where.  Otherwise we would have
> to talk about which combination of field assignments a particular struct
> read can see.  It just seems easier to define visibility on a
> component-by-component basis.

OK.  So it is then the library's responsibility to make operations on
structs appear to be atomic, right?

						Thanx, Paul

> Hans
> 
> > 
> > > > Although the general thrust of 1.10p8 and 1.10p10 look OK to me,
> > > > more careful analysis will be required -- which won't happen by your
> > > > deadline, sorry to say!
> > >
> > > Thanks for looking at this quickly.  If we need to make some more 
> > > tweaks before Toronto, I think that's fine.
> > > 
> > > I think at the moment, the real constraint on getting this into the 
> > > working paper is that we need to have more people reading it.
> > 
> > More eyes would indeed be a very good thing from my perspective.
> > 
> > 						Thanx, Paul
> >