[cpp-threads] Slightly revised memory model proposal (D2300)

Wed Jun 20 02:50:11 BST 2007

On Tue, Jun 19, 2007 at 11:30:41PM -0000, Boehm, Hans wrote:
>  
> 
> > -----Original Message-----
> > From: Paul E. McKenney [mailto:paulmck at linux.vnet.ibm.com] 
> > Sent: Friday, June 15, 2007 10:54 AM
> > To: Boehm, Hans
> > Cc: C++ threads standardisation; Nick Maclaren
> > Subject: Re: [cpp-threads] Slightly revised memory model 
> > proposal (D2300)
> > 
> > On Fri, Jun 15, 2007 at 09:09:39AM -0700, Hans Boehm wrote:
> > > 
> > > to flickering.  I believe the version of the example that I posted:
> > > 
> > > > > Thread 1:
> > > > > store_relaxed(&x, 1);
> > > > >
> > > > > Thread 2:
> > > > > store_relaxed(&x, 2);
> > > > >
> > > > > Thread 3:
> > > > > r1 = load_acquire(&x); (1)
> > > > > r2 = load_acquire(&x); (2)
> > > > > r3 = load_acquire(&x); (1)
> > > > >
> > > is already allowed to flicker under the D2300 rules.
> > 
> > 1.10p8 says:
> > 
> > 	An evaluation A happens before an evaluation B if:
> > 
> > 	* A is sequenced before B and either A performs an acquire
> >           operation or B performs a release operation; or
> > 	* A synchronizes with B; or
> > 	* for some evaluation X, A happens before X and X 
> > happens before B. 
> > 
> > The first bullet applies to all pairs of Thread 3's 
> > assignments, since each of those statements performs an 
> > acquire operation.
> > 
> > 1.10p6 says:
> > 
> > 	All modifications to a particular atomic object M occur in some
> > 	particular total order, called the modification order 
> > of M. [Note:
> > 	The modification orders reflect the order in which updates to a
> > 	single variable become globally visible. In the case of relaxed
> > 	atomic operations, both assignments to and reads from the object
> > 	may effectively be locally reordered, so the modification order
> > 	may only be very indirectly observable. Repeated load_acquire
> > 	operations on the same atomic object must observe some subset
> > 	of the updates in modification order. -end note ]
> > 
> > Therefore, both thread 1's and thread 2's assignments should 
> > happen only once with respect to any given thread.
> > 
> > So, why doesn't the combination of 1.10p6 and 1.10p8 prohibit 
> > store flickering?  What am I missing here?
> 
> The short answer is that the note is in error.  I will fix it.  The note
> does not follow from the binding parts of the proposal, as it should.
> Modification order affects visibility only in that it affects
> synchronizes-with (which we are about to remove) and as described in
> 1.10p10.  Since the stores don't happen before anything, 1.10p10 does
> not apply.

Except that I was not referring to the note, but rather to the
first sentence:

	All modifications to a particular atomic object M occur in some
	particular total order, called the modification order of M.

If there is a particular total order on the stores to a given atomic
object, then ordered reads from a given thread should not observe
any flickering.  (Unordered reads could of course see flickering due
to the fact that they might be reordered.)

> > >                                                      And 
> > looking back 
> > > at Sarita's example, weakening this doesn't seem to help.  (The 
> > > example that we should really have been discussing would 
> > have had release stores.
> > > That's the one that's currently constrained by the 
> > modification order 
> > > rule.  And having that flicker does seem dubious.)
> > > 
> > > I think the kind of flickering that we already allow is 
> > unavoidable if 
> > > we assume that unpaired acquires behave like relaxed operations.
> > 
> > If I read 1.10p8 correctly, unpaired acquires still have some 
> > strong semantics over those of relaxed operations.
> 
> They currently do have some, as a result of the precedes definition.  As
> I said in my last message, I think those need to disappear, too.
> Otherwise we again end up with low-level atomics interfering with lock
> elimination, which I don't think is acceptable.

Which lock-elimination scenarios are you concerned with here?  Presumably
these scenarios limit optimizations in programs that have no low-level
atomics?

Besides, doesn't teachability figure in here somewhere?  If we are
no longer concerned at all about teachability, then we need to revisit
ordered operations.

> > > The D2300 proposal doesn't change this, but it does move to 
> > the weaker 
> > > synchronizes-with model, as Paul points out below.
> > > 
> > > I think I agree with Paul's general direction for the rest of this, 
> > > but I need to think more about the details.  I think we want to aim 
> > > for
> > > 
> > > - Basically the D2300 model, but
> > > - some way to strengthen 1.10p7 to explicitly add 
> > "synchronizes-with"
> > > relationships for certain later reads, if the intervening 
> > operatons on 
> > > that variable were all RMW ops.
> > 
> > If "all RMW ops" becomes "all non-relaxed RMW ops", then I 
> > believe that we are in agreement.
> 
> Why not relaxed operations?

Because it can require useless memory barriers for relaxed atomic
operations that are in critical sections of locks or in initialization
sequences.  If one needs the RMW ops to participate in synchronizes-with
relationships, one should use _acquire, _release, or seq_cst rather than
_relaxed.

>                             If a fetch_add_relaxed(..., 0) is performed
> between the store and the load,which should that affect
> synchronizes-with?

The above might mean a number of things.  If the following:

T1: store_relaxed(x,1); r1=fetch_add_relaxed(y,0); r2=load_relaxed(z);

then there is no synchronizes-with to begin with, so presumably you
meant something else.

If instead:

T1: x=1;store_release(y,1);
T2: r1=fetch_add_relaxed(y,0);
T3: if (load_acquire(y) == 1) assert(x==1);

Then, yes, the fetch_add_relaxed() would potentially break the
synchronizes-with relationship, so that the assertion could fail.
Then again, the same could happen in the following example:

T1: x=1;store_release(y,1);
T2: r1=fetch_add_relaxed(y,1);
T3: if (load_acquire(y) == 2) assert(x==1);

So things are consistent.  The synchronizes-with relation applies to an
unbroken chain of release-acquire operations as would be represented by
a chain of lock acquisitions and releases on a given lock variable.
A _seq_cst operation could of course be substituted for an acquire or
a release operation.

Or did you have some other example in mind?

> Another revision of D2300 is coming up ...

Looking forward to seeing it!  And hoping that it does not have
flickering stores or unduly constrained _relaxed operations...

							Thanx, Paul