[cpp-threads] Slightly revised memory model proposal (D2300)

Fri Jun 15 18:53:37 BST 2007

On Fri, Jun 15, 2007 at 09:09:39AM -0700, Hans Boehm wrote:
> 
> > From paulmck at linux.vnet.ibm.com Tue Jun 12 22:10:15 2007
> >
> > On Tue, Jun 12, 2007 at 09:35:29AM +0100, Nick Maclaren wrote:
> > > "Paul E. McKenney" <paulmck at linux.vnet.ibm.com> wrote:
> > > >
> > > > Eek!!!  In my mind, permitting stores to "flicker" into existence is
> > > > a cure that is far, far worse than any disease I can imagine.  Such
> > > > "flickering" would break quite a bit of code.
> > >
> > > I am not sure that it would break much that isn't already broken,
> > > but I am dead sure that it would make it impossible for even an
> > > expert to work out whether or not such code was broken!
> > >
> > > I agree.  That's curing chilblains by infecting the patient with
> > > malaria.  Please, no!
> > >
> > > > How about instead -slightly- weakening the ordering requirements,
> > > > so that ordered reads can disagree about the order of unordered
> stores
> > > > to different variables?
> > >
> > > Fine.  But let's see a proper, semi-mathematical specification.
> > > I can see that being viable, and I can see it being ambiguous.
> 
> Unfortunately, I think I posted some misinformation here, with respect
> to flickering.  I believe the version of the example that I posted:
> 
> > > Thread 1:
> > > store_relaxed(&x, 1);
> > >
> > > Thread 2:
> > > store_relaxed(&x, 2);
> > >
> > > Thread 3:
> > > r1 = load_acquire(&x); (1)
> > > r2 = load_acquire(&x); (2)
> > > r3 = load_acquire(&x); (1)
> > >
> is already allowed to flicker under the D2300 rules.

1.10p8 says:

	An evaluation A happens before an evaluation B if:

	* A is sequenced before B and either A performs an acquire
          operation or B performs a release operation; or
	* A synchronizes with B; or
	* for some evaluation X, A happens before X and X happens before B. 

The first bullet applies to all pairs of Thread 3's assignments, since
each of those statements performs an acquire operation.

1.10p6 says:

	All modifications to a particular atomic object M occur in some
	particular total order, called the modification order of M. [Note:
	The modification orders reflect the order in which updates to a
	single variable become globally visible. In the case of relaxed
	atomic operations, both assignments to and reads from the object
	may effectively be locally reordered, so the modification order
	may only be very indirectly observable. Repeated load_acquire
	operations on the same atomic object must observe some subset
	of the updates in modification order. —end note ]

Therefore, both thread 1's and thread 2's assignments should happen only
once with respect to any given thread.

So, why doesn't the combination of 1.10p6 and 1.10p8 prohibit store
flickering?  What am I missing here?

>                                                      And looking back
> at Sarita's example, weakening this doesn't seem to help.  (The example
> that we should really have been discussing would have had release stores.
> That's the one that's currently constrained by the modification order
> rule.  And having that flicker does seem dubious.)
> 
> I think the kind of flickering that we already allow is unavoidable if
> we assume that unpaired acquires behave like relaxed operations.

If I read 1.10p8 correctly, unpaired acquires still have some strong
semantics over those of relaxed operations.

>                                                                  Consider
> the above example with thread 3 replaced by:
> 
> p = &x;
> q = &x;
> ...
> while (...) {
>   // Loop does not store to potentially shared variables.
>   r1 = load_relaxed(p);  // loads *p, i.e. x
>   r2 = load_relaxed(q);  // loads *q, i.e. x
> }
> 
> Assume the ellipses code confounds the compiler enough that it no longer
> knows that *p and *q are aliases.  (That's easy to arrange.)  The compiler
> might then move the loads out of the loop, without combining them, which
> could potentially result in exactly this flickering.  (If you don't like
> moving loop invariant load_relaxed operations, think of it as splitting
> the
> loop into two, each one of which contains one of the loads, and then
> combining all the load_relaxed ops in each loop.  Even without combining
> them you have the same problem.)

Even I will agree that we shouldn't require compilers to do exact global
alias analysis.  ;-)

> The D2300 proposal doesn't change this, but it does move to the weaker
> synchronizes-with model, as Paul points out below.
> 
> I think I agree with Paul's general direction for the rest of this, but
> I need to think more about the details.  I think we want to aim for
> 
> - Basically the D2300 model, but
> - some way to strengthen 1.10p7 to explicitly add "synchronizes-with"
> relationships for certain later reads, if the intervening operatons on
> that
> variable were all RMW ops.

If "all RMW ops" becomes "all non-relaxed RMW ops", then I believe that
we are in agreement.

> Based on attempts to tweak this in the past, I think that sort of
> formulation
> is most likely to work.
> 
> This would lead us to someplace that's clearly between D2300 and N2171.

Sounds good to me!

						Thanx, Paul

> Hans
> >
> > The only thing I see missing from D2300 is the behavior WRT atomic
> > operations.  Here is some possibilities:
> >
> > 1.	No ordering, even between successive operations to the same
> > 	variable.
> >
> > 2.	"Precedes" ordering between the store portion of one operation
> > 	and the load portion of the succeeding operation to a given
> > 	variable, but not between the load and store portion of any
> > 	single operation.  No ordering between operations on different
> > 	variables.
> >
> > 3.	"Precedes" ordering between a pair of operations to the same
> > 	variable, and also with a "precedes" relationship between
> > 	the load and store portions of a given operation.  No ordering
> > 	between operations on different variables.
> >
> > 4.	Synchronizes-with ordering between a _acquire() operation and
> > 	a following _release() operation to the same variable, possibly
> > 	combined with one of the other options.
> >
> > 5.	Happens-before ordering between any pair of operations to the
> > 	same variable.  No ordering between operations on different
> > 	variables.
> >
> > 6.	Happens-before ordering between any pair of operations, even
> > 	if to different variables (sequential consistency).
> >
> > I believe that fetch_add_relaxed() belongs at #2 above.
> >
> > 	#1 does not make sense, as even a simple load satisfied by a given
> > 		simple store implies a "precedes" relationship.
> >
> > 	#3 prohibits some weak implementations involving shared
> > 		store buffers or caches where a given atomic
> > 		operation is applied to a store-buffer entry.
> > 		Stronger options are even more restrictive to
> > 		underlying implementations.
> >
> > For fetch_add_acquire() and fetch_add_release(), #3 combined with #4.
> >
> > 	#2 is pointless -- there needs to be a memory barrier anyway,
> > 		so it might as well enforce the ordering of the
> > 		atomic operation's load and store while it is at it.
> > 		(Anyone know of a machine where this causes trouble?)
> >
> > 	#5 might be OK, but not clear that it helps the programmer.
> >
> > For fetch_add_seq_cst(), #6.
> >
> > 	Goes without saying, one hopes.
> >
> > Combinations of different modes would use the weaker of the pair.
> > Note that a fetch_add_seq_cst() would synchronize with both a prior
> > fetch_add_release() and a subsequent fetch_add_acquire() to the same
> > variable.
> >
> > Other atomic operations should have the same semantics as fetch_add_*().
> >
> > Seem reasonable?
> >
> > 						Thanx, Paul
> >
>