[cpp-threads] Slightly revised memory model proposal (D2300)

Fri Jun 15 17:09:39 BST 2007

> From paulmck at linux.vnet.ibm.com Tue Jun 12 22:10:15 2007
>
> On Tue, Jun 12, 2007 at 09:35:29AM +0100, Nick Maclaren wrote:
> > "Paul E. McKenney" <paulmck at linux.vnet.ibm.com> wrote:
> > >
> > > Eek!!!  In my mind, permitting stores to "flicker" into existence is
> > > a cure that is far, far worse than any disease I can imagine.  Such
> > > "flickering" would break quite a bit of code.
> >
> > I am not sure that it would break much that isn't already broken,
> > but I am dead sure that it would make it impossible for even an
> > expert to work out whether or not such code was broken!
> >
> > I agree.  That's curing chilblains by infecting the patient with
> > malaria.  Please, no!
> >
> > > How about instead -slightly- weakening the ordering requirements,
> > > so that ordered reads can disagree about the order of unordered
stores
> > > to different variables?
> >
> > Fine.  But let's see a proper, semi-mathematical specification.
> > I can see that being viable, and I can see it being ambiguous.

Unfortunately, I think I posted some misinformation here, with respect
to flickering.  I believe the version of the example that I posted:

> > Thread 1:
> > store_relaxed(&x, 1);
> >
> > Thread 2:
> > store_relaxed(&x, 2);
> >
> > Thread 3:
> > r1 = load_acquire(&x); (1)
> > r2 = load_acquire(&x); (2)
> > r3 = load_acquire(&x); (1)
> >
is already allowed to flicker under the D2300 rules.  And looking back
at Sarita's example, weakening this doesn't seem to help.  (The example
that we should really have been discussing would have had release stores.
That's the one that's currently constrained by the modification order
rule.  And having that flicker does seem dubious.)

I think the kind of flickering that we already allow is unavoidable if
we assume that unpaired acquires behave like relaxed operations.  Consider
the above example with thread 3 replaced by:

p = &x;
q = &x;
...
while (...) {
  // Loop does not store to potentially shared variables.
  r1 = load_relaxed(p);  // loads *p, i.e. x
  r2 = load_relaxed(q);  // loads *q, i.e. x
}

Assume the ellipses code confounds the compiler enough that it no longer
knows that *p and *q are aliases.  (That's easy to arrange.)  The compiler
might then move the loads out of the loop, without combining them, which
could potentially result in exactly this flickering.  (If you don't like
moving loop invariant load_relaxed operations, think of it as splitting
the
loop into two, each one of which contains one of the loads, and then
combining all the load_relaxed ops in each loop.  Even without combining
them you have the same problem.)

The D2300 proposal doesn't change this, but it does move to the weaker
synchronizes-with model, as Paul points out below.

I think I agree with Paul's general direction for the rest of this, but
I need to think more about the details.  I think we want to aim for

- Basically the D2300 model, but
- some way to strengthen 1.10p7 to explicitly add "synchronizes-with"
relationships for certain later reads, if the intervening operatons on
that
variable were all RMW ops.

Based on attempts to tweak this in the past, I think that sort of
formulation
is most likely to work.

This would lead us to someplace that's clearly between D2300 and N2171.

Hans
>
> The only thing I see missing from D2300 is the behavior WRT atomic
> operations.  Here is some possibilities:
>
> 1.	No ordering, even between successive operations to the same
> 	variable.
>
> 2.	"Precedes" ordering between the store portion of one operation
> 	and the load portion of the succeeding operation to a given
> 	variable, but not between the load and store portion of any
> 	single operation.  No ordering between operations on different
> 	variables.
>
> 3.	"Precedes" ordering between a pair of operations to the same
> 	variable, and also with a "precedes" relationship between
> 	the load and store portions of a given operation.  No ordering
> 	between operations on different variables.
>
> 4.	Synchronizes-with ordering between a _acquire() operation and
> 	a following _release() operation to the same variable, possibly
> 	combined with one of the other options.
>
> 5.	Happens-before ordering between any pair of operations to the
> 	same variable.  No ordering between operations on different
> 	variables.
>
> 6.	Happens-before ordering between any pair of operations, even
> 	if to different variables (sequential consistency).
>
> I believe that fetch_add_relaxed() belongs at #2 above.
>
> 	#1 does not make sense, as even a simple load satisfied by a given
> 		simple store implies a "precedes" relationship.
>
> 	#3 prohibits some weak implementations involving shared
> 		store buffers or caches where a given atomic
> 		operation is applied to a store-buffer entry.
> 		Stronger options are even more restrictive to
> 		underlying implementations.
>
> For fetch_add_acquire() and fetch_add_release(), #3 combined with #4.
>
> 	#2 is pointless -- there needs to be a memory barrier anyway,
> 		so it might as well enforce the ordering of the
> 		atomic operation's load and store while it is at it.
> 		(Anyone know of a machine where this causes trouble?)
>
> 	#5 might be OK, but not clear that it helps the programmer.
>
> For fetch_add_seq_cst(), #6.
>
> 	Goes without saying, one hopes.
>
> Combinations of different modes would use the weaker of the pair.
> Note that a fetch_add_seq_cst() would synchronize with both a prior
> fetch_add_release() and a subsequent fetch_add_acquire() to the same
> variable.
>
> Other atomic operations should have the same semantics as fetch_add_*().
>
> Seem reasonable?
>
> 						Thanx, Paul
>