[cpp-threads] memory model

Wed May 4 23:44:20 BST 2005

> From: Peter Dimov
> Sent: Saturday, April 30, 2005 10:36 AM

> Code conditional acquire prevents subsequent operations that 
> can be executed 
> conditionally depending on the value v being loaded to become 
> visible before 
> the current load. It does not inhibit reordering of 
> independent operations.
> 
> Definition of "can be executed conditionally": there exist 
> two values v1 and 
> v2 such that for v == v1 the operation op is executed and for 
> v == v2 the 
> operation op is not executed.
I like this better than definitions based on some static notion of
dependency, but it still seems very problematic to me.  I think
it's well-defined at the source level.  But I don't think it's
robust against otherwise legal and common code transformations.
In particular:

	if (load_cc_acquire(x)) {
		y = 1;
		z = a[1];
	} else {
		y = 0;
		z = a[0];
	}

is NOT equivalent to

      if (load_cc_acquire(x)) {
		y = 1;
	} else {
		y = 0;
	}
	z = a[y];

In fact, it appears to me that

      if (r1 = load_cc_acquire(x)) {
		z = a[0];
	} else {
		z = a[0];
	}

is not equivalent to

	r1 = load_cc_acquire(x);
	z = a[0]

I expect this would be a major pain for compiler implementors,
though it would be good to hear from them.  I also expect it's
a maintainance problem in that "obviously correct" code
simplifications are unexpectedly wrong.

To make matters worse, now if I have

int f(int x)
{
	return (x? a[0] : a[0]);
}

I can't optimize that to

int f(int x)
{
	return a[0];
}

because x might have been generated by a load_cc_acquire,
and the original version would be ordered with respect to that
load, and the transformed version wouldn't be.
> 
> Data dependency is similar: for every two different values v1 
> and v2, when v 
> == v1, the operation op accesses memory location a1, and when 
> v == v2, op 
> accesses a2, where a1 != a2.
That's a different notion from what I had in mind, but it may
be basically OK.  It seems more robust than the cc variant.

Consider:

	r1 = load_dd_acquire(x);
	r2 = (r1? 0 : 1);
	r3 = a[r2];

then the last load is data-dependent on the initial one by
this definition.  As far as I can tell, this is OK on Itanium,
and maybe on Power (where load_dd_acquire is presumably just an
ordinary load).  But it would be nice to get confirmation of
that from the hardware architects.

If all processors we are likely to care about in 2009 enforce
the necessary data dependent ordering, as it appears, should
the be the default for an atomic load?  (I'm assuming we
can define what that means cleanly, which is not at all clear
to me, yet.)
> 
> This definition makes x = a[ r1? 0: 1 ] code conditional 
> instead of data 
> dependent.
Now you lost me.  Isn't the load executed unconditionally?
At least if we assume a straightforward compiler?

> 
> What this means in practice when these labels are used:
> 
> - a compiler can only reorder if it can prove that the 
> operation is not code 
> conditional (or data dependent.)
> 
> - when the hardware is known to track code and data 
> dependencies and inhibit 
> reordering based on them, the compiler can take advantage of 
> these implicit 
> barriers and avoid explicit synchronization. 
> 
I understand the motivation.  But I'm not convinced it's worth
the cost, at least for the control dependent version.  Certainly
if it affects optimization of code that doesn't obviously use
these primitives, it isn't.

Hans