[cpp-threads] Memory model formalism, again

Wed Dec 14 23:20:09 GMT 2005

[Bill - this is sort of a long delayed continuation of our earlier
discussion, so I'd appreciate it if you had a chance to look at this.]

I think we so far have at least two problems with the current posted
strawman memory model proposal:

1. We don't know how to reasonably deal with the causality requirement
for atomic operations.  Somehow, we need to make it clear that

(v, w are "atomic" variables and initially zero, assignment means
unordered loads and stores)

Thread 1:	v = w; if (v == 42) z = 1;
Thread 2:	w = v; z = 2;

does not have a data race.

2. I'm uncomfortable with the current definition of a data race in which
is defined in terms of executions that are happens-before ordered with
respect to atomics, but sequentially consistent with respect to ordinary
loads and stores.  It seems a bit baroque and opaque.

Here's a half-baked alternative that was suggested by a conversation
Bill and I had in early October.  (The (nonworking) proposal we came up
with then has been sufficiently mangled that I'm not at all sure he will
still like this one, even if it "works".)

This is a rough outline of how to address both of the above issues.
Please ask if things are unclear, and complain if they are wrong (a
distinct possibility):

1) (I think this is roughly a reuse of something Peter suggested a long
time ago in a different context.)  We define an atomic store in a given
thread to depend on an (earlier in thread ordering) atomic load if by
altering only the value seen by that load, the store (i.e. the value,
location, or existence) is altered.  (This is informal, but I think
unambiguous and easily formalizable.  It is independent of compiler
dependence analysis, or the hardware's notion of dependence.)

2) Consider a "happens-before" ordering, which is defined differently
from before (this was Bill's suggestion).  In particular,
thread-ordering no longer immediately implies happens-before ordering.
The happens-before ordering is the smallest transitively-closed relation
consistent with the following:

	If A and B are ordinary (not atomic/synchronization!) memory
references, and A precedes B in thread order, then A happens-before B.

	If A is an atomic reference with an ordering constraint, then
that ordering constraint translates to the happens-before order, e.g.
all memory operations in a thread preceding a release operation
happen-before the release operation.

	If A is any store or lock release (ordinary or atomic, though I
think only the atomic case matters), and B is any load or lock
acquisition, and B sees the value stored by A, then A happens before B.

	I an atomic store B depends on an earlier atomic load A in the
same thread, then A happens before B.  In particular, assuming all
atomic operations, there is no happens before ordering between the load
and the store in r1 = x; y = 1;, but there is an ordering between r1 =
x; y = r1;

3) For an execution to be considered valid, we need the execution to be
intra-thread consistent.  In addition, no load may see a store that
happens-after it, or that happens-before an intervening store that
happens-before it, in either thread or happens-before order (but not a
combination of both, i.e. these are separate criteria).  We also need
the happens-before ordering to be acyclic, which I claim enforces a
constraint stronger than the Java notion of causality.  For example
(with everything atomic, unordered, and initially zero),

Thread 1: r1 = x; y = r1;
Thread 2: r2 = y; x = r2;

cannot produce 42 in r1 or r2.  This would require each load to see the
store in the other thread, which would imply a happens-before cycle.

4) The definition of a data race stays as before, i.e. concurrent
execution of conflicting ordinary operations.  Formally this probably
needs to be defined as the existence of a total order, consistent with
the happens-before order, in which two conflicting operations are
adjacent, but I think that's equivalent here.

Note that this approach would not have worked for Java.  It violates the
Manson/Pugh causality test case #1, for starters.  (See
http://www.cs.umd.edu/~pugh/java/memoryModel/CausalityTestCases.html). I
think that's OK here, since we're only restricting the semantics of
atomics, not ordinary memory operations.  We're effectively outlawing
inter-thread optimization of atomic operations.  I would be amazed if
those were ever profitable for realistic scenarios.  I believe that
enforcing the dependency-based ordering does not require memory fences
on any existing hardware.  (The problem with our earlier attempt at a
simplified model was that enforced the introduction of spurious fences.)

Hans