Memory model

Tue Jan 25 20:32:58 GMT 2005

In the interest of making progress, here's an attempt
to outline where I think the sticky issues are in defining
a memory model for C++.  (This mostly ignores, for now, the atomic
operations part of our charter, which I currently think is
easier.  It also ignores the issue of threading primitives,
which I would like to continue to ignore, but which we mentioned
originally.)

The JSR 133 effort made some assumptions would could easily
be justified for Java, but I think are harder to justify for
C++ .  In particular:

(1) We could not just leave the semantics of programs with races
undefined, for security and type-safety reasons.  

(2) We essentially assumed it was obvious which language
constructs required memory reads and writes.  The JSR133
effort concentrated almost exclusively on defining which writes
could be seen by a read.

(3) Pointer and int operations were always atomic.

(4) We decided that the visibility of a write to a later read
was ensured only by synchronization operations on the SAME
synchronization object (e.g. lock) by both threads.  The thread
performing the write has to release a lock, and the reader has
to later acquire the SAME lock to ensure visibility.  In Java
this is essential to make it possible to eliminate useless
synchronization, a problem that seems to have arisen from a
combination of security considerations and possibly a language
design flaw.

(5) We had to add restrictions to enforce causality.  These
are probably the most complex and technically interesting piece
of the Java model.

I think that in all cases, we have choices for C++:

(1) I suspect that we could leave the semantics of programs that
have races under a sequentially consistent interpretation undefined,
provided our definition of "race" is such that the calls into
the atomic operations library don't count, or are viewed as
synchronization operations.  This would force anything like
the Eratosthenes Sieve example in my paper to call the atomic
operations library to be correct.  That might not be a bas thing,
for readability reasons.  (This route is not compatible with
Java-like security guarantees.  And Herb Sutter seemed to be
interested in moving in that direction.)

(2) The more I think about this one, the more it worries me.
I think we want to outlaw overwriting adjacent fields, except
in the bit-field case.  And I think that has minimal impact
on performance.  But consider (ri are locals/registers whose
address is not taken, x is global and possibly shared):

r1 = x;
for ( ... = r1; ...; ... ) {... } // Messy computation; touches no globals.
					    // No further references to r1.
... r1 ...

Let's say the compiler runs out of registers compiling the loop,
and wants to spill r1, can it reload r1 from x, or does it need
a separate spill location?  My guess is that most current C++ compilers
would reload from x if they could determine that to be safe for
sequential code.  Changing this probably would affect performance.
Thus it's not at all clear to me that we want to prohibit a read
from being repeated.

If x is actually a shared global, I think you do not want to allow
the compiler to reload from x.  But there is an argument that in that
case either x should be volatile, or the reference to x should use
the atomic operations library.

(3) If we allow races, we need to say something about atomicity here.
This may get us into trouble on some low end hardware?

(4) We could conceivably say (using the Java terminology) that there
is a happens-before edge from any release action to any later acquire
action.  I'm not sure that this would be either useful or harder to
implement on current hardware.  A convincing argument either way would
be nice, but this may not matter much.

(5) I think we can omit the causality constraints unless we want to
address type-safety issues.

Any comments?  Arguments that these are the wrong issues?

Hans