C++ memory model

Peter A. Buhr pabuhr at plg.uwaterloo.ca
Sat Feb 26 17:17:30 GMT 2005


Again, sorry for the pauses between messages. Last week was midterms.

   I don't have any fundamental disagreements, but maybe a difference
   in approach. Think food chain. The lowest layer is the stuff needed
   so that you can build higher abstractions. So you need some semantics
   for "volatile", promises about ordering guarantees, lack of unexpected
   writes, basic atomics, and access to underlying OS services.
   Modulo issues like type/thin-air safety described by Hans last
   month, I think the general nature of this part is a known quantity.
   (Although still a LOT of work to carry out.)

But it's not a food chain. As I mentioned in my last message, a concurrent
language does not need to build up from primitive language mechanisms to make
concurrency work. Clearly, the language has to generate correct code, but it
does this directly from its constructs for each architecture. For example,
threads and monitors in Java do not require "volatile" or a complex memory
model if a programmer follows a strict convention of programming concurrently
solely with Java threads and monitors (as is true in uC++, Ada, etc.); the
sketch after the quote below shows what that discipline looks like. Only if
you want to step outside of this zone do you need the food chain. And as many
people have pointed out, it is sometimes useful to step outside of the normal
concurrent programming model to do a few special things. But the people who do
step out of the zone had better be advanced concurrent programmers, or they are
going to screw up big time. So almost all concurrent programmers should stay in
the zone almost all of the time, and only a few should stray out of the zone,
and only rarely. If you don't believe this is true, we're at square zero. Let
me quote one of the other members of this group:

 Library-based approaches to concurrency normally require a very disciplined
 style of synchronization by multi-threaded programs. Although we agree that
 this is appropriate for 98% of uses, we argue that it eliminates some
 low-level programming techniques which, in some cases, may be essential for
 obtaining any performance benefits from multiple processors.
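
To make the "in the zone" discipline concrete, here is a minimal Java sketch
(my own example, not anything from this thread): every access to the shared
state goes through the monitor, so the programmer gets both mutual exclusion
and visibility without "volatile" and without any memory-model reasoning.

    // In-the-zone concurrency: all shared state is accessed only inside
    // synchronized methods, so the monitor supplies both mutual exclusion
    // and visibility; no volatile, no memory-model subtleties.
    class Counter {
        private int count = 0;   // guarded by the monitor lock on "this"

        public synchronized void inc() { count++; }
        public synchronized int get()  { return count; }
    }

    class Demo {
        public static void main(String[] args) throws InterruptedException {
            final Counter c = new Counter();
            Runnable work = new Runnable() {
                public void run() { for (int i = 0; i < 100000; i++) c.inc(); }
            };
            Thread t1 = new Thread(work), t2 = new Thread(work);
            t1.start(); t2.start();
            t1.join();  t2.join();
            System.out.println(c.get());   // always 200000
        }
    }

As long as a program is written entirely in this style, the compiler only has
to treat monitor entry and exit as the synchronization points it must respect.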


   I do think that it would be great for you (and others) to put together
   proposals for higher layers so that we can contemplate exactly what is
   needed underneath. My expectation is that the low-level ordering,
   read/write, etc. rules stay relatively constant across different high-level
   abstractions, but I'm always happy to be surprised.

And I see this the other way around. At the high level, who cares what you need
underneath? You just make it work on the architecture you are compiling for.
There are no program-level races, no double-checked locking, etc., and you know
ALL the places where you can or cannot perform an optimization. Now I agree
there is still a need for some memory rules to handle funny cases, like
adjacent data (e.g., two fields sharing a machine word), but these are rare and
probably easily fixable once you have strong concurrent semantics.
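
To pin down the "double-checks" just mentioned, here is the classic
double-checked locking idiom in Java (again my own illustration): it is
exactly the kind of out-of-zone code that is broken without memory-model
guarantees, and that the revised Java memory model (JSR-133) repairs only
when the field is declared volatile.

    class Singleton {
        // Without "volatile" this idiom is broken: another thread can see a
        // non-null reference to a not-yet-fully-constructed object. Under
        // JSR-133, the volatile write/read ordering makes it correct.
        private static volatile Singleton instance;

        public static Singleton getInstance() {
            if (instance == null) {              // first check, no lock
                synchronized (Singleton.class) {
                    if (instance == null) {      // second check, under the lock
                        instance = new Singleton();
                    }
                }
            }
            return instance;
        }
    }

An in-the-zone programmer would simply make getInstance() synchronized and be
done; the double-checked version exists only to avoid the lock on the fast
path, and that is precisely where the memory model has to earn its keep.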

So what you really want to figure out is how far you want to let programmers
stray out of the zone while still supplying some guarantees. Once you decide
that, it is possible to define the least restrictive memory model that supports
these out-of-zone idioms. The least restrictive memory model should always give
the maximal potential for optimization.
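
As one illustration of straying only a little while keeping a guarantee
(again my sketch, not anything proposed in this thread), consider a single
volatile flag used to publish data written by another thread. The only
memory-model rule it needs is that the volatile write orders, and makes
visible, the ordinary write before it, which is what the revised Java model
guarantees; a C++ model would need to offer at least this much.

    class Publisher {
        private int data;                   // plain shared data
        private volatile boolean ready;     // the one out-of-zone primitive

        void publish() {                    // writer thread
            data = 42;                      // ordinary write
            ready = true;                   // volatile write publishes "data"
        }

        int consume() {                     // reader thread
            while (!ready) { }              // spin until the flag is set
            return data;                    // guaranteed to read 42 (JSR-133)
        }
    }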
