Memory model

Sat Feb 26 23:32:18 GMT 2005

   1) Memory model
   2) Atomic operations and perhaps primitives for building locks
   3) Higher-level standardized thread library

   In my opinion, the large majority of programmers will care only
   about (1) and (3).

I agree, where (3) includes actual high-level language constructs, too.

   a) If you look at current standard practice, that's the one place where
   things are really intolerably broken.

   You might not like the pthreads interface.  But there is nothing to prevent
   that interface from being used in correct multithreaded programs, except
   that the current de facto language memory model rules allow the compiler to
   introduce races and thus break your program.  (The same applies to win32
   threads, Boost, uC++ (I suspect), etc.)  There is a lot more detail about
   these issues in my PLDI 2005 paper.  (draft at
   http://lib.hpl.hp.com/techpubs/2004/HPL-2004-209.html)

Hans, this URL hangs for me and all my students, but we still managed to get a
copy of the paper. It's nice to have someone other than me say "threads cannot
be implemented as a library". I've taken a lot of flak over the years for
adopting this position. However, your paper says it is going to look at the
library approach, and then largely assumes the pthreads approach, which is not
a library approach. As you point out in the footnote on page 3:

 All these functions (referring to the pthreads synchronization routines) would
 have to be recognized by advanced compilation systems so that memory operations
 and calls to these functions are not reordered by optimizations.

Clearly, for a compiler to recognize a routine, that routine's name must become
a "keyword" and hence part of the language. That is, every C compiler has to
be cracked open and appropriate code added to deal with pthreads. The Java
people often do the same thing: they add a magic routine, exception, or
variable name to make a new feature work, and then claim that they did not
change the language. I call a spade a spade, and I claim it is no more work to
add a new language feature correctly than it is to do it by the back door.
The fact is that pthreads is embedded in C and UNIX at such a level that
it is becoming a major threat to any further advancement in concurrent
programming.  For example, in UNIX, when a system routine is designated
thread-safe, it often means only with respect to pthreads, which precludes
other concurrent language approaches. In uC++, we have an almost complete
pthreads simulation hidden under the covers and use it to "trick" UNIX system
routines to call back to uC++ so all the locking is consistent. It's not easy
to do and it's fragile.  You might say, just use pthreads to implement
uC++. But pthreads has its own specific design, which is not compatible with
other designs, making simulation impossible (e.g., scheduling of threads).
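
To make the footnote's concern concrete, here is a minimal sketch of the kind
of code at issue. The names count_lock, shared_count, and increment_shared are
hypothetical, invented for this illustration; only the pthread_mutex_* calls
are real. The point is what the language does not say: the correctness of the
routine rests on the compiler leaving the access between the two calls.

```cpp
#include <pthread.h>
#include <cassert>

// Hypothetical shared state, invented for this sketch.
static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;
static int shared_count = 0;        // intended to be protected by count_lock

int increment_shared() {
    int rc = pthread_mutex_lock(&count_lock);
    assert(rc == 0);
    // This access must stay between the lock and the unlock. A language
    // definition with no notion of threads does not say so: it stays put
    // only because the compiler cannot see into the calls, or because it
    // has been taught that these particular names are special.
    int seen = ++shared_count;
    rc = pthread_mutex_unlock(&count_lock);
    assert(rc == 0);
    (void)rc;                       // silence unused warnings under NDEBUG
    return seen;
}
```

A compiler that can see the bodies of the lock routines (say, through
whole-program optimization) and finds no mention of shared_count in them has
no language-level reason to keep the increment where it is, which is exactly
why the routine names end up being treated as special.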

What I'm saying, Hans, is that the 98% of the time you think a thread library
does work holds only for pthreads, and not for any "pure" library approach,
i.e., one where the compiler knows nothing about the library itself.  Hence,
you may be significantly underestimating the extent to which threads cannot be
implemented as a library. So while I agree with everything your paper says with
respect to memory models and pthreads, I don't know what might be missing with
respect to memory models and other thread-library approaches. In other words,
how much further does a memory model have to go to truly handle a "pure" thread
library approach?

   b) As Stroustrup suggested, I think it will be difficult to get agreement or
   standardization on (3).  By starting there, there is a substantial danger
   that we'll never get around to the issues for which we can really have an
   impact.  As Doug points out, that shouldn't stop us from trying.  But we
   shouldn't block everything else on it.

While I personally think the high level is the place to start, I fully agree
that politically we need to start at the bottom. My problem is that I'm not
sure how to start at the bottom and still get the top right.

   c) If you look at the Java memory model, there is lots of reason
   to believe that we can formulate a memory model that makes sense
   across a wide variety of synchronization primitives.  Generally,
   we care only which pairs of synchronization operations (e.g.
   unlock/lock on a given lock, or volatile write/read on the same
   variable) ensure visibility between threads.  I think it's quite
   viable to have the language define a general memory model, and then
   to have a particular high level layer simply define this piece.
   The fundamental issues are the same whether we assume the higher
   layer is pthreads or uC++, or probably even OpenMP.
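
The pairing described in (c) can be sketched with one write/read pair on a
single synchronization variable. This is an illustration only: it assumes the
std::atomic facility that C++ gained well after this message was written, and
the names payload, ready, publish, and try_consume are hypothetical. It shows
the visibility guarantee between one pair of operations, not a general memory
model.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical names, invented for this sketch.
static int payload = 0;                    // ordinary, non-atomic data
static std::atomic<bool> ready{false};     // the synchronization variable

// Writer side: fill in the data, then perform the synchronizing write.
void publish(int value) {
    // Single-shot by assumption: a second publish could race with a reader.
    assert(!ready.load(std::memory_order_relaxed));
    payload = value;
    ready.store(true, std::memory_order_release);
}

// Reader side: perform the synchronizing read. If it observes the write,
// everything the writer did before that write is visible here.
bool try_consume(int& out) {
    if (!ready.load(std::memory_order_acquire)) {
        return false;                      // nothing published yet
    }
    out = payload;
    return true;
}
```

In use, one thread would call publish once while another polls try_consume.
The release store and the acquire load on the same variable form the pair that
ensures visibility, playing the same role as unlock/lock on a given lock.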

Let's assume it is true that a general memory model can be found for most
concurrent systems. I conjecture that such a memory model must push back on the
programming language in such a way as to require changes of a substantive
nature to the language definition, and correspondingly, all compilers for that
language.  For example, how many "magic" routine names need to be defined to
introduce a wide variety of synchronization primitives so the compiler can
understand the program is concurrent and where all the magic points are? What
are the arguments to these routines that will work for most concurrency
approaches? And most importantly, the basic objects that are manipulated by
these magic routines, such as a mutex or condition lock, must have some
fundamental size that supports a number of different approaches. This is
crucial because a system routine that wants to be thread safe needs to set
aside storage for a lock, and wants to do this statically for a host of good
reasons. But the actual lock that goes into this storage varies depending on
the thread system used by the application calling the system routine. That is,
pthreads, uC++, Java, and Ada all want to manipulate that lock storage in a
different way, so that when a user-level thread in any of these concurrent
systems calls the system routine, the correct action occurs. And I reject as too
Draconian any solution that requires all concurrent systems to adopt a 1:1
rather than an M:N user/kernel thread model, or to be built on top of pthreads.
To adopt such a solution is to suggest that pthreads is a universal concurrency
system, and that is patently untrue.
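
The lock-storage problem can be sketched as follows. Everything here is
hypothetical and invented for illustration: the four-word LockStorage, the
LockOps hook table, and the two toy "thread systems" are assumptions, not an
existing interface or a proposal from this thread, and the toy locks are
single-threaded placeholders that provide no real mutual exclusion. The sketch
only shows one fixed-size, statically reserved block of storage being
interpreted differently by different thread systems.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical fixed-size lock storage; four words is an arbitrary choice.
struct LockStorage { std::uintptr_t words[4]; };

// Hypothetical hook table that each thread system would supply.
struct LockOps {
    void (*acquire)(LockStorage&);
    void (*release)(LockStorage&);
};

// Toy system A: words[0] is a simple held flag.
void a_acquire(LockStorage& s) { assert(s.words[0] == 0); s.words[0] = 1; }
void a_release(LockStorage& s) { assert(s.words[0] == 1); s.words[0] = 0; }
const LockOps system_a = { a_acquire, a_release };

// Toy system B: words[0] is a nesting depth, words[1] counts acquisitions.
void b_acquire(LockStorage& s) { ++s.words[0]; ++s.words[1]; }
void b_release(LockStorage& s) { assert(s.words[0] > 0); --s.words[0]; }
const LockOps system_b = { b_acquire, b_release };

// The "system routine" reserves its lock storage statically and never
// learns which thread system is interpreting those words.
static LockStorage routine_lock = {};
static int routine_calls = 0;

int system_routine(const LockOps& ops) {
    ops.acquire(routine_lock);
    int result = ++routine_calls;          // the protected work
    ops.release(routine_lock);
    return result;
}
```

Even in this toy form the open questions are visible: how many words are
enough for every thread system, and what argument list would suit all of them.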