It's a rough, rough, rough draft.

Fri Sep 10 02:57:23 BST 2004

> We can't possibly turn this into a threads introduction.
> Nor do I think we should try.  We should reference one.
> I think Andrew Birrell's "An Introduction to Programming
> with C# Threads"
> (http://research.microsoft.com/~birrell/papers/ThreadsCSharp.pdf)
> is probably fine, and has a good intro,
> though some of the finer details don't carry
> over.  Anyone know of a better one?

I think referring people to Butenhof's book on pthreads ("Programming with
POSIX Threads") is the best thing to do.

The point is not to include a tutorial on threads, but rather a tutorial on
memory visibility. As I told Doug, most people on the committee do
understand lock-driven programming, where you acquire a lock, then see
everything everybody else ever did and can do anything you want. Then they
try to fit an understanding of the memory model into that simple framework,
and they hit a brick wall.

I've had a very, very relevant exchange with a gentleman on clc++m
(comp.lang.c++.moderated) that followed this exact pattern: "I'm not an
expert on threads, but I would like to understand these things you are
referring to." Me: "Here's some info, and here are some references." Him
(furious): "I don't have the time or the inclination to read the references,
but what you say makes no sense. You are snooty. I claim to have a pretty
good understanding of threads, and what you say makes absolutely no sense in
that framework, so my conclusion is that you enjoy acting precious by using
obscure language and not discussing your views." (Not in these exact words,
of course, but that's the impression I got.)

We should prevent such a reaction.

> Section 4:
>
> The core of the new Java memory model is to classify synchronization
> operations as "acquire" (e.g. lock acquisition) or "release" (e.g. lock
> release) operations on some synchronization object.  A "release"
> operation has the effect (among others) of ensuring that all memory
> operations performed by a thread before the "release" operation
> become visible to another thread after it performs a corresponding
> "acquire" operation.  Thus, for example a thread that acquires a lock
> is guaranteed to see all updates that occurred while another thread
> held the same lock.
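A minimal sketch of the release/acquire pairing described in the quoted Section 4, written with C++11's std::atomic (which postdates this message) rather than any proposed syntax. The names payload, ready, producer, consumer, and run_message_passing are hypothetical; the atomic flag stands in for the "synchronization object".

```cpp
#include <atomic>
#include <thread>

int payload = 0;                 // ordinary, non-atomic data
std::atomic<bool> ready{false};  // plays the role of the synchronization object

void producer() {
    payload = 42;                                  // plain write before the release
    ready.store(true, std::memory_order_release);  // "release": publishes the write above
}

int consumer() {
    // "acquire": once this load sees true, the producer's earlier writes are visible.
    while (!ready.load(std::memory_order_acquire)) {
    }
    return payload;  // guaranteed to read 42, never a stale 0
}

int run_message_passing() {
    payload = 0;  // reset shared state before any thread starts
    ready.store(false);
    int seen = -1;
    std::thread c([&seen] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    return seen;
}
```

Without the release/acquire pair (e.g. with both operations relaxed), the consumer could legally observe ready == true and still read the old payload.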

Java needed to define visibility in terms of acquire and release on
synchronization objects because its synchronization objects were implicit
(though I know Java 1.5 has explicit ones as well), but for C++ I think
synchronization objects should be kept separate from the memory model. In the
wake of Doug's idea that the library and memory model proposals could be
separate, I think it's easier to go a different route: we define terms such
as memory barriers, weak read, strong read, weak write, and strong write, and
later on, when we define synchronization objects and whatnot, we specify what
they do in terms of the notions defined in the memory model.
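As a rough illustration of that route (not part of any proposal), here is a lock whose guarantees are stated purely in terms of memory-model orderings, i.e. acquire on lock and release on unlock, rather than being primitive notions themselves. It uses C++11's std::atomic, which postdates this message; SpinLock and count_with_spinlock are hypothetical names.

```cpp
#include <atomic>
#include <thread>

// Hypothetical lock whose guarantees are expressed only via acquire/release.
class SpinLock {
public:
    void lock() {
        // exchange with acquire ordering: the winner sees every write made
        // before the previous holder's unlock().
        while (locked_.exchange(true, std::memory_order_acquire)) {
        }
    }
    void unlock() {
        // store with release ordering: publishes writes made while holding the lock.
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};

// Two threads bump a plain counter under the lock; no increment may be lost.
long count_with_spinlock(int per_thread) {
    SpinLock lk;
    long counter = 0;
    auto work = [&lk, &counter, per_thread] {
        for (int i = 0; i < per_thread; ++i) {
            lk.lock();
            ++counter;
            lk.unlock();
        }
    };
    std::thread a(work);
    std::thread b(work);
    a.join();
    b.join();
    return counter;
}
```

The counter itself is a plain long; everything the threads need to agree on follows from the acquire/release guarantees of lock() and unlock().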

> Section 5:
>
> I would make this much more tentative, e.g. start with:
>
> We are planning to propose a set of atomic operations on shared memory
> locations to support lock-free programming.  For each such operation
> we will need to specify not only its function, but also the ordering
> constraints it imposes, e.g. whether it behaves as an "acquire" or
> "release" operation, or neither or both.  For many operations
> multiple variants make sense.  We are currently undecided as
> to both the syntax of these constraints, and how many variants there
> should be.  Thus we omit them for the rest of this presentation.

Sounds good, though people might confuse this with the whiz-bang lock-free
programming that has become popular lately. These primitives can also serve
much more conservative purposes, such as manipulating the reference count of
a smart pointer.
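A sketch of the conservative use mentioned above: an intrusive reference count of the kind a smart pointer might manage, using C++11 atomics (which postdate this message). Counted, add_ref, release_ref, share_across_threads, and the g_destroyed test hook are all hypothetical; the relaxed-increment / acq_rel-decrement choice is one common pattern, not the only valid one.

```cpp
#include <atomic>
#include <thread>
#include <vector>

std::atomic<int> g_destroyed{0};  // test hook: counts destructions

// Hypothetical object with an intrusive count, as a smart pointer might manage.
struct Counted {
    std::atomic<long> refs{1};  // the creator holds the first reference
    ~Counted() { g_destroyed.fetch_add(1, std::memory_order_relaxed); }
};

void add_ref(Counted* p) {
    // No ordering needed: the caller already owns a reference, so *p is alive.
    p->refs.fetch_add(1, std::memory_order_relaxed);
}

void release_ref(Counted* p) {
    // acq_rel: each decrement publishes this thread's uses of *p, and whoever
    // drops the last reference acquires them all before deleting.
    if (p->refs.fetch_sub(1, std::memory_order_acq_rel) == 1) {
        delete p;
    }
}

// Hand one reference to each of n threads; returns the count left afterwards.
long share_across_threads(int n) {
    Counted* p = new Counted;
    std::vector<std::thread> ts;
    for (int i = 0; i < n; ++i) {
        add_ref(p);  // taken on behalf of the new thread
        ts.emplace_back([p] { release_ref(p); });
    }
    for (auto& t : ts) t.join();
    long remaining = p->refs.load();  // only the creator's reference is left
    release_ref(p);                   // dropping it destroys the object
    return remaining;
}
```

Nothing here is "lock-free programming" in the whiz-bang sense; it is just a counter that must not lose updates.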

> I think cas should be one of the atomic_ primitives.

OK. By the way, I have two questions I'd like to ask:

1. Is there any machine of any importance to anyone that has threads but no 
CAS (or some way of implementing it efficiently)?

2. Couldn't the compiler use CAS to solve the word tearing problem? I might
be wrong here, but it seems to me that any word tearing problem can be
solved if the compiler emits a CAS loop that performs a proper
read-modify-write of the enclosing word. That would be less efficient, but I
don't know by how much.
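A hand-written sketch of what question 2 has in mind: storing one byte of a 32-bit word via a CAS loop, so that concurrent stores to neighbouring bytes are not lost. It uses C++11 std::atomic (which postdates this message); store_byte_via_cas and race_on_adjacent_bytes are hypothetical, and "lane i" means bits [8*i, 8*i+8) of the word's value, so the sketch does not depend on byte order.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical: the read-modify-write a compiler could emit for a byte store
// on a machine that can only write whole words.
void store_byte_via_cas(std::atomic<std::uint32_t>& word, unsigned lane,
                        std::uint8_t value) {
    const unsigned shift = 8 * lane;
    const std::uint32_t mask = std::uint32_t{0xFF} << shift;
    std::uint32_t expected = word.load(std::memory_order_relaxed);
    std::uint32_t desired;
    do {
        // Splice the new byte into the latest observed word, keeping the other lanes.
        desired = (expected & ~mask) | (std::uint32_t{value} << shift);
    } while (!word.compare_exchange_weak(expected, desired,
                                         std::memory_order_relaxed));
    // On failure compare_exchange_weak reloads `expected`, so the retry uses the
    // neighbours' current values instead of overwriting them.
}

// Two threads hammer adjacent bytes of the same word.
std::uint32_t race_on_adjacent_bytes() {
    std::atomic<std::uint32_t> word{0};
    auto writer = [&word](unsigned lane, std::uint8_t final_value) {
        for (int i = 0; i < 100000; ++i)
            store_byte_via_cas(word, lane, static_cast<std::uint8_t>(i));
        store_byte_via_cas(word, lane, final_value);
    };
    std::thread a(writer, 0u, std::uint8_t{0xAA});
    std::thread b(writer, 1u, std::uint8_t{0xBB});
    a.join();
    b.join();
    return word.load();  // no lost update: both lanes hold their final values
}
```

A plain load/mask/store sequence in place of the CAS loop could let one thread's stale copy of the word overwrite the other thread's byte.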

> In any case, atomic_add needs to return a value.

Yes, and I'm also unsure whether int is the appropriate type for the second
parameter.
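One possible shape, purely as a sketch: atomic_add built on CAS, returning the new value, with the operand type as a template parameter instead of a fixed int. The name and signature are hypothetical (the real interface is undecided), and the code uses C++11 std::atomic, which postdates this message. The returned value is what makes something like lock-free ticket allocation possible.

```cpp
#include <algorithm>
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical atomic_add: CAS loop, returns the value after the addition.
template <typename T>
T atomic_add(std::atomic<T>& target, T delta) {
    T old = target.load(std::memory_order_relaxed);
    while (!target.compare_exchange_weak(old, old + delta)) {
        // On failure `old` holds the current value; retry with it.
    }
    return old + delta;
}

// Each thread claims a ticket; the return value makes every ticket distinct.
bool tickets_are_unique(int threads) {
    std::atomic<long long> next{0};
    std::vector<long long> got(threads);
    std::vector<std::thread> ts;
    for (int i = 0; i < threads; ++i)
        ts.emplace_back([&next, &got, i] { got[i] = atomic_add(next, 1LL); });
    for (auto& t : ts) t.join();
    std::sort(got.begin(), got.end());
    for (int i = 0; i < threads; ++i)
        if (got[i] != i + 1) return false;  // expect exactly 1..threads
    return true;
}
```

With a void-returning atomic_add, a thread would need a separate (racy) read to learn which ticket it got.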

> (Actually, I can think of a reason to diverge from it, which would
> be to have a single interface for both C and C++.  But I doubt that
> would fly.  And eventually there may be reasons to supply the
> ordering semantics as a template argument.)

Hmmm, that's a good question for later: how easily "translatable" will we
make the memory model for the folks on the C standards committee?
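To make the trade-off in the quoted paragraph concrete, a sketch of the two spellings: ordering supplied as a template argument (C++ only) versus one function name per variant (the only option in C, which has no templates). Every name here (ordering, ordered_load, ordered_store, load_acquire, store_release) is hypothetical, and the bodies map onto C++11 std::atomic, which postdates this message.

```cpp
#include <atomic>

// Hypothetical ordering tag supplied as a template argument.
enum class ordering { none, acquire, release };

template <ordering O>
int ordered_load(const std::atomic<int>& a) {
    static_assert(O != ordering::release, "a load cannot have release semantics");
    return a.load(O == ordering::acquire ? std::memory_order_acquire
                                         : std::memory_order_relaxed);
}

template <ordering O>
void ordered_store(std::atomic<int>& a, int v) {
    static_assert(O != ordering::acquire, "a store cannot have acquire semantics");
    a.store(v, O == ordering::release ? std::memory_order_release
                                      : std::memory_order_relaxed);
}

// The same operations spelled C-style: one name per ordering variant.
int load_acquire(const std::atomic<int>* a) { return ordered_load<ordering::acquire>(*a); }
void store_release(std::atomic<int>* a, int v) { ordered_store<ordering::release>(*a, v); }
```

The template form catches meaningless combinations at compile time; the C-style form is easier to share across languages but multiplies names.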

> Section 6:
>
> We plan to consider giving an assignment to a "volatile" variable
> or field "release" semantics, and read a "volatile" "acquire"
> semantics.  This is true for both the new Java memory model and
> the current C/C++ ABI for Itanium.  This would better specify
> the semantics of "volatile" and allow clean solutions to some common
> concurrent programming problems for which locks often introduce
> too much overhead.  But we have not yet had enough time to fully
> understand the consequences for C++.

That's great, and reflects my current view exactly.
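Double-checked initialization is a classic case where taking a lock on every access costs too much. The sketch below uses C++11's std::atomic (which postdates this message), with an acquire load and a release store standing in for the volatile read and write semantics discussed above; Widget, instance(), threads_seeing_built_widget, and the globals are hypothetical.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

struct Widget { int value = 42; };  // hypothetical expensive-to-build object

std::atomic<Widget*> g_instance{nullptr};
std::mutex g_init_mutex;
std::atomic<int> g_constructions{0};  // test hook

Widget* instance() {
    // Fast path: acquire load, the role a "volatile" read would play.
    Widget* p = g_instance.load(std::memory_order_acquire);
    if (p == nullptr) {
        std::lock_guard<std::mutex> guard(g_init_mutex);
        p = g_instance.load(std::memory_order_relaxed);  // re-check under the lock
        if (p == nullptr) {
            p = new Widget;  // intentionally never deleted, as for a singleton
            g_constructions.fetch_add(1, std::memory_order_relaxed);
            // Release store, the role a "volatile" write would play: the Widget's
            // fields are visible to any thread whose acquire load sees p.
            g_instance.store(p, std::memory_order_release);
        }
    }
    return p;
}

// Several threads race to get the instance; returns how many saw a built Widget.
int threads_seeing_built_widget(int n) {
    std::atomic<int> ok{0};
    std::vector<std::thread> ts;
    for (int i = 0; i < n; ++i)
        ts.emplace_back([&ok] {
            if (instance()->value == 42) ok.fetch_add(1);
        });
    for (auto& t : ts) t.join();
    return ok.load();
}
```

After initialization, readers only pay for one acquire load; the mutex is touched only during the race to build the object.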

Andrei