[cpp-threads] Alternatives to SC

Sat Jan 13 21:18:49 GMT 2007

This discussion has been very helpful. I think I've finally been able to put my finger on what bothers me about IRIW without toy examples (I did post a couple of specific potential 'real-world' examples, but I think they weren't compelling for people). Let me try to ask the question(s) another way.

The fundamental philosophy of CCCC seems to be:

  Correctly locked programs are SC.
  Atomics aren't quite SC.

Is it fair to say that this implies that

  A program where all variables are atomic isn't quite SC.

?

So now let me cast the question this way, which I think finally puts the finger on what's been bothering me:

  Is an atomic variable a monitor?

Or equivalently:

  Does an atomic variable have internally-locked semantics?

(Aside: Relationships to lock semantics keep coming up. This question is indirectly related to the question of whether transactional memory systems should obey "single global lock" semantics.)

Consider two programs:

  Program 1, lock-free IRIW

  P1            P2            P3            P4
  x=1;          y=1;          r1=x; [=0]    r3=y; [=0]
                              r2=y; [=1]    r4=x; [=1]

  Program 2, lock-based IRIW

  P1            P2            P3            P4
  lock(x);      lock(y);      lock(x);      lock(y);
  x=1;          y=1;          r1=x; [=0]    r3=y; [=0]
  unlock(x);    unlock(y);    unlock(x);    unlock(y);
                              lock(y);      lock(x);
                              r2=y; [=1]    r4=x; [=1]
                              unlock(y);    unlock(x);

I think that CCCC says program 1 is not SC, but program 2 is SC. In particular, in CCCC, the [=shown] values are legal for program 1, but not for program 2.

If we say that, then atomics aren't monitors. I think this is the it semantic difference that will surprise people (even those who know better would forget), and it would occasionally matter, and so that needs to be justified.

So let me ask whether this is really what you intended:

  Q1: If you think that for program 1 the non-SC result should be allowed, do you also agree that for program 2 the non-SC result should be allowed? If not, why not?

  Q2: Conversely, if you think that for program 2 the non-SC result should be disallowed, do you also agree that for program 1 the non-SC result should be disallowed? If not, why not?

  Q3: Combining the two parts, why do the cases where non-SC creates problems for lock-based programs not apply equivalently to program 1?

I realize that this comes around to the question of whether atomic operations, like locks, should be a global property, and I realize people want to make it not a global property. But if it's not global, then atomics aren't monitors, and I now realize it's that consequence that I haven't been able to swallow yet.

I would appreciate any further thoughts on this. More views would definitely be helpful, especially on what the differences might be and why.

Paul wrote:
> Similarly, given that we don't have SC now, a more appropriate question
> is "how much do we lose by insisting on SC?".  The answer varies wildly
> with the specific algorithm, but in my experience, an order of
> magnitude
> is a good rule of thumb for software enforcement.  To see this, take
> the example that Peter Dimov recently posted to this list and
> generalize
> to a large number of threads -- have N threads write to N independent
> non-false-shared variables, and have two other threads concurrently
> read
> the variables in reverse order from each other, then check for
> consistency
> with SC.  Run on an SMP machine with N+2 hardware threads.  For extra
> credit, place the two reader threads on different dies, but each
> sharing
> their die with some of the writer threads.
>
> Do the two readers see results consistent with a single order for
> all of the writes?  How much overhead do you have to add to get this
> example to reliably produce results consistent with SC?

Let me pop slightly higher and apply the more general question: Should the answer be different if you use atomics vs. if you put individual locks around individual ints? If yes, why?

> > the reverse. (I have in mind Mark Hill, Andy Glew, and others.)
>
> Towards TSO, maybe.  In light of the example above, I have a hard
> time believing they will really get all the way to SC.

One reference I had in mind was:

  "Multiprocessors Should Support Simple Memory Consistency Models"
  Mark D. Hill, IEEE Computer, August 1998
  http://citeseer.ist.psu.edu/hill98multiprocessors.html

Mark argues that hardware should implement (not only enable) SC, "or in some cases" read reordering. He reaffirmed this in a retrospective at Dagstuhl 2003 (references listed at http://www.cs.wisc.edu/multifacet/papers/by_topic.html), and cites some data in the meantime that supports SC isn't at a performance disadvantage given sufficient speculation.

I don't know of any subsequent change in his opinion.

> You are asserting that SC will help us.  I remain unconvinced.

SC means atomics are monitors. CCCC means atomics aren't monitors. Is that correct (enough)? And does it help?

> It seems
> much more likely that higher levels of abstraction are what is needed
> instead.

c/instead/too/ and I'm in full agreement.

Herb