[cpp-threads] Alternatives to SC

Sun Jan 14 02:42:32 GMT 2007

On Sat, Jan 13, 2007 at 01:18:49PM -0800, Herb Sutter wrote:
> This discussion has been very helpful. I think I've finally been able to
> put my finger on what bothers me about IRIW without toy examples (I did
> post a couple of specific potential 'real-world' examples, but I think
> they weren't compelling for people). Let me try to ask the question(s)
> another way.
> 
> The fundamental philosophy of CCCC seems to be:
> 
>   Correctly locked programs are SC.

We could probably get ten programmers together and get twenty different
definitions of "correctly locked".  ;-)  That said, I agree that there
are some locking design rules that result in no visible deviations from
SC.  It is of course trivial to create non-SC programs that use locks,
and even non-SC programs where every statement is guarded by a lock.

>   Atomics aren't quite SC.
>
> Is it fair to say that this implies that
> 
>   A program where all variables are atomic isn't quite SC.
> 
> ?

As with locking, it should be possible to define "correct" usages
of CCCC atomics that result in no visible deviations from SC.
So I would substitute something like:

	Correctly constructed uses of atomic variables are SC.

Don't you agree that there should be rules governing use of atomics
just like there are rules covering use of locking?  I do -not- believe
that it is this group's job to construct all possible useful rulesets,
but rather to lay a reasonable foundation that hardware, compiler,
and software people can live reasonably with.

> So now let me cast the question this way, which I think finally puts
> the finger on what's been bothering me:
> 
>   Is an atomic variable a monitor?

The word "monitor" has all sorts of connotations, and I so I am not
sure I understand the question.

> Or equivalently:
> 
>   Does an atomic variable have internally-locked semantics?

At the very least, any access to an atomic variable should return
the result of some prior write or the initial value, rather than
returning a melange of values from more than one prior write.
For large variables, many architectures would require some sort
of access discipline, be it locking, interposing indirection,
or tricky constraints on ordering of accesses.

But I get the feeling that we are talking past each other.

> (Aside: Relationships to lock semantics keep coming up. This question
> is indirectly related to the question of whether transactional memory
> systems should obey "single global lock" semantics.)

I do not believe that transactional memory is ready for standardization.

> Consider two programs:
> 
>   Program 1, lock-free IRIW
> 
>   P1            P2            P3            P4
>   x=1;          y=1;          r1=x; [=0]    r3=y; [=0]
>                               r2=y; [=1]    r4=x; [=1]
> 
>   Program 2, lock-based IRIW
> 
>   P1            P2            P3            P4
>   lock(x);      lock(y);      lock(x);      lock(y);
>   x=1;          y=1;          r1=x; [=0]    r3=y; [=0]
>   unlock(x);    unlock(y);    unlock(x);    unlock(y);
>                               lock(y);      lock(x);
>                               r2=y; [=1]    r4=x; [=1]
>                               unlock(y);    unlock(x);
> 
> I think that CCCC says program 1 is not SC, but program 2 is SC. In
> particular, in CCCC, the [=shown] values are legal for program 1, but
> not for program 2.

I have not analyzed Program 2, but I do believe that the write events
implied by the locking do indeed constrain the execution as you say.
And impose -lots- of extra overhead, especially if threads P3 and P4
execute very frequently compared to P1 and P2 -- the locking prevents
you from obtaining the benefits of read-sharing in the CPU caches,
which can be a multiple-order-of-magnitude performance hit.

No deadlock concerns in this case, because you don't have nested locks.

But when locks suffice for my code, I definitely stick with locks.

> If we say that, then atomics aren't monitors. I think this is the it
> semantic difference that will surprise people (even those who know better
> would forget), and it would occasionally matter, and so that needs to
> be justified.

On the one hand, almost every programming construct surprises people
at some point or another, so decisions based solely on surprise will
eliminate all possible programming constructs, which I believe that we
can agree would be a bad outcome.

On the other hand, why anyone be telling people that atomics are monitors?
If you want monitors, after all, you can easily use locking.

> So let me ask whether this is really what you intended:
> 
>   Q1: If you think that for program 1 the non-SC result should be
> allowed, do you also agree that for program 2 the non-SC result should
> be allowed? If not, why not?

I do believe that a non-SC result should be allowed for Program 1, and
that it is reasonable to expect that Program 2 would always give SC
results (though, again, I have not thoroughly analyzed Program 2).

I believe that it is eminently reasonable to expect different results
because very different materials are being used in construction of the
two programs.

There are any number of analogs to this situation.  I will choose integer
and floating-point arithmetic.  Integer arithmetic is associative,
while floating-point arithmetic is not necessarily associative.
Different materials, different results.

>   Q2: Conversely, if you think that for program 2 the non-SC result
> should be disallowed, do you also agree that for program 1 the non-SC
> result should be disallowed? If not, why not?

Again, the two programs use different materials with different properties,
so it is only reasonable to expect that they may behave differently.

>   Q3: Combining the two parts, why do the cases where non-SC creates
> problems for lock-based programs not apply equivalently to program 1?

Nice try.  ;-)

Please note that I did not say that non-SC creates problems for lock-based
programs.  I -did- agree that restricted (but common) uses of locking
might hide non-SC behavior from the programmer, however, it is quite
trivial to construct non-SC locked programs.  Take any non-SC program,
and have each thread acquire its own per-thread lock at start, and
release this lock upon exit.  The program is fully locked, but retains
any non-SC behavior that it started with.

So let's examine a slightly different question, which might be closer
to your intent:  "Why should we tolerate non-SC behavior in atomics
when it is easy to eliminate non-SC behavior in locking?"

My answer?  First, if locking does what you need, stay with locking.
Second, if you use atomics, try to follow the recipes that have been
provided, for example, by Doug Lea.  Doing this will also help you to
avoid many troublesome pitfalls, including a great many that don't involve
SC vs. non-SC.  Third, if something like CCCC pans out, are the deviations
from SC really of concern?  My current belief is that they are not.
Fourth, accepting lighter-weight semantics (of which non-SC is but one
example) is quite appropriate in many situations.  For example, if you are
counting the number of bytes received over multiple networking interfaces
for statistical purposes, SC is horribly expensive and of little value.

Going back to the integer/floating-point analogy, why would anyone ever
willingly give up a fundamental mathematical property like associativity?
A large number of programmers do so willingly in order to gain the
benefits of floating point's greater dynamic range -- despite the great
mathematician John Von Neumann's assertion that all computation should
be done using integer arithmetic.

> I realize that this comes around to the question of whether atomic
> operations, like locks, should be a global property, and I realize
> people want to make it not a global property. But if it's not global,
> then atomics aren't monitors, and I now realize it's that consequence
> that I haven't been able to swallow yet.

This may be a key point -- I never have treated atomic variables as
if they were monitors.

So, why do you want atomic variables to act like monitors when you
already have locking from which to construct monitors?

> I would appreciate any further thoughts on this. More views would
> definitely be helpful, especially on what the differences might be
> and why.

I believe that we might well be getting somewhere here.

> Paul wrote:
> > Similarly, given that we don't have SC now, a more appropriate question
> > is "how much do we lose by insisting on SC?".  The answer varies wildly
> > with the specific algorithm, but in my experience, an order of
> > magnitude
> > is a good rule of thumb for software enforcement.  To see this, take
> > the example that Peter Dimov recently posted to this list and
> > generalize
> > to a large number of threads -- have N threads write to N independent
> > non-false-shared variables, and have two other threads concurrently
> > read
> > the variables in reverse order from each other, then check for
> > consistency
> > with SC.  Run on an SMP machine with N+2 hardware threads.  For extra
> > credit, place the two reader threads on different dies, but each
> > sharing
> > their die with some of the writer threads.
> >
> > Do the two readers see results consistent with a single order for
> > all of the writes?  How much overhead do you have to add to get this
> > example to reliably produce results consistent with SC?
> 
> Let me pop slightly higher and apply the more general question: Should
> the answer be different if you use atomics vs. if you put individual
> locks around individual ints? If yes, why?

I don't know if the answer -should- be different, but given the vastly
different properties of locking vs. atomics, it is certainly reasonable to
expect that differences might appear.  Again, if locking is sufficient for
your program, be happy with locking.  If locking is not sufficient, then
insisting that all of locking's semantics be present in the replacement
might not get you where you need to be.

Back to the integer/floating-point analogy, insisting that floating point
obey associativity will not get you an implementation of floating point
that is generally useful.  For example, unbounded rational arithmetic
might impose unacceptable speed and memory penalties.  Similarly,
insisting on SC for atomics seems unlikely to get us where we need
to be.

> > > the reverse. (I have in mind Mark Hill, Andy Glew, and others.)
> >
> > Towards TSO, maybe.  In light of the example above, I have a hard
> > time believing they will really get all the way to SC.
> 
> One reference I had in mind was:
> 
>   "Multiprocessors Should Support Simple Memory Consistency Models"
>   Mark D. Hill, IEEE Computer, August 1998
>   http://citeseer.ist.psu.edu/hill98multiprocessors.html
> 
> Mark argues that hardware should implement (not only
> enable) SC, "or in some cases" read reordering. He reaffirmed
> this in a retrospective at Dagstuhl 2003 (references listed at
> http://www.cs.wisc.edu/multifacet/papers/by_topic.html), and cites some
> data in the meantime that supports SC isn't at a performance disadvantage
> given sufficient speculation.
> 
> I don't know of any subsequent change in his opinion.

First, his analysis was qualitative, and thus unconvincing to me.

Second, his paper showed no signs of accounting for the realtime workloads
that are already appearing on SMP platforms.  This is not a slam against
Mark Hill -- heck, 1998 was a -long- time ago in computer years!

> > You are asserting that SC will help us.  I remain unconvinced.
> 
> SC means atomics are monitors. CCCC means atomics aren't monitors. Is
> that correct (enough)? And does it help?

You still need to show me a compelling reason why we need atomics to be
monitors, especially given that we already have locking for that purpose.

> > It seems
> > much more likely that higher levels of abstraction are what is needed
> > instead.
> 
> c/instead/too/ and I'm in full agreement.

Then I guess we must settle for partial agreement, at least for the
moment.  I believe that the higher levels of abstraction will obviate
SC atomics, and further that SC atomics will impose performance
penalties that will harm the implementations of these abstractions
for no benefit.

						Thanx, Paul