[cpp-threads] D2335 (sequential consistency proof) revision

Fri Aug 24 01:26:11 BST 2007

[Also dropping core.]

> -----Original Message-----
> From: Lawrence Crowl
> 
> It seems to me that the approach we would with CAS should 
> also apply to locks.  That is, without decoration they are 
> SC, not just acquire or release.  (The lock is, after all, 
> RMW.)  With decoration, the lock operations can be less than 
> SC.  Unfortunately, declaration will complicate the lock API.
There are two possible options here, which I think we should try to keep
separate:

1. Ensure that race-free programs with trylock are SC, even if race-free
is defined as "simultaneous execution of conflicting actions".  This
would require making the analog of the example below work, even if x is
not atomic.  I think this would make the two possible definitions of
"race free" equivalent for locks (incl. trylock/timedlock) + SC atomics.
It would require a successful lock to have release semantics in addition
to acquire semantics.

2. Ensure a single total ordering for locks and SC atomics, but insists
that race-free be defined in terms of happens-before.  (This is the Java
solution.)  We get SC for race-free programs only with that one
definition of race-free (unless they don't use trylock or timedlock).

We actually discussed (1) a while ago.  Although it does simplify the
story for the programmer, the problem with doing this is that we would
be significantly penalizing very mainstream code in order to support SC
for code that we believe nobody should write and nobody does write,
namely code that breaks, other than by deadlock, if a lock were acquired
too early.  At least some of us decided against it at the time.  (If we
wanted to go this route, it might make sense to introduce a separate
lock type that did not support trylock/timedlock, to get most of the
performance back for the common case.)

I was suggesting option (2) below.  I'm not quite sure which one you're
suggesting.

>  How much performance would we lose?

For option (1), I think the loss is often a significant increase in the
time required for lock(), since it does something like double the fence
cost.  However, on architectures like X86 that implicitly include full
fences in RMW atomic operations, it probably doesn't matter.  There are
some measurements in
http://www.hpl.hp.com/personal/Hans_Boehm/misc_slides/reordering.pdf,
slides 10 and 11.  (Compare the "None" and "Lock" bars.  More details in
my PPoPP 07 paper.)  I expect the differences would be significantly
larger on PowerPC.

For option (2), a trylock needs to have acquire semantics, even in the
failure case.  On some architectures, this requires an extra fence in
the failure case.  I think this is a much more modest cost, in that it
affects at most failed trylocks.

I think I advocate either option (2) or leaving the current version
alone and accepting that trylock/timedlock get you out of the SC domain,
unless you follow some rules.

Hans

> 
> On 8/16/07, Boehm, Hans <hans.boehm at hp.com> wrote:
> > N2335 was intended to contain proofs that the proposed memory model 
> > was in fact ensures sequential consistency for race free 
> programs that 
> > satisfy some additional requirements (only SC atomics, lock(), and 
> > unlock()).
> >
> > It turned out that D2335 required a significant amount of revision, 
> > mostly as a result of Clark's N2334
> > (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2334.htm)
> > rewording.  As a result, it did not make the post-Toronto 
> mailing, and 
> > will need to acquire a new number.
> >
> > I now have a version that looks OK to me, which I've attached.
> >
> > It turns out that this raised one other question that I had 
> overlooked.
> > (Note to core group: The rest of this is mostly a library issue.  I 
> > don't think N2334 is affected either way.)
> >
> > Consider the following (variant of a well-known) example, 
> where x is 
> > atomic (initially zero, SC ops only) and l1 is initially unlocked:
> >
> > Thread 1:
> >         x.store(1);
> >         l1.lock();
> >
> > Thread 2:
> >         r2 = trylock();  // fails
> >         r3 = x.load();   // yields 0
> >
> > Note that lock and trylock are both only acquire operations, even 
> > though the trylock reads the value written by the lock.
> >
> > Since there are no synchronizes-with relationships and hence no 
> > happens-before relationships across threads, and the 
> current plan is 
> > to insist on a total order only among SC atomics, not 
> locks, I believe 
> > this outcome is currently allowed, in spite of the fact 
> that only SC 
> > atomic operations are used.  This outcome is of course not 
> > sequentially consistent.  And the program is race-free by virtue of 
> > mentioning only synchronization operations.
> >
> > (Herb points out that timedlock() is for present intents 
> equivalent to 
> > trylock().  These issues don't arise with just lock() and unlock().)
> >
> > My tentative conclusion is the way this is currently proposed is 
> > probably not a disaster (the prrofs go through without trylock(), 
> > which inherently causes some problems anyway), but it's 
> probably not 
> > really what we want.  We should instead insist that there 
> be a single 
> > total order, covering both atomics and locks and consistent 
> with hb, 
> > such that each atomic operation and lock sees the previous write to 
> > the corresponding object in this sequence.  I think this makes it 
> > easier to characterize SC-compatible uses of trylock().
> >
> > But in implementation terms, this change would seem to 
> require that a 
> > failed trylock still needs enough ordering semantics to prevent 
> > reordering with a subsequent SC load.  Presumably Java already 
> > requires this.  Does anyone feel that this is a serious issue?  I 
> > guess this reraises some of the issues we previously 
> discussed for CAS 
> > failure ordering, which I previously thought we didn't have 
> for locks.
> >
> > Opinions?
> >
> > Hans
> >
> > --
> > cpp-threads mailing list
> > cpp-threads at decadentplace.org.uk
> > http://www.decadentplace.org.uk/cgi-bin/mailman/listinfo/cpp-threads
> >
> >
> >
> 
> 
> --
> Lawrence Crowl
> 
> --
> cpp-threads mailing list
> cpp-threads at decadentplace.org.uk
> http://www.decadentplace.org.uk/cgi-bin/mailman/listinfo/cpp-threads
>