[cpp-threads] Examples of cross-thread ordering

Fri Jul 14 15:58:03 BST 2006

Hans wrote, quoting Jeremy:
> > Herb Sutter wrote:
> >
> > > Example 1:
> > >
> > >   // thread T1
> > >   x = 1;	// a
> > >   y = 2;	// b
> > >   r1 = y;	// c
> > >
> > >   // thread T2
> > >   y = 3;	// d
> > >   x = 4;	// e
> > >   r2 = x;	// f
> > >
> > > It is possible to have r1 == 3 and r2 == 1? How is the
> > answer derived
> > > from applying the rules in the Java and draft C++ memory models?
> >
> > Java doesn't allow this result for volatiles.  All accesses
> > to volatiles need to be in a single, total order, where each
> > read returns the last write to that variable (i.e.,
> > sequentially consistent).  This is not such a result.
> >
> The current C++ proposal does allow this with plain atomic<>
> assignments, though Herb and I were discussing whether this is the
> correct default.  
>
> Currently stores only have release semantics and loads
> only have acquire semantics.  Unlike Java, the current proposal is to
> not insist on a total order among atomic<> operations.  

Let me beat this horse a little more: Would this be a good time to make
a bet on whether 1% of the programmers in the world will get this right?
:-) (And this is just plain atomic<> as currently specified, without the
explicit <raw> uses.)

> You end up with
> 
> a -> b
> a -> f
> d -> e
> d -> c
> 
> which is acyclic.  But even adding the extra Java edges it would still
> be.  The difference is really the insistence on the total order.

As in my other email just now responding to Jeremy's, yes, that's the
result we intuitively want. But can we be crisper than "there is no
total order", and the exact rules by which the total order(s) can be
derived?

I would really like to see this tied together in a nice happens-before
package. The edges should be enough; if they aren't enough to answer
questions like this, wouldn't that seem to indicate incompleteness? I've
asked this about JMM, but now about C++MM: How do I rigorously get that
answer by applying the rules at a crisp step-by-step level?

> Somewhat intuitively, the current C++ proposal allows writes to be
> delayed, e.g. in a write-buffer, before they are seen remotely.  This
> seems to model what hardware (particularly X96) can do cheaply.  But
in
> slightly weird cases like this, it gives you a less expected outcome.
> 
> Thinking about this more, it seems a bit tricky to provide both the
Java
> version and the current version, but I think it can be done if we
want.
> >
> > >
> > >
> > > Example 2:
> > >
> > >   // thread T1
> > >   x = 1;	// a
> > >   r1 = y;	// b
> > >   r2 = y;	// c
> > >
> > >   // thread T2
> > >   y = 2;	// d
> > >   z = 6;
> > >
> > >   // thread T3
> > >   y = 4;	// e
> > >   r3 = z;
> > >   x = 8;	// f
> > >   r4 = x;	// g
> > >
> > > If r1 == 2 and r2 == 4 (and r3 == 6), is it possible to
> > have r4 == 1?
> >
> >
> > I don't see how; it doesn't look as if there is a total order
> > that would reflect that.
> Again, the current C++ proposal does not insist on the total order,
and
> I think this would be allowed.

Again, I really think even experts are liable to get this wrong all the
time. This is a classic "impossible" result, and I would plead that this
is the kind of thing we're supposed to be eliminating.

Hans, in your presentations to the C++ committee, you correctly hit on
the essential importance of usability, as something that must trump
incremental performance and even constrain current/future hardware
architectures. Wouldn't this be a case in point?

> > > (I think that variable z, and therefore r3, seem to be
> > unnecessary and
> > > I'll write Arvind separately about that, but I include it as it
> > > appears in the original example.)
> > >
> > >
> > > Example 3:
> > >
> > >   // thread T1
> > >   x = 1;	// a
> > >   y = 3;	// b
> > >   r1 = y;	// c
> > >
> > >   // thread T2
> > >   y = 4;	// d
> > >   r2 = x;	// e
> > >
> > >   // thread T3
> > >   x = 2;	// f
> > >
> > > If r1 == 4 and r2 == 2, what if anything can we say about a
> > > happens-before relationship between lines a and f?
> >
> >
> > Not a lot.  If -> is the happens-before relationship, we can
> > say a -> b
> > -> c, d -> e, f -> e and d -> c.  If there is no other code creating
> > happens-before relationships, then we can say that there
> > isn't a happens-before relationship between the two.
> The way the C++ proposal is currently formulated, we lose b -> c and d
> -> e.
> Otherwise the answer is the same.

Are you saying that applying the rules can't come up with a->f? If so,
isn't that a problem?

> > This is because there are two factors at work here in Java --
> > a total order over accesses to volatiles, and the
> > happens-before order, which is determined based on program
> > order and on which volatile read sees which volatile write.
> > In the total order, line a has to occur before line f,
> > because otherwise, line e wouldn't see the value 2.  This
> > implies nothing for the happens-before order, though, because
> > a and f are both writes.

Thanks,

Herb