[cpp-threads] Prism 0.9.1, and further on the effects of races

Tue Sep 12 20:53:13 BST 2006

I'm actually not sure I understand the objection to fully undefined
behavior.  If I read an element past the end of an array, I believe I
get fully undefined behavior, right?  It's not just the returned value
that's unspecified.

I think the situation here is basically the same.  A data race on an
ordinary variable in C++ is a program error, just like an out-of-bounds
array reference.  We're proposing to provide sufficient "atomic"
operations to ensure that a data race on an ordinary variable is never
necessary.  So I don't think there's any reason to support such races
to any extent whatsoever.
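For comparison, here is what "never necessary" looks like with the std::atomic facilities that C++11 eventually adopted (they postdate this discussion). It is a minimal sketch, and hits and count_hits are hypothetical names. Because the shared counter is atomic, the concurrent increments are not a data race:

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical shared counter. Declaring it std::atomic makes concurrent
// increments well defined; a plain `int` here would be a data race.
std::atomic<int> hits{0};

// Runs `threads` workers, each incrementing `hits` `per_thread` times,
// and returns the final count once every worker has been joined.
int count_hits(int threads, int per_thread) {
    hits.store(0);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([per_thread] {
            for (int i = 0; i < per_thread; ++i)
                hits.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& th : pool) th.join();
    return hits.load();
}
```

Calling count_hits(4, 10000) returns 40000. With a plain int counter, the same code would contain a data race and so, under the model argued for here, would have undefined behavior.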

I also believe fully undefined behavior is the status quo.  Posix is
fairly explicit about that.  And my impression is that existing win32
implementations essentially follow the Posix model in this respect.
Thus we're only breaking things that are already broken.

If we did try to support partially undefined behavior for races, I
wouldn't really know how to define what it means.  My impression is
that, unlike Java, we would like to support hardware on which pointer
reads and writes are not atomic?  I think that's necessary for some low
end embedded processors, though I'd love to be corrected.

Assuming I'm correct, we'd have to allow for the case that foo *p =
global_ptr yields a value for p that was never assigned to global_ptr
and in fact doesn't point to foo.  It may be that only p is corrupted,
but I'm not sure how to describe the result in the standard or how to
take advantage of such a standard to recover from the result.
Similarly, reading an integer involved in a race may yield a value never
assigned to the variable or, for enumeration types, one that's not in
range for the type.
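To make "a value never assigned to the variable" concrete, here is a sketch that replays one interleaving without real threads. It assumes hypothetical hardware that stores a 64-bit value as two 32-bit halves. SplitWord and torn_read are illustrative names and do not come from any proposal:

```cpp
#include <cassert>
#include <cstdint>

// A hypothetical 64-bit shared value on a machine that can only store
// 32 bits at a time, so one C++-level store becomes two hardware stores.
struct SplitWord {
    std::uint32_t lo;
    std::uint32_t hi;
    std::uint64_t read() const {
        return (static_cast<std::uint64_t>(hi) << 32) | lo;
    }
};

// Replays one interleaving deterministically: the writer replaces `from`
// with `to`, and the racing reader runs between the two half-stores.
std::uint64_t torn_read(std::uint64_t from, std::uint64_t to) {
    SplitWord w{static_cast<std::uint32_t>(from),
                static_cast<std::uint32_t>(from >> 32)};
    w.lo = static_cast<std::uint32_t>(to);        // writer: first half
    std::uint64_t seen = w.read();                // reader sneaks in here
    w.hi = static_cast<std::uint32_t>(to >> 32);  // writer: second half
    return seen;
}
```

Here torn_read(0xFFFFFFFF, 0x100000000) yields 0, a value that neither the old store nor the new one ever wrote.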

If we do try to partially define races, I think other optimizations are
also affected.  We would have to disallow reloading of a spilled
register from a global, since the impact of that is very hard to
describe in the presence of a race.
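Here is a sketch of why this is hard to describe. Suppose a bounds-checked index is spilled and later rematerialized by reloading the global. A racing write between the check and the use then lets an unchecked value through. The callback stands in for the other thread so that the interleaving is repeatable, and all names are hypothetical:

```cpp
#include <cassert>

// Hypothetical shared index, racily updated by another thread.
int g_index = 0;

// The "other thread", modelled as a callback so the interleaving is
// deterministic in this sketch.
using Interleave = void (*)();

// What the source says: read g_index once, then check and use that copy.
int as_written(Interleave other) {
    int r = g_index;
    other();                       // the racing write lands here
    return (r >= 0 && r < 4) ? r : -1;
}

// What a compiler could emit if it spilled r and then reloaded it from
// g_index instead of from the stack: the check and the use see different values.
int as_reloaded(Interleave other) {
    if (g_index >= 0 && g_index < 4) {
        other();                   // the racing write lands here
        return g_index;            // reloaded: no longer the checked value
    }
    return -1;
}

void write_out_of_range() { g_index = 100; }
```

With g_index initially 2, as_written returns 2, while as_reloaded returns 100, an index that never passed the bounds check.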

As far as constructors are concerned, I think I'm most worried about a
different class of example.  Consider, at function scope:

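{
	foo x;		// foo has virtual functions, so its constructor stores a vptr

	x.bar();	// defined elsewhere; might publish "this" to another thread
	f();		// also defined elsewhere
}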

Assume a foo has virtual functions and hence needs a vtable, and the
definitions of bar() and f() are in separate compilation units.  I don't
think there's anything to stop bar from publishing "this" to another
thread.  Clearly I would need a release operation or a fence somewhere
to make the vtable pointer store visible before such a publication
happens.  But putting it in any of the obvious places, e.g. at the end
of the foo() constructor, seems to impact sequential code.  And if bar()
and f() are sufficiently fast, this code could slow down by large
factors as a result, at least if all I have are 100-cycle fence
instructions.

This is inherently different from Java, in that there is no allocation
involved.  I also don't think this can happen in the CLR, right?

I think this is all much easier with fully undefined behavior for races.
And that wouldn't prevent implementations with additional requirements
from providing stronger guarantees.  But I don't think we should
encourage portable C++ code to rely on any semantics for ordinary data
races.

Hans

> -----Original Message-----
> From: cpp-threads-bounces at decadentplace.org.uk 
> [mailto:cpp-threads-bounces at decadentplace.org.uk] On Behalf 
> Of Herb Sutter
> Sent: Tuesday, September 12, 2006 11:05 AM
> To: C++ threads standardisation; c++std-ext at accu.org
> Subject: RE: [cpp-threads] Prism 0.9.1, and further on the 
> effects of races
> 
> Oops, -ext is the right place for this and I forgot to include it.
> 
> 
> > -----Original Message-----
> > From: cpp-threads-bounces at decadentplace.org.uk
> > [mailto:cpp-threads-bounces at decadentplace.org.uk] On Behalf Of Herb Sutter
> > Sent: Tuesday, September 12, 2006 10:47 AM
> > To: C++ threads standardisation
> > Cc: Bjarne Stroustrup
> > Subject: [cpp-threads] Prism 0.9.1, and further on the effects of 
> > races
> >
> > Two things:
> >
> >   - a quick update on a new Prism draft
> >     (thanks for all the comments so far!)
> >
> >   - a big question looming since the Redmond ad-hoc
> >
> > First, Prism: There is now an updated draft of the Prism spec
> > published in the current ISO C++ mailing (pre-Portland; we're meeting
> > in Portland, OR on Oct 15-20). It is version 0.9.1 (Sep 8, 2006), PDF
> > available here:
> > http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2075.pdf .
> >
> > Second, the question: As you'll see on the cover page of the Prism doc
> > above, the sentiment about "what to guarantee during races" is
> > virtually identical internally at Microsoft and externally at the
> > Redmond ad-hoc meeting we just had: People like Option 4, which boils
> > down to "go ahead and optimize as long as you don't move stuff out of
> > a critical section and don't invent writes to locations that wouldn't
> > be written to in an SC execution." Cool, and fine.
> >
> > Now the "what guarantees in races" question has, I think, been
> > narrowed down to just this one question:
> >
> >   In a race on variable x, what is undefined/unspecified:
> >
> >   (a) The value of x (only, and whatever the program then does
> >       with a bogus value)?
> >
> >   (b) The entire program (global undefined behavior)?
> >
> > I really, really, really hate (b). I think that if we decide on (b),
> > we've punted and failed in an important place.
> >
> > But Hans points out that (a) isn't easy. Clearly any race on a pointer
> > to function could cause a wild branch -- and that's okay so far,
> > because explicit pointers to functions are covered by (a) and the
> > programmer can see the pointer and reason about what it means to
> > follow a corrupt pointer, and deserves what he gets. No issue with
> > that, if the pointer to function appears in the program's source code
> > (including inside a function object).
> >
> > The problem, however, is that a race can occur on the construction of
> > an object and another thread can see an invalid vptr/vtable. The vptr
> > and vtable are not visible to the programmer, who doesn't see the vptr
> > or vtables in the source code. To the programmer, he's simply racing
> > on a pointer to object, here global_ptr:
> >
> >   thread 1: global_ptr = new T();
> >
> >   thread 2: global_ptr->Foo();
> >
> > PRINCIPLE: I feel very strongly that, since we (the language and the
> > compiler) are the ones who are creating this hidden vptr/vtable
> > machinery under the covers, it ought to be our responsibility to get
> > the vptr and vtable right, so that users don't get globally undefined
> > behavior and potentially wild branches in code like the above.
> >
> > I'm currently trying to figure out how to prevent this. Here are two
> > options. Option 1 works (I think) but penalizes all pointers the user
> > ever wrote, which is unfortunate, and arguably at the wrong level
> > (which is why it's such a heavy hammer). Option 2 has unanswered
> > questions but I think it's the right approach because it says we're
> > taking responsibility for generating vptrs and vtables correctly. Put
> > another way, it feels to me like Option 1 is the proverbial "pound of
> > cure," and Option 2 (if it can be made to work) is the "ounce of
> > prevention."
> >
> > Option 1 is to implicitly treat all pointers as "volatile"
> > (acquire/release semantics). I can hear the howls of protest, and
> > there would be a performance impact, but I don't know if anyone knows
> > how much the impact really would be because I don't know if anyone
> > has measured it. I conjecture without proof that it might be less in
> > C++ than in environments like JVM and CLI that use pointers more
> > heavily [insert ironic comment here about how they claim not to have
> > pointers]. But I think this would "cure" the whole problem, at some
> > cost to be determined.
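For the two-thread global_ptr example, Option 1 can be written out explicitly with C++11's std::atomic, which postdates this thread. This is a minimal sketch that assumes a trivial T whose Foo() returns 42. publish_and_call is a hypothetical harness:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical polymorphic type standing in for Herb's T.
struct T {
    virtual ~T() = default;
    virtual int Foo() const { return 42; }
};

// Roughly what Option 1 would do implicitly for every pointer: stores
// release, loads acquire, so the constructor's writes (vptr included)
// happen-before any call made through the loaded pointer.
std::atomic<T*> global_ptr{nullptr};

int publish_and_call() {
    // thread 1: global_ptr = new T();
    std::thread writer([] {
        global_ptr.store(new T(), std::memory_order_release);
    });
    int result = 0;
    // thread 2: global_ptr->Foo();
    std::thread reader([&result] {
        T* p;
        while ((p = global_ptr.load(std::memory_order_acquire)) == nullptr) {
            std::this_thread::yield();   // spin until thread 1 publishes
        }
        result = p->Foo();               // sees a fully constructed T
    });
    writer.join();
    reader.join();
    delete global_ptr.exchange(nullptr);
    return result;
}
```

The reader's acquire load pairs with the writer's release store, so the virtual call sees the completed vptr. The cost in question is that Option 1 would impose this ordering on every pointer access, not just on this one.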
> >
> > Option 2, which unfortunately I can't yet completely see how to make
> > work in C++, is to have special treatment for vptrs. (I think
> > conceptually this is the correct thing to do -- after all, we're
> > creating the vptr machinery, and it should be our job to make it work
> > right!) Note two cases where the problem arises (if there are more, let
> > me know; these are the ones I can imagine):
> >
> >   (A) When the vtable is created, the writes to the vtable slots get
> >       reordered to perform after taking the address of the vtable
> >       (giving someone a vptr), so that someone can see a vptr to a
> >       not-fully-set vtable.
> >
> > I think we can deal with (A) by making the vptr "volatile" (acq/rel
> > semantics) and making sure any vptr to a vtable is set after the 
> > vtable's own entries by the thread that first creates and publishes 
> > the vtable (this should require no changes to existing code which 
> > should be doing it this way already). Frankly, even if we 
> can't solve 
> > (B), I think this part is probably necessary.
> >
> > HARD PROBLEM: But the harder problem is:
> >
> >   (B) The vptr points to a base vtable, not yet the fully derived
> >       vtable, and the program tries to access a derived slot offset
> >       beyond the end of the base vtable. This issue stems from the
> >       fact that vptrs are written multiple times.
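The multiple vptr writes can be observed even in single-threaded code. A virtual call made from a base-class constructor dispatches through the base vtable, because the derived vptr has not been installed yet. Here is a small sketch with hypothetical Base and Derived classes:

```cpp
#include <cassert>
#include <string>

// Each constructor in the chain installs its own class's vptr before
// running its body, so the dynamic type changes as construction proceeds.
struct Base {
    std::string seen_in_base_ctor;
    Base() { seen_in_base_ctor = name(); }   // dispatches via Base's vptr
    virtual ~Base() = default;
    virtual std::string name() const { return "Base"; }
};

struct Derived : Base {
    Derived() = default;                     // installs Derived's vptr after Base()
    std::string name() const override { return "Derived"; }
};

// Returns what name() resolved to during Base's constructor and afterwards.
std::string during_and_after() {
    Derived d;
    return d.seen_in_base_ctor + "/" + d.name();
}
```

during_and_after() returns "Base/Derived". The same object dispatches through two different vtables during construction, which is the window that (B) describes.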
> >
> > Can you think of any way to deal with (B), e.g., by making vptrs
> > written only once at the end of a full object's construction? Note
> > that this set-the-vptr-once seems like it would work trivially in
> > Java/CLI two-pass construction with deep virtual calls during
> > construction/destruction, but I don't see how to make it work in C++.
> > Is there a way to deal with (B) short of adding an "isConstructed" bit
> > to every (sub)object, which blows binary compatibility (and probably
> > some size and speed bloat)?
> >
> > Note that the following strategy does not work (thanks to Lawrence
> > and Hans for pointing this out), because of C++'s multiple settings
> > of the vptr as bases and deriveds are constructed:
> >
> >   - during construction, set the vptr last and with release semantics
> >
> >   - in every virtual member function call, read the vptr first with
> >     acquire semantics
> >
> > In a race, you can still have another thread see the vptr as pointing
> > to a base vtable while trying to call a derived member function.
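The following sketch shows why release/acquire on the vptr alone does not help. It models the vptr as an explicit std::atomic pointer and replays one interleaving deterministically. VTable, Object and the slot functions are hypothetical stand-ins and do not reflect how any real ABI lays out vtables:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// A hand-rolled model of vtables: a slot count plus an array of functions.
using Slot = int (*)();
struct VTable {
    std::size_t size;
    const Slot* slots;
};

int base_f() { return 1; }
int derived_g() { return 2; }

const Slot base_slots[] = {base_f};
const Slot derived_slots[] = {base_f, derived_g};
const VTable base_vtbl{1, base_slots};
const VTable derived_vtbl{2, derived_slots};

// The object's vptr, stored with release and loaded with acquire,
// as the rejected strategy proposes.
struct Object {
    std::atomic<const VTable*> vptr{nullptr};
};

// Replays one interleaving: construction stores the base vptr, then the
// derived one, and the racing reader's acquire load lands between the two.
// Returns how many slots the vtable seen by the reader has.
std::size_t slots_seen_mid_construction() {
    Object obj;
    obj.vptr.store(&base_vtbl, std::memory_order_release);          // Base ctor
    const VTable* seen = obj.vptr.load(std::memory_order_acquire);  // reader
    obj.vptr.store(&derived_vtbl, std::memory_order_release);       // Derived ctor
    return seen->size;
}
```

The acquire load is correctly ordered with the first release store, yet it returns the one-slot base table. A call through the derived slot 1 would therefore index past the table's end.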
> >
> > Thoughts?
> >
> > Herb
> >
> >
> >
> > --
> > cpp-threads mailing list
> > cpp-threads at decadentplace.org.uk
> > http://www.decadentplace.org.uk/cgi-bin/mailman/listinfo/cpp-threads