[cpp-threads] Prism 0.9.1, and further on the effects of races

Tue Sep 12 18:47:08 BST 2006

Two things:

  - a quick update on a new Prism draft
    (thanks for all the comments so far!)

  - a big question looming since the Redmond ad-hoc

First, Prism: There is now an updated draft of the Prism spec published in the current ISO C++ mailing (pre-Portland; we're meeting in Portland, OR on Oct 15-20). It is version 0.9.1 (Sep 8, 2006), PDF available here: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2075.pdf .

Second, the question: As you'll see on the cover page of the Prism doc above, the sentiment about "what to guarantee during races" is virtually identical internally at Microsoft and externally at the Redmond ad-hoc meeting we just had: People like Option 4, which boils down to "go ahead and optimize as long as you don't move stuff out of a critical section and don't invent writes to locations that wouldn't be written to in an SC execution." Cool, and fine.

Now the "what guarantees in races" question has I think been narrowed down to just this one question:

  In a race on variable x, what is undefined/unspecified:

  (a) The value of x (only, and whatever the program then does
      with a bogus value)?

  (b) The entire program (global undefined behavior)?

I really, really, really hate (b). I think that if we decide on (b), we've punted and failed in an important place.

But Hans points out that (a) isn't easy. Clearly any race on a pointer to function could cause a wild branch -- and that's okay so far, because explicit pointers to functions are covered by (a) and the programmer can see the pointer and reason about what it means to follow a corrupt pointer, and deserves what he gets. No issue with that, if the pointer to function appears in the program's source code (including inside a function object).

The problem, however, is that a race can occur on the construction of an object and another thread can see an invalid vptr/vtable. The vptr and vtable are generated by the compiler and never appear in the source code, so the programmer can't see them or reason about them. To the programmer, he's simply racing on a pointer to object, here global_ptr:

  thread 1: global_ptr = new T();

  thread 2: global_ptr->Foo();

PRINCIPLE: I feel very strongly that, since we (the language and the compiler) are the ones who are creating this hidden vptr/vtable machinery under the covers, it ought to be our responsibility to get the vptr and vtable right, so that users don't get globally undefined behavior and potentially wild branches in code like the above.

I'm currently trying to figure out how to prevent this. Here are two options. Option 1 works (I think) but penalizes all pointers the user ever wrote, which is unfortunate, and arguably at the wrong level (which is why it's such a heavy hammer). Option 2 has unanswered questions but I think it's the right approach because it says we're taking responsibility for generating vptr and vtables correctly. Put another way, it feels to me like Option 1 is the proverbial "pound of cure," and Option 2 (if it can be made to work) is the "ounce of prevention."

Option 1 is to implicitly treat all pointers as "volatile" (acquire/release semantics). I can hear the howls of protest, and there would be a performance impact, but I don't know if anyone knows how much the impact really would be because I don't know if anyone has measured it. I conjecture without proof that it might be less in C++ than in environments like JVM and CLI that use pointers more heavily [insert ironic comment here about how they claim not to have pointers]. But I think this would "cure" the whole problem, at some cost to be determined.
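To make Option 1 concrete, here is a minimal sketch of the global_ptr example with the acquire/release semantics written out by hand. It assumes C++11's std::atomic, which postdates this message, as a stand-in for the Prism-style "volatile" meant above; Widget, publish, try_call_foo and retire are hypothetical names for illustration only. Under Option 1 the compiler would apply this treatment implicitly to every pointer the user wrote.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical polymorphic type standing in for T in the example above.
struct Widget {
    virtual ~Widget() = default;
    virtual int Foo() const { return 42; }
};

// The shared pointer, with acquire/release semantics made explicit.
std::atomic<Widget*> global_ptr{nullptr};

// "thread 1": construct the object completely, then publish it.
// The release store orders the constructor's writes (including the
// hidden vptr write) before the pointer becomes visible.
void publish() {
    global_ptr.store(new Widget(), std::memory_order_release);
}

// "thread 2": acquire-load the pointer. A thread that observes the
// non-null pointer also observes everything written before the
// release store. Returns -1 if nothing has been published yet.
int try_call_foo() {
    Widget* p = global_ptr.load(std::memory_order_acquire);
    return p ? p->Foo() : -1;
}

// Cleanup helper for the sketch: take the pointer back and delete it.
void retire() {
    delete global_ptr.exchange(nullptr);
}
```

In a real program publish() and try_call_foo() would run on different threads; the functions are written so they can also be exercised from one thread. The cost question raised above is exactly the cost of making every pointer load and store behave like these two.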

Option 2, which unfortunately I can't yet completely see how to make work in C++, is to have special treatment for vptrs. (I think conceptually this is the correct thing to do -- after all, we're creating the vptr machinery, and it should be our job to make it work right!) Note two cases where the problem arises (if there are more let me know, these are the ones I can imagine):

  (A) When the vtable is created, the writes to the vtable slots get
      reordered to perform after taking the address of the vtable
      (giving someone a vptr), so that someone can see a vptr to a
      not-fully-set vtable.

I think we can deal with (A) by making the vptr "volatile" (acq/rel semantics) and making sure any vptr to a vtable is set after the vtable's own entries by the thread that first creates and publishes the vtable (this should require no changes to existing code which should be doing it this way already). Frankly, even if we can't solve (B), I think this part is probably necessary.
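The ordering for (A) can be sketched with a hand-written model of the hidden machinery. This is an illustration under stated assumptions, not how any particular compiler lays out objects: VTable, Object, construct and call_foo are hypothetical names, std::atomic (C++11) stands in for the acq/rel "volatile" above, and the null "not yet published" state exists only so the sketch can be checked.

```cpp
#include <atomic>
#include <cassert>

// Hand-written stand-in for a compiler-generated vtable (hypothetical).
struct VTable {
    int (*foo)(const void* self) = nullptr;  // one virtual slot
};

// Hand-written stand-in for an object with a hidden vptr.
struct Object {
    std::atomic<const VTable*> vptr{nullptr};
    int value = 0;
};

// The function the single vtable slot will point to.
int object_foo(const void* self) {
    return static_cast<const Object*>(self)->value;
}

VTable object_vtable;  // slots start out empty

// Creating thread: write every vtable slot first, then set the vptr
// last with release semantics, so the slots are visible to anyone
// who sees the vptr.
void construct(Object& obj, int value) {
    obj.value = value;
    object_vtable.foo = &object_foo;
    obj.vptr.store(&object_vtable, std::memory_order_release);
}

// Calling thread: read the vptr first with acquire semantics, then
// dispatch through the slot. Returns -1 if no vptr is published yet.
int call_foo(const Object& obj) {
    const VTable* vt = obj.vptr.load(std::memory_order_acquire);
    return vt ? vt->foo(&obj) : -1;
}
```

The point of the model is only the order of the two writes in construct(): slots first, vptr last. It says nothing about (B), where the vptr is legitimately written more than once.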

HARD PROBLEM: But the harder problem is:

  (B) The vptr points to a base vtable, not yet the fully derived vtable,
      and the program tries to access a derived slot offset beyond the
      end of the base vtable. This issue stems from the fact that vptrs
      are written multiple times.

Can you think of any way to deal with (B), e.g., by making vptrs written only once at the end of a full object's construction? Note that this set-the-vptr-once seems like it would work trivially in Java/CLI two-pass construction with deep virtual calls during construction/destruction, but I don't see how to make it work in C++. Is there a way to deal with (B) short of adding an "isConstructed" bit to every (sub)object, which blows binary compatibility (and probably adds some size and speed bloat)?

Note that the following strategy does not work (thanks to Lawrence and Hans for pointing this out), because of C++'s multiple settings of the vptr as bases and deriveds are constructed:

  - during construction, set the vptr last and with release semantics

  - in every virtual member function call, read the vptr first with
    acquire semantics

In a race, you can still have another thread see the vptr as pointing to a base vtable while trying to call a derived member function.
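The multiple settings of the vptr are observable even in a single thread, because C++ requires a virtual call made from a constructor to dispatch to the class currently being constructed. The sketch below shows that; Base, Derived, Name, DerivedOnly and check_construction_order are hypothetical names, and the remark about vtable slots describes typical implementations rather than anything the standard specifies.

```cpp
#include <cassert>
#include <string>

struct Base {
    std::string seen_in_base_ctor;
    // While Base's constructor runs, the object's dynamic type is
    // Base, so this call dispatches to Base::Name.
    Base() { seen_in_base_ctor = Name(); }
    virtual ~Base() = default;
    virtual std::string Name() const { return "Base"; }
};

struct Derived : Base {
    std::string seen_in_derived_ctor;
    // By now the dynamic type is Derived, so the same call dispatches
    // to Derived::Name. In typical implementations this is because
    // the vptr has been written a second time.
    Derived() { seen_in_derived_ctor = Name(); }
    std::string Name() const override { return "Derived"; }
    // A virtual function Base does not have; in typical
    // implementations its slot lies past the end of Base's vtable.
    virtual int DerivedOnly() const { return 7; }
};

// Single-threaded check of the dispatch targets seen during
// construction and after it.
void check_construction_order() {
    Derived d;
    assert(d.seen_in_base_ctor == "Base");
    assert(d.seen_in_derived_ctor == "Derived");
    assert(d.Name() == "Derived");
}
```

A second thread that races with the construction of a Derived and calls DerivedOnly() through a Derived* while the object still behaves as a Base is in exactly the situation described in (B).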

Thoughts?

Herb