pre-LilleHammer mailing deadline

Sat Mar 5 19:13:14 GMT 2005

Bill Pugh wrote:
> Sorry to be sitting on the sidelines so long.
> 
> I read through the document, and things are in pretty good shape.
> 
> I thought the section on "Do we allow data races" needed some work, so
> I provide a bit of rewrite of that attached. I wanted to really set up
> the impact on this decision, which I think was a little muddled before.
> I also changed the section title to be "Do we define the semantics of
> Data Races?", since we don't really have a choice on whether or not to
> allow them, just as to whether or not we define them.
> 
> This rewrite is pretty raw, and I'll do another read through and try to
> clean it up, but I wanted to get feedback ASAP since it is a substantial
> rewrite.
> 

Looks great to me, modulo touchups.

> The section on volatile doesn't really set out all the issues. It seems
> to pose the problem as having Java semantics for volatiles, or no thread
> semantics for volatiles at all. There are some other options, but
> perhaps we don't want to spell those out. I've left the section on
> volatiles unchanged.

I'm not so sure we want to go into these details, although my mention
of Dekker's algorithm sorta implies we should. Here's something
that we could somehow adapt if necessary. (This is mostly from
memory of JMM discussions so could be wrong, although I did recheck
IA64 ref manual vol 2, page 387+  --
http://developer.intel.com/design/itanium/manuals/iiasdmanual.htm)

The main question here is whether a write to a volatile must
always entail a full StoreLoad barrier (as in Java) (*), or whether it
could be done with what on IA64 is a "Release" (st.rel). This shows up
in Dekker's algorithm. A Release is only good with respect to an
"Acquire" (ld.acq) on the SAME variable (modulo quirks).
But Dekker's algorithm includes code of the form:

Thread 1:  write A;  read  B
Thread 2:  write B;  read  A

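Here each thread writes one variable and then reads a DIFFERENT one, so
a Release on the write does not keep the following read from being
satisfied first, and both threads could see stale values. So, if we
allowed the weaker version, programmers would need to somehow know that
they need to resort to the atomics library to implement this, and know
to manually use a heavier barrier.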
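
The Dekker-style fragment above can be sketched in Java, where volatile
writes carry the strong semantics described here. This is only an
illustration: the class name DekkerSketch, the fields a and b, and the
helper countBothZero are made up for the example. Under Java's rules the
outcome in which both threads read 0 should never be observed; with a
Release-only write it would be permitted.

```java
import java.util.concurrent.CountDownLatch;

public class DekkerSketch {
    // volatile: each write is ordered before the same thread's later volatile reads
    static volatile int a, b;
    // Plain fields; they are read only after join(), which orders the accesses.
    static int r1, r2;

    // Runs the two-thread fragment `trials` times and counts how often
    // both threads read 0 (hypothetical helper for this sketch).
    static int countBothZero(int trials) {
        int bothZero = 0;
        for (int i = 0; i < trials; i++) {
            a = 0;
            b = 0;
            CountDownLatch start = new CountDownLatch(1);
            Thread t1 = new Thread(() -> { await(start); a = 1; r1 = b; }); // write A; read B
            Thread t2 = new Thread(() -> { await(start); b = 1; r2 = a; }); // write B; read A
            t1.start();
            t2.start();
            start.countDown(); // release both threads at roughly the same time
            try {
                t1.join();
                t2.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
            if (r1 == 0 && r2 == 0) {
                bothZero++;
            }
        }
        return bothZero;
    }

    static void await(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("both-zero outcomes: " + countBothZero(2000));
    }
}
```

If volatile is removed from a and b, a nonzero count becomes legal,
although any particular run may still fail to show it.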

The choice is harder than it looks ...
   1. It only impacts platforms for which Release is cheaper
      than StoreLoad.
   2. The majority of code using volatiles would work fine
      with Release. In particular, nearly all uses of double-check.
   3. Many programmers relying on read-after-write guarantees
      WILL know enough to use the atomics library.
   4. The analysis needed to weaken StoreLoad to Release in those
      cases where it would be OK is tricky,
      requiring good alias analysis among other things, so is not
      something you'd like to effectively mandate.
      (Aside: On the other hand, detecting ONLY double-check
      would probably get 90% of the potential speedup.)
   5. IA64 has comparatively fast StoreLoad (mf) (compared to
      p4/Xeon and EM64T anyway), so not doing this optimization
      is not a huge loss.
   6. Performance impact on PPC remains unknown to me, since I still
      don't know the optimal forms of things like double-check that
      apply to the various versions of PPCs.
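
Item 2 in the list above can be illustrated with a double-check sketch
that uses only Release/Acquire ordering. This assumes Java 9 or later,
where AtomicReference provides getAcquire and setRelease; the names
LazyHolder and Config are hypothetical and exist only for this example.

```java
import java.util.concurrent.atomic.AtomicReference;

public class LazyHolder {
    // Hypothetical payload; the field is deliberately non-final so that
    // its visibility depends on the Release/Acquire pairing below.
    static final class Config {
        int value;
        Config(int value) { this.value = value; }
    }

    private static final AtomicReference<Config> ref = new AtomicReference<>();
    private static final Object lock = new Object();

    static Config get() {
        Config c = ref.getAcquire();          // Acquire load of the reference
        if (c == null) {
            synchronized (lock) {
                c = ref.getAcquire();         // second check, under the lock
                if (c == null) {
                    c = new Config(42);
                    ref.setRelease(c);        // Release store publishes the object
                }
            }
        }
        return c;
    }
}
```

The reader only ever loads the same variable that the initializer
stored to, which is the case where a Release paired with an Acquire is
enough. No read of a different variable has to be ordered after the
write, so the StoreLoad ordering that Dekker's algorithm needs does not
come into play.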

For Java, we chose to keep the usage rules as simple
as we could, so we used the strong version.

(*) Actually, this all assumes that you choose to place the
StoreLoad barriers after writes rather than before reads. This
is almost always the best way, but there are a couple of cases
where doing the opposite could win.

-Doug