volatile and barriers (especially on PPC)

Fri Mar 11 12:11:57 GMT 2005

I wrote...

> 
> I'm not so sure we want to go into these details, although my mention
> of Dekker's algorithm sorta implies we should. Here's something
> that we could somehow adapt if necessary. (This is mostly from
> memory of JMM discussions so could be wrong, although I did recheck
> IA64 ref manual vol 2, page 387+  --
> http://developer.intel.com/design/itanium/manuals/iiasdmanual.htm)
> 
> 
> The main question here is whether a write to a volatile must
> always entail a full StoreLoad (as in Java)(*), or whether it could be
> done with what on IA64 is a "Release" (st.rel). This shows up in
> Dekker's algorithm. A Release is only good with respect to an "Acquire"
> (ld.acq) on the SAME variable (modulo quirks).
> But Dekker's algorithm includes code of the form:
> 
> Thread 1:  write A;  read  B
> Thread 2:  write B;  read  A
> 
> So, if we allowed the weaker version, programmers would need to somehow
> know that they must resort to the atomics library to implement this, and
> know to manually use a heavier barrier.
> 
> The choice is harder than it looks ...
>    1. It only impacts platforms for which Release is cheaper
>       than StoreLoad.
>    2. The majority of code using volatiles would work fine
>       with Release. In particular, nearly all uses of double-check.
>    3. Many programmers relying on read-after-write guarantees
>       WILL know enough to use atomics library.
>    4. The analysis needed to weaken StoreLoad to Release in those
>       cases where it would be OK is tricky,
>       requiring good alias analysis among other things, so is not
>       something you'd like to effectively mandate.
>       (Aside: On the other hand, detecting ONLY double-check
>       would probably get 90% of the potential speedup.)
>    5. IA64 has comparatively fast StoreLoad (mf) (compared to
>       p4/Xeon and EM64T anyway), so not doing this optimization
>       is not a huge loss.
>    6. Performance impact on PPC remains unknown to me, since I still
>       don't know the optimal forms of things like double-check that
>       apply to the various versions of PPC.
> 
> For Java, we chose to keep the usage rules as simple
> as we could, so we used the strong version.
> 
> (*) Actually, this all assumes that you choose to place the
> StoreLoad barriers after writes rather than before reads. This
> is almost always the best way, but there are a couple of cases
> where doing the opposite could win.
> 
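
For concreteness, here is a minimal Java sketch of the Thread 1 / Thread 2
pattern quoted above. It is only the write-then-read core, not the full
Dekker's algorithm, and the class and method names (DekkerCore,
countBothZero, awaitStart) are made up for illustration. With both fields
volatile, the Java rules forbid the outcome where both reads return 0. Under
a Release-only write, that outcome would be allowed, which is exactly the
case a programmer would have to know to guard with a heavier barrier.

```java
import java.util.concurrent.CyclicBarrier;

public class DekkerCore {
    // Java volatile: accesses are sequentially consistent, which on most
    // hardware means a StoreLoad barrier is placed after each write.
    static volatile int a, b;
    // Plain fields are fine here: they are only read after join().
    static int r1, r2;

    // Runs the two-thread pattern `trials` times and counts how often
    // both threads read the other's variable as still 0.
    static int countBothZero(int trials) {
        int bothZero = 0;
        for (int i = 0; i < trials; i++) {
            a = 0;
            b = 0;
            CyclicBarrier start = new CyclicBarrier(2);
            // Thread 1: write A; read B
            Thread t1 = new Thread(() -> { awaitStart(start); a = 1; r1 = b; });
            // Thread 2: write B; read A
            Thread t2 = new Thread(() -> { awaitStart(start); b = 1; r2 = a; });
            t1.start();
            t2.start();
            try {
                t1.join();
                t2.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
            if (r1 == 0 && r2 == 0) {
                bothZero++;
            }
        }
        return bothZero;
    }

    // Lines the two threads up so their writes and reads tend to overlap.
    static void awaitStart(CyclicBarrier barrier) {
        try {
            barrier.await();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("both-zero outcomes: " + countBothZero(1000));
    }
}
```

Removing volatile from a and b makes the both-zero outcome legal, although
whether it actually shows up in a given run depends on the JIT and the
hardware.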

Maged: Can you do me/us a favor and explain how
these issues impact various versions of PowerPC? One way
to do it would be to show the best code sequences for some common
constructions, and whether compilers could generate them
automatically assuming various semantics for volatiles, and also
assuming the existence of the simple optimizations I
described for Java and/or others along those lines.

Thanks!

-Doug