volatile and barriers (especially on PPC)

Doug Lea dl at cs.oswego.edu
Fri Mar 11 17:59:43 GMT 2005



> So, considering PPC, it would be preferable not to pay the price of 
> unnecessary StoreLoad barriers (which may be double the cost of
> release on Power5) that are difficult for the compiler to remove
> safely.

I think the important questions here are:

1. If volatile had strong semantics, how often would they
    be stronger than actually required in an application?

2. Of those in (1) how many can be weakened using known optimization
    techniques?

3. Of those remaining from (2), how many are used
    in constructions where a factor of two in barrier cost
    makes a measurable difference in program performance? (Weight
    this by the fact that on some platforms, all barriers cost
    about the same so there is no saving.)

4. Of those remaining from (3) how many are found in constructions
    that are likely to ever be encountered by non-experts? (Experts
    could instead use the atomics library to micro-optimize.)

The answers to these questions lead to a judgement call about which
way makes the most sense to standardize upon. You can tell what
my guesses at the answers are. But they are just guesses.
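
To make the strong-versus-weakened distinction in the questions above
concrete, here is a minimal sketch of the most common case: publishing
data through a flag. It assumes the std::atomic interface as it was
later standardized in C++11 (nothing like it was settled when this
message was written), and every name in it (Mailbox, publish_strong,
publish_weak, consume, run_handoff) is made up for illustration. The
comments about PPC instructions describe the usual compiler mapping,
not a guarantee.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names throughout; an illustration, not code from this thread.
struct Mailbox {
    int payload = 0;                 // ordinary (non-atomic) data
    std::atomic<bool> ready{false};  // flag that publishes the payload
};

// "Strong" publication: a sequentially consistent store, roughly what a
// strong volatile would provide. On PPC this typically needs a full sync.
inline void publish_strong(Mailbox& m, int value) {
    m.payload = value;
    m.ready.store(true, std::memory_order_seq_cst);
}

// "Weakened" publication: a release store. Sufficient for a one-way
// hand-off, and on PPC typically implementable with the cheaper lwsync.
inline void publish_weak(Mailbox& m, int value) {
    m.payload = value;
    m.ready.store(true, std::memory_order_release);
}

// Consumer: spin until the flag is set, then read the payload.
// The acquire load pairs with either of the stores above.
inline int consume(Mailbox& m) {
    while (!m.ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return m.payload;
}

// Run one producer/consumer hand-off and return what the consumer saw.
inline int run_handoff(bool strong, int value) {
    Mailbox m;
    int seen = 0;
    std::thread consumer([&] { seen = consume(m); });
    std::thread producer([&] {
        if (strong) publish_strong(m, value);
        else        publish_weak(m, value);
    });
    producer.join();
    consumer.join();
    return seen;
}
```

Both variants hand off the payload correctly; the only difference is the
barrier the store pays for. Here the strong store is stronger than the
program needs (question 1), and the weakening is only safe because
nothing in this pattern depends on StoreLoad ordering. Proving that
automatically in larger programs is the hard part (question 2).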


> cycles. So far I haven't got my hands on a Power5, but I have been
> told that on Power5 lwsync takes about 30 cycles and sync takes about
> 60 cycles. However, these numbers may be too low because they were
> run on dual processor only and may be expected to be higher on a
> larger machine.
> 

Digression: Happily we seem to be nearing the end of the era when the chip
designers thought they could get away with 100-200 cycle barriers and
atomics. (p4/xeon, and, I hear, Intel EM64T are the worst.) Opterons,
sparc-niagara, itanium-2 and power5 are all pretty good, ranging around
20-60 cycles.

-Doug







More information about the cpp-threads mailing list