[cpp-threads] out-of-thin-air results, dependency-based ordering again.

Doug Lea dl at cs.oswego.edu
Fri Feb 16 13:24:24 GMT 2007


Have you considered the following alternate plan of attack?

The underlying desire seems to be that load_relaxed
be implemented exactly as load_acquire except that there
is no machine-level fence issued. Perhaps a better name
for this would be "load_speculative" since the effects
rely on the state of coherence mechanics etc, which might
be thought of as a random process -- sometimes
you will see the most current value, sometimes not.
Any programmer using such a load must ensure that
the value actually is deterministic or is otherwise
useful, due to some proof special to the algorithm being
implemented, that compilers and hardware are unaware of.
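
To make the "proof special to the algorithm" point concrete,
here is a minimal sketch of one such pattern. It assumes the
std::atomic interface that C++ later standardized, with
load(std::memory_order_relaxed) standing in for the load_relaxed
under discussion; update_max is a hypothetical helper, not
something proposed in this thread. The relaxed load may return
a stale value, but the compare-and-swap that follows revalidates
it, so the outcome is correct either way.

```cpp
#include <atomic>

// Hypothetical helper: raise `target` to at least `candidate`.
// Assumption: load(std::memory_order_relaxed) plays the role of load_relaxed.
inline void update_max(std::atomic<int>& target, int candidate) {
    // This read may be stale; nothing below depends on it being current.
    int seen = target.load(std::memory_order_relaxed);
    while (seen < candidate &&
           !target.compare_exchange_weak(seen, candidate,
                                         std::memory_order_relaxed)) {
        // A failed exchange writes the value it actually found into `seen`,
        // so the loop condition re-checks against fresher data.
    }
}
```

The correctness argument lives entirely in the algorithm: a
stale read costs at most an extra trip around the loop, which
is knowledge the compiler and hardware do not have.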

This is very different from the pure as-if-serial semantics
of raw variables (which admit all of the dataflow
optimizations that don't make sense here), and is also
different from the guaranteed preservation of happens-before
edges that you get with the stronger orderings.

The implementation consequences of this approach seem
to meet the intentions of recent posts. For example,
a compiler may only infer a value for a load_relaxed if it
could do so for a load_acquire (which is almost never --
in practice perhaps only when global analysis shows that
the value never changes).  And a load_relaxed could be
optimized away only if its value is never used. And so on.
(It's an open question whether some platforms
would still need some sort of "light" fence in some
cases to avoid some reorderings.)

A similar story holds for store_relaxed.
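
The compiler constraints described above can be illustrated
with a polling loop. This is only a sketch, assuming std::atomic
with memory_order_relaxed as a stand-in for load_relaxed and
store_relaxed; StopFlag, request_stop and spin_until_stopped are
hypothetical names. With a raw bool, as-if-serial rules would
let the compiler hoist the read out of the loop. With the atomic
load, compilers in practice re-read on every iteration, although
how soon a store becomes visible is still up to the hardware.

```cpp
#include <atomic>

// Hypothetical stop flag shared between a worker and a controller.
struct StopFlag {
    std::atomic<bool> stop{false};

    // Stand-in for store_relaxed: no ordering with surrounding accesses.
    void request_stop() { stop.store(true, std::memory_order_relaxed); }

    // Stand-in for load_relaxed: the value is used, so the load cannot be
    // dropped, and it is issued afresh on each iteration.
    // Returns the number of polls that saw `false`, capped at max_polls.
    long spin_until_stopped(long max_polls) const {
        long polls = 0;
        while (polls < max_polls && !stop.load(std::memory_order_relaxed)) {
            ++polls;
        }
        return polls;
    }
};
```

In a real program request_stop would be called from another
thread; the cap on polls is there only so the sketch terminates
when nobody sets the flag.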

Altogether, this seems relatively easy to understand
and implement. But maybe not too easy to spec out -- on the
face of it, the specs would need to reflect the underlying
possible non-determinism. This is made harder by the fact
that on nearly all machines, many common cases actually
are deterministic (like indirect loads), and some programmers
would probably like to rely on this fact. But I can't get
myself to think that this is an important issue. If someone
needs an acquiring indirect load, then they should state
it that way, and rely on the compiler (or maybe just clever
macros) to elide the fence on machines for which it is safe
to do so on indirections. For all the attention given in this
spec to working around what compilers already do, it seems
only fair to ask them to now also do this trivial optimization.
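
As a rough sketch of "stating it that way", the following
spells out an acquiring indirect load explicitly. It assumes
std::atomic with memory_order_acquire and memory_order_release
as stand-ins for load_acquire and a releasing store; Config and
Publisher are hypothetical names used only for illustration.

```cpp
#include <atomic>

struct Config {  // hypothetical payload reached through the pointer
    int value;
};

struct Publisher {
    // Null until a writer publishes a fully constructed Config.
    std::atomic<const Config*> slot{nullptr};

    // Writer side: the release store makes the Config's fields visible
    // to any reader whose acquire load observes this pointer.
    void publish(const Config* c) {
        slot.store(c, std::memory_order_release);
    }

    // Reader side: the acquiring indirect load. Load the pointer with
    // acquire semantics, then read through it.
    // Returns `fallback` if nothing has been published yet.
    int read_value(int fallback) const {
        const Config* c = slot.load(std::memory_order_acquire);
        return c ? c->value : fallback;
    }
};
```

The source code asks for acquire semantics unconditionally. On
a machine that already orders a dependent load after the load
that produced its address, read_value is the place where a
compiler (or a macro) could drop the fence.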

-Doug




