[cpp-threads] Yet another visibility question

Wed Jan 10 21:02:46 GMT 2007

Let me see if I can summarize where I think we might stand.  (I'm a little
confused ...)

A number of us think that high-level atomics, ideally with sequentially
consistent semantics and similar to Java volatiles, are highly desirable,
since they are the closest to actually being usable by mere mortals,
and give tolerable performance for things like double-checked locking.
(I think there was, and still is, a fairly clear consensus supporting this
on the C++ committee.)
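
A minimal sketch of the double-checked locking case, assuming the
std::atomic interface that C++11 later adopted (it postdates this message)
as a stand-in for "high-level atomics".  Widget, g_instance, g_init_mutex
and get_instance are hypothetical names used only for illustration:

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

struct Widget { int value = 42; };          // hypothetical payload type

std::atomic<Widget*> g_instance{nullptr};   // operations default to seq_cst
std::mutex g_init_mutex;

Widget* get_instance() {
    Widget* p = g_instance.load();          // first check, without the lock
    if (p == nullptr) {
        std::lock_guard<std::mutex> guard(g_init_mutex);
        p = g_instance.load();              // second check, under the lock
        if (p == nullptr) {
            p = new Widget();               // fully construct, then publish
            g_instance.store(p);
        }
    }
    assert(p != nullptr);
    return p;
}
```

Every call after the first takes only the lock-free fast path, which is
where the "tolerable performance" comes from; the object is deliberately
never freed in this sketch.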

I would really prefer a model in which programs that use high-level
atomics to synchronize, and have no races on ordinary variables, behave
sequentially consistently.  (I think the hard part of this on the
implementation side is actually getting sequential consistency for
programs that use only atomics.)  This implies that high-level atomics
have acquire/release semantics (in addition to satisfying other
requirements).
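
One way to see the "other requirements" is the classic store-buffering
test, sketched here under the assumption of C++11-style std::atomic with
its default sequentially consistent operations.  The names x, y, thread_a,
thread_b and check_outcome are hypothetical:

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> x{0}, y{0};   // default operations are seq_cst

int thread_a() { x.store(1); return y.load(); }   // intended for thread A
int thread_b() { y.store(1); return x.load(); }   // intended for thread B

// With sequentially consistent atomics, some interleaving of the four
// operations explains the results, so at least one load must see 1.
void check_outcome(int r1, int r2) { assert(r1 == 1 || r2 == 1); }
```

If thread_a and thread_b run concurrently (once each), check_outcome
should never fire.  Were the stores only release and the loads only
acquire, both functions could return 0, so acquire/release alone is not
enough for sequential consistency.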

At least Herb doesn't think anything at a lower level is useful.
But my impression from past committee meetings is that most people also
want a lower layer.  The alternative would be to try to capture
all the interesting use cases in more specialized libraries instead.

I still think any lower layer should be based primarily on acquire/release
ordering constraints rather than explicit fences.  This gives us some
consistency with the high layer, reasonable performance on x86, etc.
I'm not sure whether anyone meant to disagree with this.  In any case,
I don't think I've seen anything like a complete proposal that doesn't
go this route.
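
As a rough illustration of ordering constraints attached to individual
operations rather than to fences, here is a small spin lock, assuming the
std::memory_order arguments of C++11's std::atomic (an assumption; no
concrete syntax is proposed in this message).  SpinLock is a hypothetical
class:

```cpp
#include <atomic>
#include <cassert>

class SpinLock {                       // hypothetical illustration
    std::atomic<bool> locked_{false};
public:
    bool try_lock() {
        // Acquire: later accesses cannot move above a successful exchange.
        return !locked_.exchange(true, std::memory_order_acquire);
    }
    void lock() { while (!try_lock()) { /* spin */ } }
    void unlock() {
        assert(locked_.load(std::memory_order_relaxed));  // must be held
        // Release: earlier accesses cannot move below this store.
        locked_.store(false, std::memory_order_release);
    }
};
```

On x86 the release store in unlock() typically compiles to an ordinary
store with no fence instruction, which is the kind of performance
argument made above.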

I think we should, as much as we possibly can, avoid saying anything
about dependencies.  I still have no idea how to define them.  And all
our past attempts impose nasty nonlocal optimization constraints.

I think we agree that there are cases in which you can get better
code by allowing explicit fences, mostly because we currently have no
other way of expressing conditional ordering constraints.  If we do
add explicit fences, the SPARC model seems to be a reasonable one
to follow, though the

LoadStore | LoadLoad

and

StoreStore | LoadStore

variants (essentially replacements for acquire and release, respectively)
are probably the most interesting, and the real goal may be to be able to
generate a PowerPC-lwsync-like fence, which is essentially the union of
these two.

(Does the combined fence run appreciably slower on SPARC than the
two variants?)
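
A sketch of a conditional ordering constraint expressed with explicit
fences, assuming C++11's std::atomic_thread_fence as the fence primitive.
Its acquire and release forms only roughly correspond to the two SPARC
combinations above; payload, ready, publish and try_consume are
hypothetical names:

```cpp
#include <atomic>
#include <cassert>

int payload = 0;                    // ordinary (non-atomic) data
std::atomic<bool> ready{false};

void publish(int v) {               // call at most once
    assert(!ready.load(std::memory_order_relaxed));
    payload = v;
    // Roughly StoreStore | LoadStore: earlier accesses stay above the flag store.
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);
}

bool try_consume(int& out) {
    if (!ready.load(std::memory_order_relaxed))
        return false;               // miss path pays for no fence
    // Roughly LoadLoad | LoadStore: later accesses stay below the flag load.
    std::atomic_thread_fence(std::memory_order_acquire);
    out = payload;
    return true;
}
```

The acquire fence is executed only when the flag was actually seen set,
which an acquire load of ready could not express.  A single fence with
std::memory_order_acq_rel would play the role of the combined,
lwsync-like fence.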

If we get to the question of adding explicit fences, my impression is that
opinions are strongly divided.  But I think that is essentially a
complexity vs. expressivity and performance trade-off.

Hans