Possible language changes
Maged Michael
magedm at us.ibm.com
Wed Mar 2 05:18:38 GMT 2005
First. I apologize for my long silence and lack of input to this
discussion. I have been extremely busy in recent months, and a couple of
health scares in my family (that ended well fortunately) didn't help
either.
From Hans on 3/1/05:
> From: Doug Lea [mailto:dl at cs.oswego.edu]
> >
> > 4) Should volatile reads/writes count as synchronization operations
> > and have acquire/release semantics as in Java? I would argue yes.
> > But this is a significant change on many platforms (though not
> > Itanium), and is controversial.
>
> Can anyone think of any compelling objection to this?
> It seems like the only viable approach. It would also be
> good to settle on this soon to get reaction.
>
> I think Maged had some legitimate concerns about this. It would be
> useful to look at real examples.
My concerns are: (1) Fence instructions are expensive, roughly 100 cycles.
(2) Full acquire/release semantics are not needed for most accesses to
volatile variables in many useful algorithms; I'll try to give some
examples below. (3) Once the language gives a strong guarantee, it is
almost impossible to take it back, so I prefer giving minimal guarantees.
My preference is to separate memory ordering from volatility and
atomicity.
Consider the following fragment of a version of the lock-free IBM freelist
algorithm [IBM S/370, 1983]. The shared variable Head and the Next and
Data fields of dynamic nodes are volatile. For simplicity, forget about
the ABA and memory reclamation problems (e.g., nodes are reused but not
freed, and double-width CAS is used for ABA prevention).
Pop(Head)
     do
1      if (h := Head) = null return EMPTY
2      next := h->Next            // data dependence: 2 after 1
3    until CAS(Head,h,next)       // data dependence: 3 after 1 and 3 after 2
4    data := h->Data              // data dependence: 4 after 1
4a   ReadBeforeWrites             // h will not be reused before 4 completes
5    RetireNode(h)
     return data

Push(Head,data)
     node := NewNode()
6    node->Data := data
     do
7      h := Head
8      node->Next := h            // data dependence: 8 after 7
8a     WritesBeforeWrites         // 9 after 6 and 9 after 8
                                  // (WritesBeforeReads would also be
                                  // correct but is usually more expensive)
9    until CAS(Head,h,node)
     return
Please correct me if I missed any fences. Assuming that I didn't, we need
only two fences (excluding those in RetireNode and NewNode). If needed, I
can give many other examples.
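As a sketch of how few orderings the example really needs, here is the
same Pop/Push expressed with a std::atomic-style interface (this is my
illustration, not part of the original algorithm; it ignores ABA and
reclamation exactly as the pseudocode does, so it is not safe to free
popped nodes):

```cpp
#include <atomic>
#include <cstddef>

struct Node {
    int   Data;
    Node* Next;
};

std::atomic<Node*> Head{nullptr};

// Push: the release on the successful CAS (9) orders the stores to
// node->Data (6) and node->Next (8) before the node is published,
// playing the role of the single WritesBeforeWrites fence (8a).
void Push(Node* node, int data) {
    node->Data = data;                               // 6
    Node* h = Head.load(std::memory_order_relaxed);  // 7
    do {
        node->Next = h;                              // 8
    } while (!Head.compare_exchange_weak(            // 9
                 h, node,
                 std::memory_order_release,          // 8a
                 std::memory_order_relaxed));
}

// Pop: the loads at 1, 2, and 4 are ordered by data dependence on h;
// memory_order_consume expresses exactly that (most compilers promote
// it to acquire). ABA and node reuse are ignored, as in the text.
Node* Pop() {
    Node* h = Head.load(std::memory_order_consume);  // 1
    while (h != nullptr) {
        Node* next = h->Next;                        // 2
        if (Head.compare_exchange_weak(              // 3
                h, next,
                std::memory_order_consume,
                std::memory_order_consume))
            return h;   // caller reads h->Data (4), then retires h (5)
    }
    return nullptr;     // EMPTY
}
```

Note that only the CAS in Push carries a release; everything else is
dependence-ordered, which matches the two-fence count above.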
In the above example, I assume that the language implicitly respects the
order of data dependences. Also, since there appear to be multiple
definitions of volatile, I need to explain what I mean by it.
Shared variables that are declared as volatile are guaranteed the
following (informally):

- Each read in the program of a volatile variable X must result in a
  load from the memory of X. That is, the compiler is not allowed to
  omit the load corresponding to a read of a volatile variable by
  reusing a value that was loaded earlier from the variable.

- Each write in the program of a value v to a volatile variable X must
  result in a store of v to the memory of X. That is, the compiler is
  not allowed to omit the store corresponding to a write to a volatile
  variable by combining the effects of multiple writes into a smaller
  number of stores.

- Rephrasing the first two points: the compiler is not allowed to omit
  loads and stores corresponding to reads and writes of volatile
  variables.

- The compiler is not allowed to introduce extraneous loads and stores
  of volatile variables. For example, if the program contains the
  sequence of statements a := X ... Y := a, where X is volatile, the
  compiler is not allowed to generate a second load of X and assign the
  result to Y.

Summary: exactly one load (or store) for every read (or write) of a
volatile variable. No omitted loads or stores; no extraneous loads or
stores.
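A minimal sketch of these rules in C-style C++ (variable and function
names are mine, purely illustrative):

```cpp
#include <cassert>

volatile int X = 0;   // volatile shared variable
int Y = 0;            // ordinary variable

// Two reads of X must compile to two distinct loads: the compiler may
// not reuse the value of the first load for the second (no omission).
int sum_two_reads() {
    int a = X;        // load 1: may not be omitted
    int b = X;        // load 2: may not be folded into load 1
    return a + b;
}

// Two writes to X must compile to two distinct stores: the compiler
// may not combine them into a single store of 2 (no combining).
void write_twice() {
    X = 1;            // store 1
    X = 2;            // store 2: must still be a separate store
}

// 'Y = a' must use the previously loaded value: the compiler may not
// substitute a fresh, extraneous load of X (no introduced accesses).
void copy_read(int* a_out) {
    int a = X;        // the one and only load
    *a_out = a;
    Y = a;            // must not become Y = X
}
```

The comments describe constraints on code generation; the functions
themselves behave identically either way in a single thread, which is
exactly why the guarantee has to be stated at the language level.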
From Hans on 3/1/05:
> I think the generic objection is something like: I already use volatiles
> to ensure some weaker property on my hardware. Adding acquire/release
> semantics adds memory barrier overhead on my platform, which I don't
> need. It would be nice to understand the weaker properties that might
> be desirable, and to ensure that the atomic operations library can
> provide those.
I guess the properties that I need from volatile are: no omitted reads or
writes, and no extraneous reads or writes.
From Hans on 3/1/05:
> It seems to me that at the moment, there is enough platform variation
> in the meaning of "volatile" that it's hard to use it in portable code.
> This brings up another important and tricky question: When does
> "volatile" imply atomicity?
In most cases atomicity is needed, but there are rare cases where
atomicity is not required, such as in Lamport's bakery algorithm [CACM
1974] and reads and writes to hazard pointers in my hazard pointer memory
reclamation method [IEEE TPDS 2004]. In these rare cases, reads that are
concurrent with writes can return arbitrary garbage values and the
algorithms remain correct.
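As a concrete instance of the bakery case, here is a sketch (my
transcription, with illustrative names, written in C-style code that
leans on the volatile semantics defined above rather than on an atomics
library):

```cpp
#include <cassert>

enum { N = 2 };                 // number of threads (illustrative)
volatile int choosing[N];       // thread i is picking its number
volatile int number[N];         // 0 = not competing

// Lamport's bakery lock. The write to number[i] need not be atomic:
// a concurrent reader that observes a garbage value mid-write is held
// off by choosing[i], so mutual exclusion is preserved regardless of
// what the torn read returns.
void lock(int i) {
    choosing[i] = 1;
    int m = 0;
    for (int j = 0; j < N; ++j)
        if (number[j] > m) m = number[j];
    number[i] = 1 + m;          // torn reads of this write are harmless
    choosing[i] = 0;
    for (int j = 0; j < N; ++j) {
        while (choosing[j]) { } // wait until j's number is stable
        while (number[j] != 0 &&
               (number[j] < number[i] ||
                (number[j] == number[i] && j < i))) { }
    }
}

void unlock(int i) { number[i] = 0; }
```

What volatile must supply here is only "every read is a real load and
every write a real store" so the spin loops make progress; atomicity of
the individual accesses is not required.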
In general I do not object to implying atomicity and requiring volatile
variables to be properly aligned for atomic access.
From Hans on 2/28/05:
> 2) Can we restrict the compiler introduction of visible writes to
> only the bit-field case?
> I would argue "yes". Otherwise we need an alternate, weaker
> restriction.
> Note that the way I've phrased this, it disallows speculative register
> promotion, which is now done by many compilers. I know of no way to
> avoid that.
> 3) Should we allow the compiler to reload local variables from globals,
> as some compilers do now. Assuming the answer to (1) is yes, I think the
> answer here is implicitly "yes".
In point (2), I assume that you meant "allow". I agree. My preference is
to allow the compiler to reorder, omit, and introduce loads and stores of
non-volatile variables arbitrarily, as long as the result is always
consistent with a single-threaded execution.
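For contrast with the volatile rules above, a sketch of what this
freedom permits for ordinary variables (names are mine, illustrative):

```cpp
#include <cassert>

int g = 0;   // ordinary (non-volatile) global

// The compiler may hoist the load of g out of the loop (omitting
// n - 1 loads), keep s in a register or spill and reload it through
// memory (introducing loads and stores), or unroll and reorder the
// loop freely: every such transformation preserves the
// single-threaded result, which is all that is required.
int sum_n(int n) {
    int s = 0;
    for (int i = 0; i < n; ++i)
        s += g;
    return s;
}
```

Under the proposal above, none of these transformations would be legal
if g were declared volatile.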
Maged
More information about the cpp-threads mailing list