Possible language changes
Maged Michael
magedm at us.ibm.com
Wed Mar 2 05:18:38 GMT 2005
First. I apologize for my long silence and lack of input to this
discussion. I have been extremely busy in recent months, and a couple of
health scares in my family (that ended well fortunately) didn't help
either.
From Hans on 3/1/05:
> From: Doug Lea [mailto:dl at cs.oswego.edu]
> >
> > 4) Should volatile reads/writes count as synchronization operations
> > and have acquire/release semantics as in Java? I would argue yes.
> > But this is a significant change on many platforms (though not
> > Itanium), and is controversial.
>
> Can anyone think of any compelling objection to this?
> It seems like the only viable approach. It would also be
> good to settle on this soon to get reaction.
>
> I think Maged had some legitimate concerns about this. It would be
> useful to look at real examples.
My concerns are: (1) Fence instructions are expensive, roughly 100 cycles.
(2) Full acquire/release semantics are not needed for most accesses to
volatile variables in many useful algorithms; I'll try to give some
examples below. (3) Once the language gives a strong guarantee, it is
almost impossible to take it back, so I prefer giving minimal guarantees.
My preference is to separate memory ordering from volatility and
atomicity.
Consider the following fragment of a version of the lock-free IBM freelist
algorithm [IBM S/370, 1983]. The shared variable Head and the Next and
Data fields of dynamic nodes are volatile. For simplicity, forget about
the ABA and memory reclamation problems (e.g., nodes are reused but not
freed, and double-width CAS is used for ABA prevention).
Pop(Head)
     do
1      if (h := Head) = null return EMPTY
2      next := h->Next            // data dependence: 2 after 1
3    until CAS(Head,h,next)       // data dependence: 3 after 1 and 3 after 2
4    data := h->Data              // data dependence: 4 after 1
4a   ReadBeforeWrites             // h will not be reused before 4 completes
5    RetireNode(h)
     return data

Push(Head,data)
     node := NewNode()
6    node->Data := data
     do
7      h := Head
8      node->Next := h            // data dependence: 8 after 7
8a     WritesBeforeWrites         // 9 after 6 and 9 after 8
                                  // (WritesBeforeReads would also be
                                  // correct but is usually more expensive)
9    until CAS(Head,h,node)
     return
Please correct me if I missed any fences. Assuming that I didn't, we need
only two fences (excluding those in RetireNode and NewNode). If needed, I
can give many other examples.
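As a sketch of how few orderings the example really needs, here is the
same Pop/Push expressed with a std::atomic-style interface (this is my
illustration, not part of the original algorithm; it ignores ABA and
reclamation exactly as the pseudocode does, so it is not safe to free
popped nodes):

```cpp
#include <atomic>
#include <cstddef>

struct Node {
    int   Data;
    Node* Next;
};

std::atomic<Node*> Head{nullptr};

// Push: the release on the successful CAS (9) orders the stores to
// node->Data (6) and node->Next (8) before the node is published,
// playing the role of the single WritesBeforeWrites fence (8a).
void Push(Node* node, int data) {
    node->Data = data;                               // 6
    Node* h = Head.load(std::memory_order_relaxed);  // 7
    do {
        node->Next = h;                              // 8
    } while (!Head.compare_exchange_weak(            // 9
                 h, node,
                 std::memory_order_release,          // 8a
                 std::memory_order_relaxed));
}

// Pop: the loads at 1, 2, and 4 are ordered by data dependence on h;
// memory_order_consume expresses exactly that (most compilers promote
// it to acquire). ABA and node reuse are ignored, as in the text.
Node* Pop() {
    Node* h = Head.load(std::memory_order_consume);  // 1
    while (h != nullptr) {
        Node* next = h->Next;                        // 2
        if (Head.compare_exchange_weak(              // 3
                h, next,
                std::memory_order_consume,
                std::memory_order_consume))
            return h;   // caller reads h->Data (4), then retires h (5)
    }
    return nullptr;     // EMPTY
}
```

Note that only the CAS in Push carries a release; everything else is
dependence-ordered, which matches the two-fence count above.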
In the above example, I assume that the language implicitly respects the
order of data dependences. Also, since there appear to be multiple
definitions of volatile, I need to explain what I mean by it.
Shared variables that are declared as volatile are guaranteed the
following (informally):

- Each read in the program of a volatile variable X must result in a
  load from the memory of X. That is, the compiler is not allowed to
  omit the load corresponding to a read of a volatile variable by
  reusing a value that was loaded earlier from the variable.

- Each write in the program of a value v to a volatile variable X must
  result in a store of v to the memory of X. That is, the compiler is
  not allowed to omit the store corresponding to a write to a volatile
  variable by combining the effects of multiple writes into a smaller
  number of stores.

- Rephrasing the first two points: the compiler is not allowed to omit
  loads and stores corresponding to reads and writes of volatile
  variables.

- The compiler is not allowed to introduce extraneous loads and stores
  of volatile variables. For example, if the program contains the
  sequence of statements a := X ... Y := a, where X is volatile, the
  compiler is not allowed to generate a second load of X and assign the
  result to Y.

Summary: exactly one load (or store) for every read (or write) of a
volatile variable. No omitted loads or stores; no extraneous loads or
stores.
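A minimal sketch of these rules in C-style C++ (variable and function
names are mine, purely illustrative):

```cpp
#include <cassert>

volatile int X = 0;   // volatile shared variable
int Y = 0;            // ordinary variable

// Two reads of X must compile to two distinct loads: the compiler may
// not reuse the value of the first load for the second (no omission).
int sum_two_reads() {
    int a = X;        // load 1: may not be omitted
    int b = X;        // load 2: may not be folded into load 1
    return a + b;
}

// Two writes to X must compile to two distinct stores: the compiler
// may not combine them into a single store of 2 (no combining).
void write_twice() {
    X = 1;            // store 1
    X = 2;            // store 2: must still be a separate store
}

// 'Y = a' must use the previously loaded value: the compiler may not
// substitute a fresh, extraneous load of X (no introduced accesses).
void copy_read(int* a_out) {
    int a = X;        // the one and only load
    *a_out = a;
    Y = a;            // must not become Y = X
}
```

The comments describe constraints on code generation; the functions
themselves behave identically either way in a single thread, which is
exactly why the guarantee has to be stated at the language level.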
From Hans on 3/1/05:
> I think the generic objection is something like: I already use volatiles
> to ensure some weaker property on my hardware. Adding acquire/release
> semantics adds memory barrier overhead on my platform, which I don't
> need. It would be nice to understand the weaker properties that might
> be desirable, and to ensure that the atomic operations library can
> provide those.
I guess the properties that I need from volatile are: no omitted reads or
writes, and no extraneous reads or writes.
From Hans on 3/1/05:
> It seems to me that at the moment, there is enough platform variation
> in the meaning of "volatile" that it's hard to use it in portable code.
> This brings up another important and tricky question: When does
> "volatile" imply atomicity?
In most cases atomicity is needed, but there are rare cases where
atomicity is not required, such as in Lamport's bakery algorithm [CACM
1974] and reads and writes to hazard pointers in my hazard pointer memory
reclamation method [IEEE TPDS 2004]. In these rare cases, reads that are
concurrent with writes can return arbitrary garbage values and the
algorithms remain correct.
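As a concrete instance of the bakery case, here is a sketch (my
transcription, with illustrative names, written in C-style code that
leans on the volatile semantics defined above rather than on an atomics
library):

```cpp
#include <cassert>

enum { N = 2 };                 // number of threads (illustrative)
volatile int choosing[N];       // thread i is picking its number
volatile int number[N];         // 0 = not competing

// Lamport's bakery lock. The write to number[i] need not be atomic:
// a concurrent reader that observes a garbage value mid-write is held
// off by choosing[i], so mutual exclusion is preserved regardless of
// what the torn read returns.
void lock(int i) {
    choosing[i] = 1;
    int m = 0;
    for (int j = 0; j < N; ++j)
        if (number[j] > m) m = number[j];
    number[i] = 1 + m;          // torn reads of this write are harmless
    choosing[i] = 0;
    for (int j = 0; j < N; ++j) {
        while (choosing[j]) { } // wait until j's number is stable
        while (number[j] != 0 &&
               (number[j] < number[i] ||
                (number[j] == number[i] && j < i))) { }
    }
}

void unlock(int i) { number[i] = 0; }
```

What volatile must supply here is only "every read is a real load and
every write a real store" so the spin loops make progress; atomicity of
the individual accesses is not required.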
In general I do not object to implying atomicity and requiring volatile
variables to be properly aligned for atomic access.
From Hans on 2/28/05:
> 2) Can we restrict the compiler introduction of visible writes to
> only the bit-field case?
> I would argue "yes". Otherwise we need an alternate, weaker
> restriction.
> Note that the way I've phrased this, it disallows speculative register
> promotion, which is now done by many compilers. I know of no way to
> avoid that.
> 3) Should we allow the compiler to reload local variables from globals,
> as some compilers do now. Assuming the answer to (1) is yes, I think the
> answer here is implicitly "yes".
In point (2), I assume that you meant "allow". I agree. My preference is
to allow the compiler to reorder, omit, and introduce loads and stores of
non-volatile variables arbitrarily, as long as the result is always
consistent with a single-threaded execution.
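For contrast with the volatile rules above, a sketch of what this
freedom permits for ordinary variables (names are mine, illustrative):

```cpp
#include <cassert>

int g = 0;   // ordinary (non-volatile) global

// The compiler may hoist the load of g out of the loop (omitting
// n - 1 loads), keep s in a register or spill and reload it through
// memory (introducing loads and stores), or unroll and reorder the
// loop freely: every such transformation preserves the
// single-threaded result, which is all that is required.
int sum_n(int n) {
    int s = 0;
    for (int i = 0; i < n; ++i)
        s += g;
    return s;
}
```

Under the proposal above, none of these transformations would be legal
if g were declared volatile.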
Maged
More information about the cpp-threads mailing list