volatile and barriers (especially on PPC)

Maged Michael magedm at us.ibm.com
Fri Mar 11 22:35:11 GMT 2005


Option (1) is as in Java: volatile has strong ordering semantics and if 
the programmer needs more control s/he can use atomics.

Option (2) is (unless the programmer always uses locks) s/he must specify 
volatility, atomicity and memory ordering using (yet-to-be-defined) 
language constructs.

Rationale:
a) if the average programmer needs to access concurrent data outside 
locks, s/he will use volatile.
b) if volatile has memory ordering semantics, in most cases these 
semantics are stronger than intended by the programmer.
c) (I guess) compilers will not be able to optimize many unnecessary 
memory barriers
d) either the programmer will accept the lost performance, or
e) s/he cares about performance and will opt out of using volatile 
sometimes
f) s/he will use atomic_ops, and so will have to understand memory 
ordering any way.

I understand that many on this list consider option (2) to be too 
complicated for the average programmer and I (grudgingly) accept that :-)

Maged





"Boehm, Hans" <hans.boehm at hp.com> 
03/11/2005 05:00 PM

To
Maged Michael/Watson/IBM at IBMUS, "Doug Lea" <dl at cs.oswego.edu>
cc
"Andrei Alexandrescu" <andrei at metalanguage.com>, 
<asharji at plg.uwaterloo.ca>, "Ben Hutchings" <ben at decadentplace.org.uk>, 
<clark.nelson at intel.com>, "Doug Lea" <dl at altair.cs.oswego.edu>, 
<jimmaureenrogers at att.net>, "Kevlin Henney" <kevlin at curbralan.com>, 
<pabuhr at plg.uwaterloo.ca>, "Bill Pugh" <pugh at cs.umd.edu>, "Richard Bilson" 
<rcbilson at uwaterloo.ca>
Subject
RE: volatile and barriers (especially on PPC)






If we assume that we leave the semantics of races undefined,
then I think option (1) can be stated as something like

A program contains a data race if any sequentially
consistent execution allows two simultaneous accesses,
one of which is a write, to a variable, field, or
sequence of bit-fields, and if at least one of the
accesses is non-volatile.  The semantics of programs
with data races are undefined.  All other programs have
sequentially consistent semantics.

That definitely has simplicity going for it.

My feeling is that currently neither Maged's option (2)
below, or my option (b) (release/acquire only, no sequential
consistency for volatile) are acceptable, because we
(or at least I) can't give a specification of what they
mean that's anywhere near as simple.

I'm not really sure what option (2) means at all.  It
should include

1) atomicity, in some cases (we have that issue with all
the options, I think), and

2) some notion that written values become visible
"promptly", i.e. are not left in registers, or cached in registers
on the reader side.  But I think
that in order to define "promptly", we have to either
talk about ordering or fairness.  Based on long discussions
in the JSR133 context, I think we don't want to do the
latter.  Which brings us back to ordering, and also might
help to explain why the current notion of "volatile" feels
like magic.

Thus I think we should

a) Think about whether there are better ways to specify the
alternatives.

b) Go with Maged's option (1b) if we fail.  Or maybe even if we
succeed.

I'm not convinced that we should worry a lot about subtle lock-free
algorithms in designing the "volatile" semantics.  Anyone writing
such code will want to use a library with finer control anyway.

Hans


-----Original Message-----
From: Maged Michael [mailto:magedm at us.ibm.com] 
Sent: Friday, March 11, 2005 12:59 PM
To: Doug Lea
Cc: Andrei Alexandrescu; asharji at plg.uwaterloo.ca; Ben Hutchings;
clark.nelson at intel.com; Doug Lea; Boehm, Hans; jimmaureenrogers at att.net;
Kevlin Henney; pabuhr at plg.uwaterloo.ca; Bill Pugh; Richard Bilson
Subject: Re: volatile and barriers (especially on PPC)

So I support either options (1) or (2): 

(1) (a) Accesses to volatiles are completely ordered as in Java. 
    (b) There are standard libraries (e.g. atomic_ops) , i.e., stack of
razor blades :-) that allow the programmer to write code with less
barrier 

(2) (a) Volatility is separate from ordering. 
    (b) Define constructs that allow the programmer to specify ordering
accurately, again the stack of razor blades. 

(3) Something in between. 

However, there is a possibility that the future ordinary programmer will
have to be savvy enough to handle a stack of sharp razors any way. I am
not a compiler expert, so I don't know if given programs written using
(1)(a) or (3) how successful the compiler can be in generating code as
fast (or almost as fast) as code generated for programs written using
(1)(b) or (2).  If the compiler is not successful, then there is the
possibility that the ordinary programmer might abandon volatile and use
only (1)(b) to seek performance.  Also, I hope we don't overestimate the
ability of compilers by only taking into account the few examples like
DCL that are widely used today while ignoring emerging algorithms that
may be popular in the future, in particular non-blocking transactions
and data structures. We need to look at enough examples to reach an
answer that takes that into account. My guess (again I am not compilers
expert) is that there will be many unnecessary barriers left unoptimized
in implementations of many efficient non-blocking algorithms. 



The following is from the java.util.concurrent ConcurrentLinkedQueue
class. I just numbered the memory accesses in the critical path. We need
to have the following ordering, assuming no speculative writes. 

1 before 2 
1 before 3 
2 before 4 
3 before 4 
3 before 6 


    public E poll() { 
        for (;;) { 
 1           Node<E> h = head; 
 2           Node<E> t = tail; 
 3           Node<E> first = h.getNext(); 
 4           if (h == head) { 
                if (h == t) { 
                    if (first == null) 
                        return null; 
                    else 
                        casTail(t, first); 
 5               } else if (casHead(h, first)) { 
 6                   E item = first.getItem(); 
                    if (item != null) { 
 7                       first.setItem(null); 
                        return item; 
                    } 
                    // else skip over deleted item, continue loop, 
                } 
            } 
        } 
    } 

If volatile doesn't have any memory ordering semantics and if we have
hypothetical memory barriers LoadLoad, LoadStore, StoreStore and
StoreLoad. We need LoadLoad between 1 and 2, and between 3 and 4. 
That is 2 LoadLoad barriers. 

If volatile has full barrier semantics, the compiler will have to assume
that the following ordering is required (in the critical path): 

1 before 2                                LoadLoad 
2 before 3                                 LoadLoad 
3 before 4                                 LoadLoad 
4 before 5                                 LoadLoad 
5 before 6 and 5 before 7        StoreStore/StoreLoad    (I don't think
LoadLoad/LoadStore is sufficient)* 
6 before returning                LoadLoad/LoadStore 
7 before returning                StoreStore/StoreLoad 

* CAS has both read and write semantics but I think that the compiler
will assume that the programmer intended for the write of the CAS to be
completed before 6 and 7 are initiated. 

Also even if we consider a simple example as DCL, if the compiler
doesn't know from where the function containing DCL was called, wouldn't
the compiler have to add a LoadLoad/StoreLoad barrier before the first
read in DCL? So, I am not sure that compilers can handle even DCL in
general, but they can handle it well if it is inlined. 

-Maged 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://shadbolt.decadentplace.org.uk/pipermail/cpp-threads/attachments/20050311/ba208120/attachment.html


More information about the cpp-threads mailing list