Possible language changes
Boehm, Hans
hans.boehm at hp.com
Wed Mar 2 19:07:38 GMT 2005
Atomicity is an interesting issue I had originally forgotten about.

- What happens if you store to a 32-bit volatile pointer on machines
  with a 16-bit memory bus? I think the Java answer is:
  - If this is a uniprocessor, use your favorite uniprocessor atomicity
    technique (probably some flavor of restartable atomic sections),
    which can be pretty cheap.
  - If this is a multiprocessor, get better hardware.

Does this work here?
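For comparison, the Java rule (JLS 17.7) is that reads and writes of
volatile long and double must be atomic, while plain long/double
accesses may legally be split into two 32-bit halves. A minimal sketch
of what that guarantee rules out follows; the class and field names are
illustrative only, and dropping the volatile keyword would make a torn
read legal (though of course not guaranteed to occur):

```java
// Sketch: with 'volatile', this 64-bit field can never be observed
// "torn" into halves, per JLS 17.7. Without 'volatile', a reader
// could legally see half of one write and half of another.
public class TearingDemo {
    static volatile long word;   // writer alternates all-zeros / all-ones

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(new Runnable() {
            public void run() {
                for (int i = 0; i < 1000000; i++) {
                    word = (i % 2 == 0) ? 0L : -1L;
                }
            }
        });
        writer.start();
        for (int i = 0; i < 1000000; i++) {
            long v = word;   // a single atomic volatile read
            if (v != 0L && v != -1L) {
                throw new AssertionError("torn read: " + Long.toHexString(v));
            }
        }
        writer.join();
        System.out.println("no torn reads observed");
    }
}
```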
What do volatile bit-fields mean? I don't think the standards say.
I suspect that making them erroneous would break a lot of existing code.
Leaving them basically implementation-defined is probably OK.
Hans
-----Original Message-----
From: Maged Michael [mailto:magedm at us.ibm.com]
Sent: Wednesday, March 02, 2005 10:48 AM
To: Doug Lea
Cc: Andrei Alexandrescu; asharji at plg.uwaterloo.ca; Ben
Hutchings; Doug Lea; Boehm, Hans; Jim Rogers; Kevlin Henney; Bill Pugh;
Richard Bilson; Douglas C. Schmidt
Subject: Re: Possible language changes
I agree that atomic is a desirable approach if volatile will
have strong semantics. I assume that access to variables through atomic
will prevent the compiler from introducing extraneous loads and stores
of these variables in the same function. One issue is alignment. The
compiler should align non-volatile variables in a way consistent with
atomic memory access.
Maged
Doug Lea <dl at cs.oswego.edu>
03/02/2005 11:31 AM
To
Maged Michael/Watson/IBM at IBMUS
cc
Doug Lea <dl at altair.cs.oswego.edu>, Andrei Alexandrescu
<andrei at metalanguage.com>, asharji at plg.uwaterloo.ca, Ben Hutchings
<ben at decadentplace.org.uk>, "Boehm, Hans" <hans.boehm at hp.com>, Jim
Rogers <jimmaureenrogers at att.net>, Kevlin Henney <kevlin at curbralan.com>,
Bill Pugh <pugh at cs.umd.edu>, Richard Bilson <rcbilson at uwaterloo.ca>,
"Douglas C. Schmidt" <schmidt at dre.vanderbilt.edu>
Subject
Re: Possible language changes
Maged Michael wrote:
>
> I am not sure how successful compilers can be in removing unnecessary
> barriers. Will they be able to generate code with no more barriers
> than ideal assembly?
Of course not. No optimizer can guarantee to optimize any kind of code
to ideal assembly. But they can usually do much better than most
programmers. There haven't been PLDI-ish papers out yet exploring
barrier optimizations, but you figure there will be. For example,
hotspot only does StoreLoad-squashing within basic blocks (which
gets most of them in practice), because I didn't know how to do this
as a full dataflow. Hopefully some of the people I've suggested
pursue this will do so.
In any case, as I said, if you do need full control, the atomics
classes should give you everything you need. And all of the
responsibilities for getting it right!
>
> About DCL, I quote the following from the JSR 133 FAQ:
> http://www.cs.umd.edu/users/pugh/java/memoryModel/jsr-133-faq.html#dcl
>
> "However, for fans of double-checked locking (and we really hope there
> are none left), the news is still not good. The whole point of
> double-checked locking was to avoid the performance overhead of
> synchronization. Not only has brief synchronization gotten a LOT less
> expensive since the Java 1.0 days, but under the new memory model, the
> performance cost of using volatile goes up, almost to the level of the
> cost of synchronization. So there's still no good reason to use
> double-checked-locking."
>
> I don't know if this is an accurate assessment of the cost of volatile
> or not after taking compiler optimizations (removing barriers) into
> account. In any case, my opinion is that C++ should allow at least
> simple things like DCL to be as efficient as if written in assembly.
> Do we agree on that?
>
I probably ought to ask Jeremy and Brian to rewrite that.
Here's what really happens for

    class Singleton {
        private static volatile Singleton instance;

        public static Singleton get() {
            Singleton s = instance;
            return (s != null) ? s : init();
        }

        private static Singleton init() {
            // Use either locks or CAS ...
        }
    }
Here, the volatile read in get() costs only lost compiler reorderings
compared to a non-volatile read, which is essential here no matter
how you do it.
If you do the initialization with a CAS rather than a lock,
then you pay at least one CAS, maybe more if you need to retry.
(The CAS approach in general only works when initialization has
no side effects.)
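Spelled out as a sketch, the CAS flavor looks like this (class name
illustrative; it uses the java.util.concurrent.atomic classes). Note
that a thread losing the race simply discards the instance it built,
which is exactly why initialization must be side-effect free:

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative sketch of CAS-based lazy initialization.
class CasSingleton {
    private static final AtomicReference<CasSingleton> INSTANCE =
        new AtomicReference<CasSingleton>();

    private CasSingleton() {}   // must be side-effect free

    public static CasSingleton get() {
        CasSingleton s = INSTANCE.get();   // volatile-read fast path
        if (s != null) return s;
        CasSingleton fresh = new CasSingleton();
        // One CAS on the slow path; if we lose the race, the winner's
        // instance is returned and 'fresh' is simply garbage.
        return INSTANCE.compareAndSet(null, fresh) ? fresh : INSTANCE.get();
    }
}
```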
If you do it with a lock, you need a CAS for the lock, plus
the volatile write (generating a StoreLoad barrier), plus the unlock,
which normally requires another CAS or barrier. The StoreLoad
associated with a volatile write followed by either a CAS or barrier
is one of the "easy" cases, so the JVM will always elide it.
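For concreteness, here is a sketch of the lock flavor (class name
illustrative). The synchronized entry is the CAS-for-lock, the field
store is the volatile write, and the monitor exit is the unlock:

```java
// Sketch of the lock-based DCL slow path.
class LockSingleton {
    private static volatile LockSingleton instance;

    private LockSingleton() {}

    public static LockSingleton get() {
        LockSingleton s = instance;      // one volatile read on the fast path
        return (s != null) ? s : init();
    }

    private static synchronized LockSingleton init() {
        LockSingleton s = instance;      // re-check under the lock
        if (s == null) {
            s = new LockSingleton();
            instance = s;                // volatile write publishes the object
        }
        return s;
    }
}
```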
The net effect is as fast as I know how to do this in assembler.
Compared to full locking, DCL saves you two expensive instructions
(CAS or StoreLoad) per call to get(), so it will be noticeably cheaper
than using locks if this code is exercised much. (Still, there are
many cases in Java where approaches like the dynamically loaded
static trick are faster.)
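By the dynamically loaded static trick I mean the
initialization-on-demand holder idiom: the JVM's built-in
class-initialization locking does all the synchronization, so get()
is an ordinary load once the holder class has been initialized.
A sketch, with illustrative names:

```java
// Sketch of the initialization-on-demand holder idiom.
class HolderSingleton {
    private HolderSingleton() {}

    private static class Holder {
        // Initialized on first reference to Holder, under the JVM's
        // class-initialization lock; safely published for free.
        static final HolderSingleton INSTANCE = new HolderSingleton();
    }

    public static HolderSingleton get() {
        return Holder.INSTANCE;   // plain load after initialization
    }
}
```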
All of this is true at least on x86 and sparc.
On IA64 and PPC there are different optimizations that apply here
that I don't think anyone has worked out yet inside JVMs. (I sure
hope the IBM folks are working on this, though.)
Anyway, for C++, I think the only issue here is whether it is
worth weakening the semantics of volatile writes compared to Java
so that IA64 can ALWAYS use st.rel rather than mf, rather than
only doing so as an optimization. As Hans and I discussed (a year
or two ago), the optimizations are hard to apply, yet the actual
need for a full mf is rare. I think there are similar issues for
PPC, but I've stopped pretending I know anything about PPC barriers
any more, because people keep telling me different alleged facts
about them.
-Doug
More information about the cpp-threads
mailing list