Possible language changes
Boehm, Hans
hans.boehm at hp.com
Wed Mar 2 19:07:38 GMT 2005
Atomicity is an interesting issue I had originally forgotten about.

- What happens if you store to a 32-bit volatile pointer on machines
  with a 16-bit memory bus? I think the Java answer is:
  - If this is a uniprocessor, use your favorite uniprocessor atomicity
    technique (probably some flavor of restartable atomic sections),
    which can be pretty cheap.
  - If this is a multiprocessor, get better hardware.

Does this work here?
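For comparison, the Java rule (JLS 17.7) is that reads and writes of
volatile long and double must be atomic, while plain long/double
accesses may legally be split into two 32-bit halves. A minimal sketch
of what that guarantee rules out follows; the class and field names are
illustrative only, and dropping the volatile keyword would make a torn
read legal (though of course not guaranteed to occur):

```java
// Sketch: with 'volatile', this 64-bit field can never be observed
// "torn" into halves, per JLS 17.7. Without 'volatile', a reader
// could legally see half of one write and half of another.
public class TearingDemo {
    static volatile long word;   // writer alternates all-zeros / all-ones

    public static void main(String[] args) throws InterruptedException {
        Thread writer = new Thread(new Runnable() {
            public void run() {
                for (int i = 0; i < 1000000; i++) {
                    word = (i % 2 == 0) ? 0L : -1L;
                }
            }
        });
        writer.start();
        for (int i = 0; i < 1000000; i++) {
            long v = word;   // a single atomic volatile read
            if (v != 0L && v != -1L) {
                throw new AssertionError("torn read: " + Long.toHexString(v));
            }
        }
        writer.join();
        System.out.println("no torn reads observed");
    }
}
```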
What do volatile bit-fields mean? I don't think the standards say.
I suspect that making them erroneous would break a lot of existing code.
Leaving them basically implementation-defined is probably OK.
Hans
-----Original Message-----
From: Maged Michael [mailto:magedm at us.ibm.com]
Sent: Wednesday, March 02, 2005 10:48 AM
To: Doug Lea
Cc: Andrei Alexandrescu; asharji at plg.uwaterloo.ca; Ben
Hutchings; Doug Lea; Boehm, Hans; Jim Rogers; Kevlin Henney; Bill Pugh;
Richard Bilson; Douglas C. Schmidt
Subject: Re: Possible language changes
I agree that atomic is a desirable approach if volatile will
have strong semantics. I assume that access to variables through atomic
will prevent the compiler from introducing extraneous loads and stores
of these variables in the same function. One issue is alignment. The
compiler should align non-volatile variables in a way consistent with
atomic memory access.
Maged
Doug Lea <dl at cs.oswego.edu>
03/02/2005 11:31 AM
To
Maged Michael/Watson/IBM at IBMUS
cc
Doug Lea <dl at altair.cs.oswego.edu>, Andrei Alexandrescu
<andrei at metalanguage.com>, asharji at plg.uwaterloo.ca, Ben Hutchings
<ben at decadentplace.org.uk>, "Boehm, Hans" <hans.boehm at hp.com>, Jim
Rogers <jimmaureenrogers at att.net>, Kevlin Henney <kevlin at curbralan.com>,
Bill Pugh <pugh at cs.umd.edu>, Richard Bilson <rcbilson at uwaterloo.ca>,
"Douglas C. Schmidt" <schmidt at dre.vanderbilt.edu>
Subject
Re: Possible language changes
Maged Michael wrote:
>
> I am not sure how successful compilers can be in removing unnecessary
> barriers. Will they be able to generate code with no more barriers
> than ideal assembly?
Of course not. No optimizer can guarantee to optimize any kind of code
to ideal assembly. But they can usually do much better than most
programmers. There haven't been PLDI-ish papers out yet exploring
barrier optimizations, but you figure there will be. For example,
hotspot only does StoreLoad-squashing within basic blocks (which
gets most of them in practice), because I didn't know how to do this
as a full dataflow. Hopefully some of the people I've suggested
pursue this will do so.
In any case, as I said, if you do need full control, the atomics
classes should give you everything you need. And all of the
responsibilities for getting it right!
>
> About DCL, I quote the following from the JSR 133 FAQ:
> http://www.cs.umd.edu/users/pugh/java/memoryModel/jsr-133-faq.html#dcl
>
> "However, for fans of double-checked locking (and we really hope there
> are none left), the news is still not good. The whole point of
> double-checked locking was to avoid the performance overhead of
> synchronization. Not only has brief synchronization gotten a LOT less
> expensive since the Java 1.0 days, but under the new memory model, the
> performance cost of using volatile goes up, almost to the level of the
> cost of synchronization. So there's still no good reason to use
> double-checked-locking."
>
> I don't know if this is an accurate assessment of the cost of volatile
> or not after taking compiler optimizations (removing barriers) into
> account. In any case, my opinion is that C++ should allow at least
> simple things like DCL to be as efficient as if written in assembly.
> Do we agree on that?
>
I probably ought to ask Jeremy and Brian to rewrite that.
Here's what really happens for

    class Singleton {
        private static volatile Singleton instance;

        public static Singleton get() {
            Singleton s = instance;
            return (s != null) ? s : init();
        }

        private static Singleton init() {
            // Use either locks or CAS ...
        }
    }
Here, the volatile read in get() costs only lost compiler reorderings
compared to a non-volatile read, which is essential here no matter
how you do it.
If you do the initialization with a CAS rather than a lock,
then you pay at least one CAS, maybe more if you need to retry.
(The CAS approach in general only works when initialization has
no side effects.)
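Spelled out as a sketch, the CAS flavor looks like this (class name
illustrative; it uses the java.util.concurrent.atomic classes). Note
that a thread losing the race simply discards the instance it built,
which is exactly why initialization must be side-effect free:

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative sketch of CAS-based lazy initialization.
class CasSingleton {
    private static final AtomicReference<CasSingleton> INSTANCE =
        new AtomicReference<CasSingleton>();

    private CasSingleton() {}   // must be side-effect free

    public static CasSingleton get() {
        CasSingleton s = INSTANCE.get();   // volatile-read fast path
        if (s != null) return s;
        CasSingleton fresh = new CasSingleton();
        // One CAS on the slow path; if we lose the race, the winner's
        // instance is returned and 'fresh' is simply garbage.
        return INSTANCE.compareAndSet(null, fresh) ? fresh : INSTANCE.get();
    }
}
```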
If you do it with a lock, you need a CAS for the lock, plus
the volatile write (generating a StoreLoad barrier), plus the unlock,
which normally requires another CAS or barrier. The StoreLoad
associated with a volatile write followed by either a CAS or barrier
is one of the "easy" cases, so the JVM will always elide it.
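For concreteness, here is a sketch of the lock flavor (class name
illustrative). The synchronized entry is the CAS-for-lock, the field
store is the volatile write, and the monitor exit is the unlock:

```java
// Sketch of the lock-based DCL slow path.
class LockSingleton {
    private static volatile LockSingleton instance;

    private LockSingleton() {}

    public static LockSingleton get() {
        LockSingleton s = instance;      // one volatile read on the fast path
        return (s != null) ? s : init();
    }

    private static synchronized LockSingleton init() {
        LockSingleton s = instance;      // re-check under the lock
        if (s == null) {
            s = new LockSingleton();
            instance = s;                // volatile write publishes the object
        }
        return s;
    }
}
```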
The net effect is as fast as I know how to do this in assembler.
Compared to full locking, DCL saves you two expensive instructions
(CAS or StoreLoad) per call to get(), so it will be noticeably cheaper
than using locks if this code is exercised much. (Still, there are
many cases in Java where approaches like the dynamically loaded
static trick are faster.)
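By the dynamically loaded static trick I mean the
initialization-on-demand holder idiom: the JVM's built-in
class-initialization locking does all the synchronization, so get()
is an ordinary load once the holder class has been initialized.
A sketch, with illustrative names:

```java
// Sketch of the initialization-on-demand holder idiom.
class HolderSingleton {
    private HolderSingleton() {}

    private static class Holder {
        // Initialized on first reference to Holder, under the JVM's
        // class-initialization lock; safely published for free.
        static final HolderSingleton INSTANCE = new HolderSingleton();
    }

    public static HolderSingleton get() {
        return Holder.INSTANCE;   // plain load after initialization
    }
}
```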
All of this is true at least on x86 and sparc.
On IA64 and PPC there are different optimizations that apply here
that I don't think anyone has worked out yet inside JVMs. (I sure
hope the IBM folks are working on this, though.)
Anyway, for C++, I think the only issue here is whether it is
worth weakening the semantics of volatile writes compared to Java
so that IA64 can ALWAYS use st.rel rather than mf, rather than
only doing so as an optimization. As Hans and I discussed (a year
or two ago), the optimizations are hard to apply, yet the actual
need for a full mf is rare. I think there are similar issues for
PPC, but I've stopped pretending I know anything about PPC barriers
any more, because people keep telling me different alleged facts
about them.
-Doug
More information about the cpp-threads
mailing list