[cpp-threads] OpenMP Memory Model

Sat Apr 15 01:45:23 BST 2006

> From: Bronis R. de Supinski [mailto:bronis at llnl.gov] 
> 
> Hans:
> 
> First a couple general comments. You mentioned a proposed C++ 
> memory model. Is there a draft or something available for review?
> I would be very interested in looking at it if possible. 
> Also, I think we have generated interest in looking at the 
> OpenMP memory model as part of the OpenMP 3.0 specification 
> process. If that does happen, would you be interested in 
> participating in the discussions? If so, let me know and I 
> will let you know when they are scheduled.
I'd be interested.  Unfortunately, a previous attempt to arrange
something along these lines failed for scheduling reasons.
> 
> Next, I agree with Greg's responses. I have a couple 
> additional comments/questions below as well.
> 
> [snip]
> 
> > > 4) The notion of atomic operations seems quite different, in that
> > >
> > >   (a) There is no need to label reads of atomic variables specially.
> > > An ordinary read that races with an atomic write is OK.  I don't
> > > think this is a good design decision, but it seems to be the way
> > > OpenMP works.
> 
> Indeed it is. I agree it is odd. If you have any examples for 
> which this creates difficulty, they would be useful.
I would also expect it to cause problems with something like atomic
updates to 64-bit longs on 32-bit x86 hardware.  Since any ordinary read
might race with an atomic write, you'd have to make a lot of reads of
longs atomic, which is expensive.  I'd expect most implementations to
cheat and do the wrong thing.
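
A minimal sketch of the 64-bit case, written with C++11 std::atomic
(which postdates this message) rather than OpenMP directives.  All names
here are hypothetical.  The writer flips a 64-bit value between all-zeros
and all-ones; the reader counts any value that is neither, i.e. a torn
read.  Because the reader's load is itself atomic, the count stays at
zero.  Under a rule that lets ordinary reads race with atomic writes, an
implementation would have to give every such read this treatment, and on
32-bit x86 that typically costs more than two plain 32-bit loads.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical shared variable; stands in for a 64-bit long that is
// updated atomically by one thread and read by another.
std::atomic<std::uint64_t> shared_value{0};

// Runs one writer and one reader for `iterations` steps each and returns
// how many torn (half-old, half-new) values the reader observed.
int count_torn_reads(int iterations) {
    const std::uint64_t kLow = 0x0000000000000000ULL;
    const std::uint64_t kHigh = 0xFFFFFFFFFFFFFFFFULL;

    std::thread writer([&] {
        for (int i = 0; i < iterations; ++i) {
            // Atomic 64-bit store: both halves become visible together.
            shared_value.store((i & 1) ? kHigh : kLow,
                               std::memory_order_relaxed);
        }
    });

    int torn = 0;
    for (int i = 0; i < iterations; ++i) {
        // Atomic 64-bit load: this is the "labelled" read.  Two separate
        // 32-bit loads here could see one half of each value.
        std::uint64_t v = shared_value.load(std::memory_order_relaxed);
        if (v != kLow && v != kHigh) {
            ++torn;
        }
    }

    writer.join();
    return torn;
}
```

Calling count_torn_reads(100000) returns 0 because both accesses are
atomic; the interesting question for OpenMP is who pays for the reader's
side when the source does not mark it.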

...
> Interesting. Strictly speaking, it is not clear whether the 
> OpenMP 2.5 specification memory model requires this "reads 
> kill" behavior. Rather, Greg and I felt it would be useful 
> and seemed to conform to the intent of the model. Any 
> additional information you could provide on whether it proves 
> difficult to work without it or why it was hard to get anyone 
> to implement it correctly would be useful.
> 
I believe the "reads kill" behavior was one of the issues that motivated
the revision of the Java memory model (JSR133).  Bill Pugh and Jeremy
Manson will be able to supply more details.  This was described in some
of their earlier papers.  I found an old slide set of Bill's at
http://www.cs.umd.edu/~pugh/java/memoryModel/multithreadedHandout.ps
which suggests about a 10% performance cost for Java, if I read it
correctly.  My intuition would be that you should expect a larger cost
in an OpenMP-like setting.

There is a similar issue with C volatiles on Itanium.  The Itanium
software conventions specify that volatile loads should generate
hardware "acquire loads".  That arguably doesn't make much sense unless
they also kill available expressions for CSE.  But out of the three
compilers I have any experience with, only one implements that piece.
As in the Java case, I suspect that this is largely because the issue
wasn't well understood.
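
A minimal sketch of what "kill available expressions" means here, again
using C++11 atomics as a stand-in for a volatile acquire load (an
assumption for illustration; the names are hypothetical).  The value of
payload is read once before the synchronizing load, so a compiler doing
CSE would like to reuse it.  The acquire load has to invalidate that
cached value, otherwise the second read returns stale data.

```cpp
#include <atomic>
#include <thread>

int payload = 0;                 // ordinary, non-atomic data
std::atomic<bool> ready{false};  // flag read with acquire semantics

// Returns (value read after the acquire load) - (value read before it).
int read_after_acquire() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);

    // First read: the value of `payload` is now an available expression.
    // This happens before the writer thread starts, so there is no race.
    int before = payload;

    std::thread writer([] {
        payload = 42;                                  // publish data
        ready.store(true, std::memory_order_release);  // then set the flag
    });

    // Synchronizing read.  For the code to be correct, this load must
    // "kill" the cached value of `payload` from above.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }

    // Second read: must be reloaded from memory, not replaced by `before`.
    int after = payload;

    writer.join();
    return after - before;
}
```

With the acquire load treated as killing earlier reads, the function
returns 42; a compiler that reused `before` for the second read would
return 0 instead.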

Hans