[cpp-threads] OpenMP Memory Model
Greg Bronevetsky
greg at bronevetsky.com
Fri Apr 14 21:46:58 BST 2006
> 1) The OpenMP spec ignores things like bit-fields, and the related
> issues of when one write may require reading and rewriting other
> objects. That may not matter much in the OpenMP context.
>
I'm unclear on how C++ deals with bit-fields, but OpenMP doesn't
address them explicitly. The general rule is that racing writes cause the
variable in question to become undefined. When the writes are both to a
bit-field, it's unclear how many bits become undefined. The same
applies to large objects: if there are two racing writes to a given
field of a class, does the whole class become garbage? The latter
is addressed in OpenMP by the fact that memory locations are referred to
as "variables". Bit-field treatment, however, is unclear and should be
clarified in the future.
> 2) It assumes that compilers cannot introduce new writes (e.g. as part
> of speculative register promotion). We agree that's the proper rule,
> but I suspect it's no more true for current implementations than it is
> for C++.
>
The memory model in the paper does not make this assumption. It only
provides execution rules for the writes that actually appear in the
application's source code. This can be considered the "API
treatment" of OpenMP: the programmer wrote a program in C++/Fortran and
OpenMP, using a number of functions of OpenMP's API, including reads,
writes, locks, barriers, etc. The goal of the formal model is to specify
the rules for how those API calls will operate. The underlying
implementation (the compiler, for example) may do fancy things,
including making additional calls to the API (such as adding writes),
but as long as the API calls performed by the application are performed
correctly, everything is fine.
As such, the formal spec in the paper only addresses how OpenMP
responds to the application's source code and nothing else. Note that
the spec in the paper does place one limitation on the compiler, in
that it forces the compiler not to violate read-write dependencies
between accesses to different variables (e.g., in A=5; B=A; the write to
B depends on the read of A), but that is the only restriction and it may
not survive to the final spec.
> 3) The paper doesn't mention volatile. I looked at the OpenMP 2.5 spec
> again. That does give semantics to volatile, though the semantics don't
> look plausible to me. (The implicit flush operation seems to be on the
> wrong side, at least when comparing it to Java volatile semantics. And
> putting it at the previous/next sequence point seems to me to allow some
> really weird code, at the expense of optimization.) The C++ memory
> model may or may not mention volatile, but I'm pretty sure it will not
> use the OpenMP semantics.
>
I haven't seen Java's volatile semantics, but I don't understand why
the flushes are on the wrong side. OpenMP's model for volatile borrows
its intuition from release consistency (RC). In RC you write something and
then perform a release operation, which blocks until all prior
operations have completed. If you read something, you precede the read
with an acquire operation, which completes before any subsequent
operations may begin. Acquires and releases are totally ordered relative
to each other. OpenMP's flush is a combination of both acquire and
release, so the pattern for volatile variables is the same as above
but with a flush operation used in both cases.
Note that if a variable has to be flushed immediately before a read
and immediately after a write, this implies that the read cannot come
from a register and the write cannot be to a register. Thus, OpenMP's
semantics seem to essentially capture the original single-threaded
meaning of volatile.
Can you be specific about what optimizations become invalid given
these semantics? Also, what kinds of alternate semantics for volatile
have you been thinking of?
> 4) The notion of atomic operations seems quite different, in that
>
> (a) There is no need to label reads of atomic variables specially. An
> ordinary read that races with an atomic write is OK. I don't think this
> is a good design decision, but it seems to be the way OpenMP works.
>
> (b) Atomic operations are what we would call "unordered". Given that
> the corresponding reads are not special and hence unordered, this
> probably doesn't lose anything.
>
> I expect the resulting notion of atomics usually requires lots of
> explicit "flush"es, which I think are expensive to implement on
> architectures like X86 or IA64, since I think they require StoreLoad
> ordering.
>
You are correct. The idea behind atomic operations is twofold. There
is the obvious part: performing a read and a write as one atomic
operation. The other is that regular writes are not allowed to race
with each other or with reads, which would make any kind of
synchronization via variables impossible; atomic writes are allowed to
race with each other and with all reads.
However, I can see how this can limit optimizations, as shown in
your common subexpression elimination example below.
> 5) Is the notion of eclipsing reads designed to prevent variables from
> "flip flopping", as in
>
> Thread 1:
> x = 1;
> flush1;
> x = 2;
>
> Thread 2:
> flush2;
> r1 = x;
> r2 = x;
>
> outcome: flush1 Oflsh flush2 && r1 = 2 && r2 = 1?
>
I don't understand the details of your example. If flush1 precedes
flush2 in the Flush Order then thread 1's write to x races with thread
2's reads of x. I think that x=2 on thread 1 needs to be atomic in order
for this to work as above.
> If so, I think that would get us back to the "reads kill" problem in the
> original Java memory model, which wasn't a good idea, mostly because it
> seemed to be too expensive to get anyone to bother to implement
> correctly. The issue arises because in real cases, the reads in thread
> 2 may look more like:
>
> r0 = *y/17;
> r1 = *x;
> r2 = *y/17;
>
> With this rule, you can no longer eliminate the common subexpressions
> unless you know that *x and *y don't alias.
>
You are correct. If there are a number of racing atomic writes, then
it is possible for the first read of *y to get one value while
subsequent reads get another. This is actually a good example of why
atomic reads should be identified as such: if we knew that *y may
only be written by regular writes, common subexpression elimination
would be legal, because the only way to break it would be a race
between the write and one of the reads, which would produce garbage data
anyhow.
> 6) If a read is involved in a race, the value associated with the read
> is undefined, but it does not render the meaning of the entire program
> undefined. I think that's problematic for C++, in that a race can
> result in a read of an uninitialized vtable pointer, which can normally
> result in a wild branch, and hence any behavior whatsoever.
> Implementing anything else often requires some sort of memory fence
> after the vtable initialization.
>
I'm not sure whether you're commenting on reads in general or
on reads of key internal C++ data structures. If the latter, then it is
outside the scope of OpenMP, since the vtable is not directly visible to
the application (caveat: it's been years since I last touched C++, so I
may be wrong on this) and is thus not explicitly part of the contract
between the application and C++ or OpenMP. You are correct that if an
implementation of C++ were written on top of OpenMP, it would have
this issue, and you correctly identify how it would be resolved.
In particular, a barrier immediately following the initialization of
basic C++ data structures would work nicely (barriers include flushes).
In general, I think it's key to separate the specification as seen
from the application's point of view from any underlying details that
have to exist in reality but are invisible to the application itself.
Only the former needs to be explicitly specified; the latter needs
to be considered from a performance point of view but not actually
specified.
--
Greg Bronevetsky