[cpp-threads] Causality on more than two processors, write atomicity

Fri Sep 30 08:45:57 BST 2005

On 9/30/05, Hans Boehm <Hans.Boehm at hp.com> wrote:
> I was just rereading this thread, and I can't quite make sense out of some
> things.  It seems to me that if Peter's example were Java code, and all
> variables were volatile, by the usual guidelines we have been giving
> everyone, no barriers are needed on X86, since there are no stores
> followed by loads, right?

Well, in theory... (see "distinguishing write operations that must appear
atomic...")

http://research.compaq.com/wrl/people/kourosh/papers/1995_thesis.pdf

---
Explicit fence instructions are an alternative to operation labels. The
most common use of fence instructions is to communicate the significance
of specific program orders among instructions that occur before and after
the fence. One way to classify a fence instruction is by the set of
previous operations and the set of future operations that are related by
the fence [GLL+90]. For example, the set of future operations maybe: (a)
all future read and write operations (full fence); (b) all future write
operations (write fence), (c) all future read operations (read fence),
and (d) only the operation immediately following the fence (immediate
fence). Likewise, the set of previous operations may be a combination of
specific read and write operations that precede the fence in program order.
For example, for supporting PL1, an immediate fence may be used to capture
the program order from previous read and write operations to a competing
write. A more conservative way to convey this program order is to use a
write fence before the competing write, which has the side effect of also
imposing an order to the writes following the competing write; the use of
conservative fences obviously leads to a loss of information. In
implementations that do not directly support explicit fence instructions,
achieving the functionality of a fence may require issuing a sequence of
other instructions that end up enforcing the desired operation ordering.23

The use of fences as described above is not suitable for distinguishing
write operations that must appear atomic with respect to multiple copies.
There are several alternatives, however. The simplest is to provide atomic
behavior for all writes if a model requires it for any write. Alternatively,
the third category chains can be supported through the second technique
described earlier in this section, whereby for some reads, the program
order from the read to a following operation is enforced by delaying the
latter operation for both the read to complete and for the write whose
value is read to complete with respect to all processors; two flavors of
fences may be used, with one imposing the more conservative ordering.
Finally, a different mechanism may be used to identify the writes that must
appear atomic. Explicit fence instructions can theoretically be used to
directly provide equivalent information to operation labels. For example, a
fence instruction may be used to logically identify the proceeding memory
operation as a competing operation. 24 However, we are not aware of any
systems that use fence instructions in this manner.
---

>
> But operations on Java volatiles are totally ordered,

I'm now not sure about that. Looks like they are stronger than PC (due to
implied store-load fencing) but weaker than TSO (due to lack of remote
write atomicity).

> so Y != 0 in P1 should be impossible, right?

Uhmm. Peter's example was about (im)possibility for P3 to observe Y == 0
after it observed Z == 1.

regards,
alexander.