[cpp-threads] Yet another visibility question
Paul E. McKenney
paulmck at linux.vnet.ibm.com
Thu Dec 21 22:25:36 GMT 2006
On Wed, Dec 20, 2006 at 07:26:51PM -0600, Boehm, Hans wrote:
> > From: Doug Lea
> >
> > Boehm, Hans wrote:
> >
> > > Con (a showstopper, I think): We don't know how to express
> > > dependency-based ordering in a way that's not broken by
> > conventional
> > > optimizations on other compilation units that don't mention
> > atomics,
> > > and remains usable of the store takes place on the far side of an
> > > abstraction boundary.
> > >
> >
> >
> > Of the use cases for raw operations, it seems that the most
> > defensible one is hand-crafted speculation. As in (for an atomic x)
> >
> >     int rawx = x.load_raw();
> >     int nextx = f(rawx);
> >     if (x.cas(rawx, nextx)) return;
> >     // else maybe get a lock and do it the slow way
> >
> > Since speculation is at the mercy of whatever happens, you
> > don't expect very much. But you do expect that the load_raw
> > actually reads x, and isn't completely optimized away and
> > replaced by say, 0.
> >
> > So perhaps the place to start is to require that raw loads
> > and stores must at least actually occur (unlike ordinary
> > variables where dead ones can be killed).
> > Except that loads could be killed if it is provable that
> > across all executions of the program, the same value must be read.
> >
> > This sounds sorta like rules for old-style C volatiles?
> >
> > You can probably go a little further by somehow saying that
> > the loads/stores must occur between any surrounding full
> > barriers that might exist. As in, not moving across lock boundaries.
> >
> > How much more do you need?
> >
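Doug's speculation pattern maps naturally onto what later became C++11
atomics; a sketch, in which load_raw is approximated by a relaxed load
and cas by compare_exchange_strong (the lock-based slow path and the
transform f() are illustrative, not part of Doug's original):

```cpp
#include <atomic>
#include <mutex>

std::atomic<int> x{0};
std::mutex slow_path_lock;       // illustrative slow-path lock

int f(int v) { return v + 1; }   // hypothetical transform of x

void update() {
    // Speculative fast path: the "raw" (relaxed) load must actually
    // read x -- it cannot be optimized away and replaced by, say, 0.
    int rawx = x.load(std::memory_order_relaxed);
    int nextx = f(rawx);
    if (x.compare_exchange_strong(rawx, nextx))
        return;                  // speculation succeeded
    // Speculation failed: get a lock and do it the slow way.
    std::lock_guard<std::mutex> guard(slow_path_lock);
    x.store(f(x.load()), std::memory_order_relaxed);
}
```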
> The major concern is that the unordered atomic equivalent of
>
> Thread 1: r1 = x; y = 1
> Thread 2: r2 = y; x = 1
>
> should allow r1 = r2 = 1, while
>
> Thread 1: r1 = x; y = r1
> Thread 2: r2 = y; x = r2
>
> should not, since that would admit out-of-thin-air values, which we may
> not want to allow.
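For concreteness, Hans's first litmus test can be written out using
(what later became) C++11 relaxed atomics; a sketch, with the helper
run_once() added for illustration:

```cpp
#include <atomic>
#include <thread>
#include <utility>

// Hans's first example with unordered (relaxed) atomics:
//   Thread 1: r1 = x; y = 1
//   Thread 2: r2 = y; x = 1
// Relaxed ordering permits even the outcome r1 == r2 == 1.
std::atomic<int> x{0}, y{0};

std::pair<int, int> run_once() {
    int r1 = 0, r2 = 0;
    std::thread t1([&] {
        r1 = x.load(std::memory_order_relaxed);
        y.store(1, std::memory_order_relaxed);
    });
    std::thread t2([&] {
        r2 = y.load(std::memory_order_relaxed);
        x.store(1, std::memory_order_relaxed);
    });
    t1.join();
    t2.join();
    // Any of (0,0), (0,1), (1,0), (1,1) is a legal outcome.
    return {r1, r2};
}
```

The second example differs only in that each thread stores the value it
just loaded, which is what raises the out-of-thin-air question.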
>
> As Peter points out, it's not 100% clear that we want to prohibit "out
> of thin air" values. It would still be nice to be able to prove that
> f() below returns 0, no matter what foo does.
>
> int f() {
>     int x = 0;
>     int *y = foo();
>
>     *y = 13;
>     return x;
> }
>
> I think currently that's violated only for implementations of foo() that
> exhibit undefined semantics. If we allow out of thin air values, foo()
> could return &x, and legitimate programs could return nonzero from f(),
> which argues that some tools might have to consider the possibility.
> Presumably, none of this could happen in a real implementation, which
> argues that this would be a weird behavior to standardize.
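Hans's f() compiles as written once some foo() is supplied; under any
foo() with defined behavior (the stand-in below, which returns a pointer
to a file-scope variable, is illustrative), f() must return 0:

```cpp
static int g;                  // a well-behaved foo() returns a pointer
int *foo() { return &g; }      // that does not alias f()'s locals

int f() {
    int x = 0;
    int *y = foo();
    *y = 13;                   // cannot reach x without undefined behavior
    return x;                  // so f() always returns 0
}
```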
If foo() makes use of primitives similar to gcc's (admittedly nonstandard)
__builtin_frame_address() primitive, then foo() might well be able to
affect the value of x, and thus the value returned from f(). I would
certainly hope that people would avoid doing this sort of thing except
for debugging, but you did ask...
FWIW, here is a quick summary of the ordering approach taken by the
Linux kernel.
There are two types of directives:
o   Compiler control, done either through the "barrier()" directive
    or via the "memory" option to a gcc asm directive.

o   CPU control, via explicit memory-barrier directives:

    o   smp_mb() and mb() are full memory barriers.

    o   smp_rmb() and rmb() are memory barriers that
        are only guaranteed to segregate reads.

    o   smp_wmb() and wmb() are memory barriers that
        are only guaranteed to segregate writes.

    The "smp_" prefix causes code to be generated only in
    an SMP build.  So, for example, smp_wmb() would be used
    to order a pair of writes making up some sort of
    synchronization primitive, while wmb() would be used
    to order MMIO writes in device drivers.

    This distinction is due to the fact that a single CPU
    will always see its own cached references to normal
    memory in program order, while reordering can happen
    (even on a single-CPU system) from the viewpoint of
    an I/O device.

(Note that this is not a complete list, but covers
the most commonly used primitives.)
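The primitives above can be sketched with gcc extensions; this is a
simplification, since the real kernel definitions are per-architecture
macros (CONFIG_SMP and the publish() example's variable names are
illustrative):

```cpp
// Simplified sketches of the kernel primitives; the real versions are
// per-architecture.  __sync_synchronize() is gcc's builtin full
// memory barrier.
#define barrier() __asm__ __volatile__("" ::: "memory")  /* compiler only */
#define mb()      __sync_synchronize()                   /* full CPU barrier */

#ifdef CONFIG_SMP
#  define smp_wmb() mb()      /* real kernels use a cheaper write barrier */
#else
#  define smp_wmb() barrier() /* UP build: compiler barrier suffices */
#endif

// The classic use of a write barrier: publish initialized data.
int payload;
int published;

void publish(void) {
    payload = 42;
    smp_wmb();        // order the payload store before the flag store
    published = 1;    // a reader pairing this with smp_rmb() then sees
                      // payload == 42 whenever it sees published != 0
}
```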
The CPU-control directives imply compiler-control directives, since
there is little point in forcing ordering at the CPU if one has not
also forced ordering in the compiler. However, it -is- useful to
force ordering at the compiler only, for example, when writing code
that interacts with an interrupt handler.
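The compiler-only case can be sketched as follows; the variable names
and the interrupt-handler scenario are illustrative:

```cpp
// On a single CPU, code racing with its own interrupt handler needs
// only barrier(): the CPU sees its own accesses in program order, so
// no CPU memory barrier is required, but the compiler must still be
// forbidden from reordering or deferring the stores.
#define barrier() __asm__ __volatile__("" ::: "memory")

volatile int irq_flag;   // examined by a (hypothetical) interrupt handler
int scratch;             // data the handler consumes

void prepare_for_irq(void) {
    scratch = 1;
    barrier();           // compiler may not sink the scratch store
                         // below this point
    irq_flag = 1;        // a handler seeing irq_flag == 1 may read scratch
}
```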
Most uses of both the compiler- and SMP-CPU-control directives are
"buried" in higher-level primitives. For example:
    Primitive    Uses
    ---------    ----
    barrier():    564
    smp_mb():     171
    smp_rmb():     87
    smp_wmb():    150
The unconditional CPU-control directives are used more heavily,
because the many hundreds of device drivers in the Linux kernel
must make use of them:
    Primitive    Uses
    ---------    ----
    mb():        2480
    rmb():        168
    wmb():        587
These numbers might seem large, but there are more than 40,000 instances
of locking primitives in the Linux kernel, -not- counting cases where
a locking primitive use is "wrappered" in a small function or macro.
Applications that don't interact with either interrupts or signal
handlers might have less need for the raw CPU-control primitives.
Thanx, Paul