[cpp-threads] out-of-thin-air results, dependency-based ordering again.
Peter Dimov
pdimov at mmltd.net
Fri Feb 16 17:32:11 GMT 2007
Doug Lea wrote:
> Peter Dimov wrote:
>>
>> So load_relaxed is
>> allowed to be delayed more and as a result, it can see a "more
>> current" value than load_acquire, if my understanding is correct. :-)
>>
>
> I should have been more careful saying what you get.
> It's a little messy though...
>
> On any standard MP, any given read will return a value that is
> either the last write by current processor (i.e., a "snooped"
> value) or the last write committed in the global per-variable
> total order.
>
> The proposed difference between load_acquire and load_relaxed is that
> for load_relaxed, you do not have any further promises of the value
> being causally consistent, even wrt the current thread's instruction
> stream. Except that you must retain the minimal guarantee
> that the value is per-variable monotonic (i.e., will not back up
> in the per-variable commit stream.) That's why it is conceivable
> (but very doubtful) that some platforms would need some sort of
> barrier to avoid doing the wrong thing with a sequence of
> load_relaxed's to the same variable. (This form of "self
> consistency" is discussed at http://www.cl.cam.ac.uk/~kaf24/mem.txt).

I think that we have still not pinned down the precise semantics of
_relaxed; it can provide either STRONG or WEAK self consistency depending on
how we narrow down its definition. If we take the example from the paper:

    p->x = 7;
    a = q->x;
    b = p->x;

we can either say that the relaxed reads from p->x and q->x are reorderable
(WEAK), or we can conservatively say that they are only reorderable if the
compiler can prove that they don't alias (STRONG). Both models are _relaxed
since ordinary ops are still movable across a _relaxed load in either
direction, and two provably independent _relaxed loads can still be moved
around.

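For concreteness, here is a minimal sketch of the three-line example above. It is written against the std::atomic interface that was later standardized in C++11, not the atomic_*_relaxed functions under discussion here, so treat the mapping as an assumption. The names Node, store_then_two_loads and check_single_threaded are hypothetical and exist only for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <utility>

// Hypothetical node type; x stands in for the field read via p->x and q->x.
struct Node {
    std::atomic<int> x{0};
};

// Mirrors the example: one relaxed store, then two relaxed loads.
// The compiler generally cannot tell whether p and q alias.
std::pair<int, int> store_then_two_loads(Node* p, Node* q) {
    p->x.store(7, std::memory_order_relaxed);        // p->x = 7;
    int a = q->x.load(std::memory_order_relaxed);    // a = q->x;
    int b = p->x.load(std::memory_order_relaxed);    // b = p->x;
    return {a, b};
}

// Single-threaded check, so no other writer can interfere.
void check_single_threaded() {
    // Distinct objects: the two loads are independent, so reordering
    // them is unobservable under either the WEAK or the STRONG reading.
    Node n1, n2;
    n2.x.store(3, std::memory_order_relaxed);
    std::pair<int, int> distinct = store_then_two_loads(&n1, &n2);
    assert(distinct.first == 3 && distinct.second == 7);

    // Aliased objects: both loads hit the variable that was just
    // written, so each must observe the thread's own store of 7.
    Node n3;
    std::pair<int, int> aliased = store_then_two_loads(&n3, &n3);
    assert(aliased.first == 7 && aliased.second == 7);
}
```

The aliased case is the one where the two readings differ once another thread also writes to the variable: under WEAK, a could come from a later write than b; under STRONG it could not. As far as I understand the per-object coherence rules that C++11 eventually adopted for atomics, they rule that outcome out, which is closer to the STRONG reading.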
Either way, the atomic_*_relaxed spec will probably need to say that the
compiler is free to reorder/coalesce/snoop if the hardware is free to
perform the same reordering/coalescing/snooping. What the hardware is
allowed to do will need to be specified by the memory model, where we can
pretend that compiler optimizations are no longer a concern.