[cpp-threads] out-of-thin-air results, dependency-based ordering again.

Fri Feb 16 17:32:11 GMT 2007

Doug Lea wrote:
> Peter Dimov wrote:
>>
>> So load_relaxed is
>> allowed to be delayed more and as a result, it can see a "more
>> current" value than load_acquire, if my understanding is correct. :-)
>>
>
> I should have been more careful saying what you get.
> It's a little messy though...
>
> On any standard MP, any given read will return a value that is
> either the last write by current processor (i.e., a "snooped"
> value) or the last write committed in the global per-variable
> total order.
>
> The proposed difference between load_acquire and load_relaxed is that
> for load_relaxed, you do not have any further promises of the value
> being causally consistent, even wrt the current thread's instruction
> stream. Except that you must retain the minimal guarantee
> that the value is per-variable monotonic (i.e., will not back up
> in the per-variable commit stream.) That's why it is conceivable
> (but very doubtful) that some platforms would need some sort of
> barrier to avoid doing the wrong thing with a sequence of
> load_relaxed's to the same variable. (This form of "self
> consistency" is discussed at http://www.cl.cam.ac.uk/~kaf24/mem.txt).

I think that we have still not pinned down the precise semantics of 
_relaxed; it can provide either STRONG or WEAK self consistency depending on 
how we narrow down its definition. If we take the example from the paper:

    p->x = 7;
    a = q->x;
    b = p->x;

we can either say that the relaxed reads from p->x and q->x are reorderable 
(WEAK), or we can conservatively say that they are only reorderable if the 
compiler can prove that they don't alias (STRONG). Both models are _relaxed 
since ordinary ops are still movable across a _relaxed load in either 
direction, and two provably independent _relaxed loads can still be moved 
around.
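
To make the aliasing case concrete, here is a minimal sketch of the example
above. It assumes a std::atomic-style interface with a memory_order_relaxed
argument as a stand-in for the atomic_*_relaxed operations under discussion;
Node, Result, store_then_load and check_single_threaded are names invented
for illustration only.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical node type; x stands in for the field accessed via p and q.
struct Node {
    std::atomic<int> x{0};
};

struct Result {
    int a;  // value read through q
    int b;  // value read through p
};

// The compiler generally cannot tell whether p and q alias.
// STRONG: the two loads may be swapped only if p != q is provable.
// WEAK:   the two loads may be swapped regardless.
Result store_then_load(Node* p, Node* q) {
    p->x.store(7, std::memory_order_relaxed);
    int a = q->x.load(std::memory_order_relaxed);
    int b = p->x.load(std::memory_order_relaxed);
    return {a, b};
}

// Single-threaded check: swapping two loads with no intervening write
// cannot change what they return, so this does not tell the models apart.
void check_single_threaded() {
    Node n;
    Result r = store_then_load(&n, &n);  // aliasing case, p == q
    assert(r.a == 7 && r.b == 7);
}
```

With no other writer, both loads see 7 when p == q, so the two readings
should only become distinguishable once another thread writes to x
concurrently.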

Either way, the atomic_*_relaxed spec will probably need to say that the 
compiler is free to reorder/coalesce/snoop if the hardware is free to 
perform the same reordering/coalescing/snooping. What the hardware is 
allowed to do will need to be specified by the memory model, where we can 
pretend that compiler optimizations are no longer a concern.
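
As a rough illustration of the coalescing case, the sketch below shows two
relaxed loads of the same variable next to a hand-coalesced version. It again
assumes a std::atomic-style interface as a stand-in for atomic_*_relaxed;
load_twice, load_coalesced and check_equivalent_when_quiescent are
hypothetical names.

```cpp
#include <atomic>
#include <cassert>

// Two separate relaxed loads of the same variable.
int load_twice(const std::atomic<int>& v) {
    int first = v.load(std::memory_order_relaxed);
    int second = v.load(std::memory_order_relaxed);
    return first + second;
}

// What a compiler might produce by coalescing the two loads into one.
// Every result of this function is also a possible result of load_twice,
// provided the hardware may satisfy both loads with the same value.
int load_coalesced(const std::atomic<int>& v) {
    int once = v.load(std::memory_order_relaxed);
    return once + once;
}

// With no concurrent writer the two versions must agree.
void check_equivalent_when_quiescent() {
    std::atomic<int> v{21};
    assert(load_twice(v) == load_coalesced(v));
}
```

The converse does not hold: with a concurrent writer, load_twice can return
a sum of two different values that load_coalesced never produces, which is
fine because coalescing only narrows the set of outcomes.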