[cpp-threads] Web site updated

Peter Dimov pdimov at mmltd.net
Sun Feb 18 15:22:37 GMT 2007


Hans Boehm wrote:
> I'm not sure I've completely followed this discussion, but this seems
> to be diverging quite a bit from N2052, in ways that I would not like
> to see.  Certainly acquire and release were NOT defined
> in terms of implicit fences.  One reason for that is to make it
> possible to optimize out for example, thread-local synchronization.
> An assignment to a dead atomic variable with store_release has
> no ordering implications on anything else.  If the
> definition was based on fences, it would be very hard to eliminate the
> implied fence.

I think that you're following the same path that I did before realizing that 
the example is more subtle than that. In this formulation:

a==b==cnt==0

[1] atomic_store_relaxed( &a, 1 );

[2] atomic_fetchadd_release( &cnt, r );
[3] atomic_fetchadd_release( &cnt, -r );

[4] atomic_store_relaxed( &b, 1 );

it is indeed true that the _release does not prevent the store to b from 
migrating above the store to a, even though an implementation of _release in 
terms of bidirectional fences will preclude that.
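
(Illustration: a rough rendering of the formulation above in the std::atomic 
interface that C++11 later standardized. The mapping from the N2052-style 
names used in this thread onto std::atomic member functions is an 
assumption, and first_formulation is a hypothetical name.)

```cpp
#include <atomic>

std::atomic<int> a{0}, b{0}, cnt{0};   // a==b==cnt==0

// Steps [1]-[4] of the formulation; r is an arbitrary int.
void first_formulation(int r) {
    a.store(1, std::memory_order_relaxed);          // [1]
    cnt.fetch_add(r, std::memory_order_release);    // [2]
    cnt.fetch_add(-r, std::memory_order_release);   // [3]
    // [4] A release operation only constrains what precedes it, so this
    // relaxed store to a different variable is not held back by [2] or [3].
    b.store(1, std::memory_order_relaxed);
}
```

Run on a single thread, the function simply leaves a==1, b==1 and cnt==0; 
the reordering question only concerns what another thread may observe.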

But in Raul Silvera's example, the last store is to cnt. And this store, 
even in its _relaxed variety, cannot cross another store to cnt. That is why 
it cannot migrate above the store to a.

In practice, this means that in:

[1] atomic_store_relaxed( &a, 1 );

[2] atomic_fetchadd_release( &cnt, r );
[3] atomic_fetchadd_release( &cnt, -r );

[4] // code in other translation unit, not visible to the optimizer

[2] and [3] cannot be optimized out without leaving at least a fence behind, 
because the optimizer has to be conservative and assume that [4] may contain 
a relaxed store to cnt.
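
(Illustration: a sketch of this situation using the later C++11 std::atomic 
interface, which is an assumption on top of the N2052-style names above. 
The invisible code at [4] is modelled by a function pointer; Tail, writer 
and relaxed_store_to_cnt are hypothetical names.)

```cpp
#include <atomic>

std::atomic<int> a{0}, cnt{0};

// Stand-in for [4]: when compiling writer, the optimizer sees only a pointer.
using Tail = void (*)();

void writer(int r, Tail tail) {
    a.store(1, std::memory_order_relaxed);          // [1]
    cnt.fetch_add(r, std::memory_order_release);    // [2]
    cnt.fetch_add(-r, std::memory_order_release);   // [3]
    tail();                                         // [4] unknown code
}

// One possible [4]: a relaxed store to cnt. It cannot cross the earlier
// stores to cnt, which is why it cannot migrate above the store to a.
void relaxed_store_to_cnt() {
    cnt.store(7, std::memory_order_relaxed);
}
```

Calling writer(3, relaxed_store_to_cnt) ends with a==1 and cnt==7. The value 
7 is arbitrary and chosen only to make the final store visible.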

> Our current definition of "ordered" needs work, since we don't yet
> have agreement on what it should mean.  (We have been playing with
> some variants that do enforce ordering with respect to threads that do
> not access the same atomic variable.  But I think that was probably
> a mistake.)

I was about to write a post that is a summary of my current understanding of 
the various possible definitions of ordered, and this seems as good a place as 
any. So:

1. Acquire+release

This definition of ordered applies only to read-modify-write operations. It 
states that the read part has acquire semantics, and that the write part has 
release semantics. Its advantage is that it imposes no more synchronization 
than necessary for a significant percentage of the cases.

2. Leading/trailing fence

This definition is the moral equivalent of a relaxed operation preceded and 
followed by a full fence. Its downside is that it is not clear whether 
_ordered operations obey CCCC, SC, or something else. It's also not clear 
whether this is the most efficient way to get CCCC.

3. _ordered == CCCC (in which case it may be spelled _cccc)

Not clear how one could get SC.

4. _ordered == SC (possibly spelled _sc or _consistent?)

Not clear how one could get CCCC.

5. store_ordered+load_ordered == SC, store_ordered+load_acquire == CCCC

Not clear whether this specifies what the hardware does.
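
(Illustration: definitions 1, 2 and 4 sketched for a fetch-add, using the 
std::atomic interface that C++11 later standardized. This mapping is an 
assumption and not something the thread had settled on. In particular, 
treating "full fence" as a seq_cst fence is a guess, and the helper names 
are hypothetical. Definitions 3 and 5 are left out because that interface 
has no ordering named after CCCC.)

```cpp
#include <atomic>

// Definition 1: acquire on the read part, release on the write part.
inline int fetch_add_acq_rel(std::atomic<int>& x, int d) {
    return x.fetch_add(d, std::memory_order_acq_rel);
}

// Definition 2: a relaxed operation preceded and followed by a full fence
// (assumed here to be a seq_cst fence).
inline int fetch_add_fenced(std::atomic<int>& x, int d) {
    std::atomic_thread_fence(std::memory_order_seq_cst);
    int old = x.fetch_add(d, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return old;
}

// Definition 4: the operation itself is sequentially consistent.
inline int fetch_add_sc(std::atomic<int>& x, int d) {
    return x.fetch_add(d, std::memory_order_seq_cst);
}
```

All three return the previous value and differ only in the ordering they 
request, so a single-threaded caller cannot tell them apart.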



