[cpp-threads] A question about N2153

Raul Silvera rauls at ca.ibm.com
Thu Jan 18 15:42:26 GMT 2007


cpp-threads-bounces at decadentplace.org.uk wrote on 01/17/2007 01:44:00 PM:

> What is the reference implementation of acquire_fence for x86, SPARC RMO,
> PowerPC, IA-64? I'm guessing (no op), #LoadLoad | #LoadStore, lwsync, mf.
>
> If we take the example:
>
> if( fetchadd_release( &ref_count, -1 ) == 1 ) // old value
> {
>     acquire_fence();
>     destroy_object();
> }
>
> this is fine on x86/SPARC. On PowerPC, however, the most efficient
> implementation has an isync instead of lwsync, right? On IA-64 an ld.acq
> from ref_count may also be more efficient than a mf.
>
> Since a load_acquire( location ) on PowerPC is (pseudocode):
>
> mov r1, location
> cmp r1, r1
> bne- $
> isync
>
> (correct? :-) ) it seems to me that a more efficient formulation of the
> example would be:
>
> if( fetchadd_release( &ref_count, -1 ) == 1 ) // old value
> {
>     load_acquire( &ref_count );
>     destroy_object();
> }
>
> This is not guaranteed to work in theory, but I think that it will work in
> practice on all implementations (and the extra load can be optimized out on
> non-IA-64). If I'm right, this might pose a problem. It's not good if
> something not backed by the official specification works better in practice.
>

When we move from a load_acquire to an acquire fence, there is clearly a
loss of precision, since we're moving from ordering a single load to
ordering all preceding loads.
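
A minimal sketch of the fence formulation from the quoted example, assuming
the std::atomic interface that C++11 later standardized rather than N2153's
own spelling (fetchadd_release, acquire_fence). The function name release_ref
is hypothetical, and the caller is assumed to destroy the object when it
returns true.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: drops one reference and reports whether this call
// released the last one. Written against C++11 std::atomic, which is an
// assumption; N2153 uses different names for these operations.
inline bool release_ref(std::atomic<int>& ref_count) {
    // Release decrement: this thread's earlier writes to the object are
    // ordered before the decrement.
    int old = ref_count.fetch_sub(1, std::memory_order_release);
    assert(old >= 1 && "reference count underflow");

    if (old == 1) {
        // Acquire fence: not tied to ref_count; it orders every preceding
        // load in this thread before the destruction that follows.
        std::atomic_thread_fence(std::memory_order_acquire);
        return true;   // caller may now destroy the object
    }
    return false;
}
```

The fence here is coarser than the single load it is meant to order, which is
the loss of precision described above.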

What N2153 is trying to do is take a step toward finer granularity than
the original ISOMM proposal, which had only a fully ordered fence.
Fences can be made finer-grained in two ways: one is which types of
accesses they affect (loads vs. stores), and the other is which variables
they affect.

We have tried to make fences finer-grained in terms of loads vs.
stores, but it could be argued that N2153 doesn't go far enough, and that
finer granularity in terms of which variables are affected is also
desirable.

In this case, what you really want is an acquire fence that applies only to
ref_count. That would be better than a load_acquire, because it doesn't
imply a reload of the variable.
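
For comparison, the load_acquire formulation from the quoted message might be
sketched as below, again assuming the C++11 std::atomic interface and a
hypothetical function name (release_ref_reload). It shows the extra load that
an acquire fence restricted to ref_count would avoid. Whether this form is
formally sufficient depends on the memory model in force; the quoted message
doubts that it is under N2153.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: same contract as a fence-based release, but the
// acquire ordering comes from re-reading ref_count instead of from a fence.
inline bool release_ref_reload(std::atomic<int>& ref_count) {
    int old = ref_count.fetch_sub(1, std::memory_order_release);
    assert(old >= 1 && "reference count underflow");

    if (old == 1) {
        // Acquire load of ref_count only. The value is discarded; the load
        // exists purely for its ordering effect, which is the reload cost
        // discussed above.
        (void)ref_count.load(std::memory_order_acquire);
        return true;   // caller may now destroy the object
    }
    return false;
}
```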




