[cpp-threads] Yet another visibility question

Thu Dec 28 12:52:48 GMT 2006

Hans, I agree that having dependence-based ordering doesn't really work at
this level. Even though hardware implementations usually have those kind of
semantics (PPC certainly has both control-flow and data-flow dependence
ordering), they do not have to face the issue of having compiler
reorganizing the control flow and creating/eliminating dependences.
Preventing the compiler from performing such transformations will disable
many useful transformations.

I think that we can come up with a model that does not have
dependence-based ordering. Actually, it seems to me that simply proscribing
write speculation would prevent the generation of out-of-thin-air values.

About the refcount question from Peter on 11/22, if we do not rely on
dependence ordering, we need to have an explicit acquire operation as we
reach the refcount of 0. That can be done with an ordered fetch_and_add,
but then each bump of the refcount gets unnecessarily strong semantics,
while the acquire semantics are only needed on the last one. This is an
argument in favor of having an explicit acquire fence: the acquire behavior
on the fetch_and_add is only needed on the final bump. That would provide
the optimal implementation on PPC.

What I am proposing, then, is:

// x y initially zero

// threads 1 to N:

assert( x == 0 );

if( fetchadd_release( &y, 1 ) == N - 1 )
{
    acquire_fence(); // prevent subsequent stores to x from moving before
any previous loads
    x = 1;
}

Note that a explicit fence is something easy for the compiler to handle. It
solves the issue you discuss below, as the compiler is now able to
differentiate between the two cases:

r1 = x.load_raw();
if (r1) {
     acquire_fence();   // subsequent stores must wait for the previous
load to return
     y = 1;
}
y = 1;

r1 = x.load_raw();
y = 1;                  // this store can be executed ahead of the load
if (r1) {
     acquire_fence();
     y = 1;
}

Furthermore, in the first case the compiler is allowed to apply traditional
optimizations to determine that the first store is redundant, and leave:

r1 = x.load_raw();
if (r1) {
     acquire_fence();   // subsequent stores must wait for the previous
load to return
}
y = 1;                  // this store still cannot be moved ahead by the
compiler
                          // and can only be moved ahead by the hw if (!r1)

I understand that some people do not like explicit fences and see them as
overly complicated, but I personally believe that having explicit fences is
easier for the programmer to reason with than having implied behaviors that
are not visible in the source.

--
Raúl E. Silvera         IBM Toronto Lab   Team Lead, Toronto Portable
Optimizer (TPO)
Tel: 905-413-4188 T/L: 969-4188           Fax: 905-413-4854
D2/KC9/8200/MKM

cpp-threads-bounces at decadentplace.org.uk wrote on 12/20/2006 07:48:51 PM:

> So, could we say (in non-standardese) that if,
>
> 1) in an execution E in which x.load_raw() returns x1 in thread t1, the
> nth subsequent store to y in t1 stores y1 to object o1, and
>
> there exists an x2 such that
>
> 2) in no execution E' in which x.load_raw returns x2, but which agrees
> with E on everything that happens-before or is sequenced-before the
> load_raw, does the nth store to y store y1 to o1
>
> then the nth subsequent store to y in t1 is dependent on the x.load_raw?
>
> (This is starting to look a bit like the Java formulation ...)
>
> Actually, I think that doesn't quite work.
>
> In the example
>
> r1 = x.load_raw();
> if (r1) {
>      y = 1;
> }
> y = 1;
>
> It would declare the second store to y as dependent.  Somehow we need to
> distinguish this example from
>
> r1 = x.load_raw();
> y = 1;
> if (r1) {
>      y = 1;
> }
>
> which should allow the first store to become visible before the
> load_raw.  But this again seems fundamentally broken, since we've now
> associated different behavior with program pieces that a normal
> optimizer would consider equivalent, and which can be separated into a
> different compilation unit from the load_raw().
>
> If y is an ordinary variable, it might not matter, because both would
> result in a race, as Peter points out.  But if we use atomic raw stores,
> and we have per-variable TSO, I think they are observably different.
> (If we use fetch_add_acquire instead of assignment, they clearly should
> be different.)
>
> So I'm still not sure how we progress along these lines.
>
> > -----Original Message-----
> > From: cpp-threads-bounces at decadentplace.org.uk
> > [mailto:cpp-threads-bounces at decadentplace.org.uk] On Behalf
> > Of Peter Dimov
> > Sent: Wednesday, December 20, 2006 6:16 AM
> > To: C++ threads standardisation
> > Subject: Re: [cpp-threads] Yet another visibility question
> >
> > Boehm, Hans wrote:
> >
> > > If I have
> > >
> > > if (x.load_raw()) {
> > >     y = 1;
> > > }
> > > y = 1;
> > >
> > > Is there a dependence?
> >
> > To answer the question: yes, there is a dependence. Different
> > values of x can cause one of the two executions
> >
> > y = 1;
> >
> > or
> >
> > y = 1;
> > y = 1;
> >
> > to take place. So the first store to y is dependent on the x
> > load. This variation is more subtle:
> >
> > r1 = x.load_raw();
> >
> > if( r1 )
> > {
> >     y = 1;
> > }
> >
> > if( !r1 )
> > {
> >     y = 1;
> > }
> >
> > assuming that the two ifs can be hidden away in function
> > calls. The store to y is not dependent.
> >
> > The refcounting example doesn't have this problem; the
> > actions that are taken when the count drops to zero are
> > provably dependent since the result of the atomic is not
> > available to other portions of the code and can't cancel the
> > dependence.
> That's not completely clear to me.  If one of the zero refcount actions
> is to clear a field in the object, and that same field happens to be
> cleared in the other case, could the compiler combine them and move them
> up to before the decrement?
>
> The argument would have to rely on other intervening synchronization
> operations, I think.
>
> > This relies on atomic loads being somewhat "volatile"
> > as far as data flow analysis is concerned, though. That is,
> > it relies on:
> >
> > if( x.load_raw() )
> > {
> >     y = 1;
> > }
> >
> > if( !x.load_raw() )
> > {
> >     y = 1;
> > }
> >
> > not being equivalent to the above example, even if the
> > compiler can prove that x is not modified during the
> > execution of this snippet.
> >
> > All this seems more trouble than it's worth. :-)
> Does that mean that you agree we should forget about dependency-based
> ordering and I should get back to figuring out how to prohibit just "out
> of thin air" results?
>
> >
> >
> > --
> > cpp-threads mailing list
> > cpp-threads at decadentplace.org.uk
> > http://www.decadentplace.org.uk/cgi-bin/mailman/listinfo/cpp-threads
> >
>
> --
> cpp-threads mailing list
> cpp-threads at decadentplace.org.uk
> http://www.decadentplace.org.uk/cgi-bin/mailman/listinfo/cpp-threads