[cpp-threads] Yet another visibility question

Sun Dec 31 02:12:26 GMT 2006

On Thu, 28 Dec 2006, Raul Silvera wrote:

> I think that we can come up with a model that does not have
> dependence-based ordering. Actually, it seems to me that simply proscribing
> write speculation would prevent the generation of out-of-thin-air values.
I think that prohibiting out-of-thin-air values is more or less equivalent
to prohibiting write speculation.  The problem is that both are difficult
to define with any precision.

For the rest of this, it was already pointed out that "acquire fences"
don't really make sense.  If you substitute a fence that orders at
least loads followed by stores and loads followed by loads, then
I agree that that can generate better code on some platforms in this case.
But in this case, I'd also be inclined to agree with Herb, that it would
be much better to handle this at a higher level of abstraction if we
can figure out how to do that.  The odds of someone getting reference
counting right and this close to optimal seem very small.  And supporting
sufficiently fine-grained fencing is clearly messy.

Hans

>
> About the refcount question from Peter on 11/22, if we do not rely on
> dependence ordering, we need to have an explicit acquire operation as we
> reach the refcount of 0. That can be done with an ordered fetch_and_add,
> but then each bump of the refcount gets unnecessarily strong semantics,
> while the acquire semantics are only needed on the last one. This is an
> argument in favor of having an explicit acquire fence: the acquire behavior
> on the fetch_and_add is only needed on the final bump. That would provide
> the optimal implementation on PPC.
>
> What I am proposing, then, is:
>
> // x y initially zero
>
> // threads 1 to N:
>
> assert( x == 0 );
>
> if( fetchadd_release( &y, 1 ) == N - 1 )
> {
>     acquire_fence(); // prevent subsequent stores to x from moving before
> any previous loads
>     x = 1;
> }
>
> Note that a explicit fence is something easy for the compiler to handle. It
> solves the issue you discuss below, as the compiler is now able to
> differentiate between the two cases:
>
> r1 = x.load_raw();
> if (r1) {
>      acquire_fence();   // subsequent stores must wait for the previous
> load to return
>      y = 1;
> }
> y = 1;
>
> r1 = x.load_raw();
> y = 1;                  // this store can be executed ahead of the load
> if (r1) {
>      acquire_fence();
>      y = 1;
> }
>
> Furthermore, in the first case the compiler is allowed to apply traditional
> optimizations to determine that the first store is redundant, and leave:
>
> r1 = x.load_raw();
> if (r1) {
>      acquire_fence();   // subsequent stores must wait for the previous
> load to return
> }
> y = 1;                  // this store still cannot be moved ahead by the
> compiler
>                           // and can only be moved ahead by the hw if (!r1)
>
> I understand that some people do not like explicit fences and see them as
> overly complicated, but I personally believe that having explicit fences is
> easier for the programmer to reason with than having implied behaviors that
> are not visible in the source.
>
> --
> Raúl E. Silvera         IBM Toronto Lab   Team Lead, Toronto Portable
> Optimizer (TPO)
> Tel: 905-413-4188 T/L: 969-4188           Fax: 905-413-4854
> D2/KC9/8200/MKM
>
>
> cpp-threads-bounces at decadentplace.org.uk wrote on 12/20/2006 07:48:51 PM:
>
> > So, could we say (in non-standardese) that if,
> >
> > 1) in an execution E in which x.load_raw() returns x1 in thread t1, the
> > nth subsequent store to y in t1 stores y1 to object o1, and
> >
> > there exists an x2 such that
> >
> > 2) in no execution E' in which x.load_raw returns x2, but which agrees
> > with E on everything that happens-before or is sequenced-before the
> > load_raw, does the nth store to y store y1 to o1
> >
> > then the nth subsequent store to y in t1 is dependent on the x.load_raw?
> >
> > (This is starting to look a bit like the Java formulation ...)
> >
> > Actually, I think that doesn't quite work.
> >
> > In the example
> >
> > r1 = x.load_raw();
> > if (r1) {
> >      y = 1;
> > }
> > y = 1;
> >
> > It would declare the second store to y as dependent.  Somehow we need to
> > distinguish this example from
> >
> > r1 = x.load_raw();
> > y = 1;
> > if (r1) {
> >      y = 1;
> > }
> >
> > which should allow the first store to become visible before the
> > load_raw.  But this again seems fundamentally broken, since we've now
> > associated different behavior with program pieces that a normal
> > optimizer would consider equivalent, and which can be separated into a
> > different compilation unit from the load_raw().
> >
> > If y is an ordinary variable, it might not matter, because both would
> > result in a race, as Peter points out.  But if we use atomic raw stores,
> > and we have per-variable TSO, I think they are observably different.
> > (If we use fetch_add_acquire instead of assignment, they clearly should
> > be different.)
> >
> > So I'm still not sure how we progress along these lines.
> >
> > > -----Original Message-----
> > > From: cpp-threads-bounces at decadentplace.org.uk
> > > [mailto:cpp-threads-bounces at decadentplace.org.uk] On Behalf
> > > Of Peter Dimov
> > > Sent: Wednesday, December 20, 2006 6:16 AM
> > > To: C++ threads standardisation
> > > Subject: Re: [cpp-threads] Yet another visibility question
> > >
> > > Boehm, Hans wrote:
> > >
> > > > If I have
> > > >
> > > > if (x.load_raw()) {
> > > >     y = 1;
> > > > }
> > > > y = 1;
> > > >
> > > > Is there a dependence?
> > >
> > > To answer the question: yes, there is a dependence. Different
> > > values of x can cause one of the two executions
> > >
> > > y = 1;
> > >
> > > or
> > >
> > > y = 1;
> > > y = 1;
> > >
> > > to take place. So the first store to y is dependent on the x
> > > load. This variation is more subtle:
> > >
> > > r1 = x.load_raw();
> > >
> > > if( r1 )
> > > {
> > >     y = 1;
> > > }
> > >
> > > if( !r1 )
> > > {
> > >     y = 1;
> > > }
> > >
> > > assuming that the two ifs can be hidden away in function
> > > calls. The store to y is not dependent.
> > >
> > > The refcounting example doesn't have this problem; the
> > > actions that are taken when the count drops to zero are
> > > provably dependent since the result of the atomic is not
> > > available to other portions of the code and can't cancel the
> > > dependence.
> > That's not completely clear to me.  If one of the zero refcount actions
> > is to clear a field in the object, and that same field happens to be
> > cleared in the other case, could the compiler combine them and move them
> > up to before the decrement?
> >
> > The argument would have to rely on other intervening synchronization
> > operations, I think.
> >
> > > This relies on atomic loads being somewhat "volatile"
> > > as far as data flow analysis is concerned, though. That is,
> > > it relies on:
> > >
> > > if( x.load_raw() )
> > > {
> > >     y = 1;
> > > }
> > >
> > > if( !x.load_raw() )
> > > {
> > >     y = 1;
> > > }
> > >
> > > not being equivalent to the above example, even if the
> > > compiler can prove that x is not modified during the
> > > execution of this snippet.
> > >
> > > All this seems more trouble than it's worth. :-)
> > Does that mean that you agree we should forget about dependency-based
> > ordering and I should get back to figuring out how to prohibit just "out
> > of thin air" results?
> >
> > >
> > >
> > > --
> > > cpp-threads mailing list
> > > cpp-threads at decadentplace.org.uk
> > > http://www.decadentplace.org.uk/cgi-bin/mailman/listinfo/cpp-threads
> > >
> >
> > --
> > cpp-threads mailing list
> > cpp-threads at decadentplace.org.uk
> > http://www.decadentplace.org.uk/cgi-bin/mailman/listinfo/cpp-threads
>
>
> --
> cpp-threads mailing list
> cpp-threads at decadentplace.org.uk
> http://www.decadentplace.org.uk/cgi-bin/mailman/listinfo/cpp-threads
>