[cpp-threads] Yet another visibility question

Raul Silvera rauls at ca.ibm.com
Wed Jan 3 01:33:26 GMT 2007


> For the rest of this, it was already pointed out that "acquire fences"
> don't really make sense.  If you substitute a fence that orders at
> least loads followed by stores and loads followed by loads, then
> I agree that that can generate better code on some platforms in this case.
> But in this case, I'd also be inclined to agree with Herb, that it would
> be much better to handle this at a higher level of abstraction if we
> can figure out how to do that.  The odds of someone getting reference
> counting right and this close to optimal seem very small.  And supporting
> sufficiently fine-grained fencing is clearly messy.
>

Right, Hans... The type of fence I had in mind is basically the one you are
describing.
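
To make it concrete, here is a minimal sketch of the guarantee I have in
mind (the fence name is illustrative only):

    r1 = x.load_raw();
    load_fence();           // orders the load of x before any later
                            // loads and stores in this thread
    r2 = z.load_raw();      // cannot be satisfied before the load of x
    y = 1;                  // cannot become visible before the load of x

On PPC such a fence maps naturally onto lwsync, or onto the usual
branch-plus-isync idiom when the fenced code is already control-dependent
on the load.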

The fundamental point I was trying to make, though, is not about explicit
fencing, but about the possibility of defining a model without
dependence-based ordering, given the problems it introduces with respect to
compiler optimization. I would like to know the group's position on such an
approach.
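
To spell out the contrast, consider the usual pointer-publication idiom
(a sketch only, written with the load_raw / acquire_fence spelling used
further down):

    // with dependence-based ordering in the model:
    p  = x.load_raw();
    r1 = p->field;          // ordered after the load of x purely by the
                            // data dependence

    // in a model where all ordering is explicit:
    p  = x.load_raw();
    acquire_fence();        // the dependence by itself promises nothing;
                            // the fence supplies the ordering
    r1 = p->field;

The first form is exactly what dependence-based ordering is meant to
support, and it is also what makes life hard for the compiler: if it can
prove that p has only one possible value, ordinary optimization may well
break the dependence.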

If you have such a model, where all ordering must be explicit, then it
makes sense to have fine-grained fencing. I'm not sure at what level you
consider fencing messy -- are you concerned about the mathematical
formulation of the model? A lot of current practice in parallel programming
relies on explicit fences, and they have been used successfully in many
parallel applications; the Linux kernel is a prime example.
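
For instance, the classic message-passing pattern is written in the kernel
with explicit barriers (a sketch; smp_wmb()/smp_rmb() are the kernel's
write and read memory-barrier macros):

    /* producer */
    data = 42;
    smp_wmb();      /* order the store to data before the store to flag */
    flag = 1;

    /* consumer */
    while (flag == 0)
        ;           /* spin until the producer's flag is visible */
    smp_rmb();      /* order the load of flag before the load of data */
    r1 = data;

The required ordering is stated directly at the two points where it
matters, and no other accesses in the program pay for it.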


> Hans
>
> >
> > About the refcount question from Peter on 11/22, if we do not rely on
> > dependence ordering, we need to have an explicit acquire operation as we
> > reach the refcount of 0. That can be done with an ordered fetch_and_add,
> > but then each bump of the refcount gets unnecessarily strong semantics,
> > while the acquire semantics are only needed on the last one. This is an
> > argument in favor of having an explicit acquire fence: the acquire behavior
> > on the fetch_and_add is only needed on the final bump. That would provide
> > the optimal implementation on PPC.
> >
> > What I am proposing, then, is:
> >
> > // x y initially zero
> >
> > // threads 1 to N:
> >
> > assert( x == 0 );
> >
> > if( fetchadd_release( &y, 1 ) == N - 1 )
> > {
> >     acquire_fence(); // prevent subsequent stores to x from moving
> >                      // before any previous loads
> >     x = 1;
> > }
> >
> > Note that an explicit fence is something easy for the compiler to handle. It
> > solves the issue you discuss below, as the compiler is now able to
> > differentiate between the two cases:
> >
> > r1 = x.load_raw();
> > if (r1) {
> >      acquire_fence();   // subsequent stores must wait for
> >                         // the previous load to return
> >      y = 1;
> > }
> > y = 1;
> >
> > r1 = x.load_raw();
> > y = 1;                  // this store can be executed ahead of the load
> > if (r1) {
> >      acquire_fence();
> >      y = 1;
> > }
> >
> > Furthermore, in the first case the compiler is allowed to apply traditional
> > optimizations to determine that the first store is redundant, and leave:
> >
> > r1 = x.load_raw();
> > if (r1) {
> >      acquire_fence();   // subsequent stores must wait for
> >                         // the previous load to return
> > }
> > y = 1;                  // this store still cannot be moved ahead by the
> >                         // compiler, and can only be moved ahead by the
> >                         // hw if (!r1)
> >
> > I understand that some people do not like explicit fences and see them as
> > overly complicated, but I personally believe that explicit fences are
> > easier for the programmer to reason about than implied behaviors that
> > are not visible in the source.
> >
> > --
> > Raúl E. Silvera         IBM Toronto Lab
> > Team Lead, Toronto Portable Optimizer (TPO)
> > Tel: 905-413-4188 T/L: 969-4188           Fax: 905-413-4854
> > D2/KC9/8200/MKM
> >
> >
> > cpp-threads-bounces at decadentplace.org.uk wrote on 12/20/2006 07:48:51 PM:
> >
> > > So, could we say (in non-standardese) that if,
> > >
> > > 1) in an execution E in which x.load_raw() returns x1 in thread t1, the
> > > nth subsequent store to y in t1 stores y1 to object o1, and
> > >
> > > there exists an x2 such that
> > >
> > > 2) in no execution E' in which x.load_raw returns x2, but which agrees
> > > with E on everything that happens-before or is sequenced-before the
> > > load_raw, does the nth store to y store y1 to o1
> > >
> > > then the nth subsequent store to y in t1 is dependent on the x.load_raw?
> > >
> > > (This is starting to look a bit like the Java formulation ...)
> > >
> > > Actually, I think that doesn't quite work.
> > >
> > > In the example
> > >
> > > r1 = x.load_raw();
> > > if (r1) {
> > >      y = 1;
> > > }
> > > y = 1;
> > >
> > > It would declare the second store to y as dependent.  Somehow we need to
> > > distinguish this example from
> > >
> > > r1 = x.load_raw();
> > > y = 1;
> > > if (r1) {
> > >      y = 1;
> > > }
> > >
> > > which should allow the first store to become visible before the
> > > load_raw.  But this again seems fundamentally broken, since we've now
> > > associated different behavior with program pieces that a normal
> > > optimizer would consider equivalent, and which can be separated into a
> > > different compilation unit from the load_raw().
> > >
> > > If y is an ordinary variable, it might not matter, because both would
> > > result in a race, as Peter points out.  But if we use atomic raw stores,
> > > and we have per-variable TSO, I think they are observably different.
> > > (If we use fetch_add_acquire instead of assignment, they clearly should
> > > be different.)
> > >
> > > So I'm still not sure how we progress along these lines.
> > >
> > > > -----Original Message-----
> > > > From: cpp-threads-bounces at decadentplace.org.uk
> > > > [mailto:cpp-threads-bounces at decadentplace.org.uk] On Behalf
> > > > Of Peter Dimov
> > > > Sent: Wednesday, December 20, 2006 6:16 AM
> > > > To: C++ threads standardisation
> > > > Subject: Re: [cpp-threads] Yet another visibility question
> > > >
> > > > Boehm, Hans wrote:
> > > >
> > > > > If I have
> > > > >
> > > > > if (x.load_raw()) {
> > > > >     y = 1;
> > > > > }
> > > > > y = 1;
> > > > >
> > > > > Is there a dependence?
> > > >
> > > > To answer the question: yes, there is a dependence. Different
> > > > values of x can cause one of the two executions
> > > >
> > > > y = 1;
> > > >
> > > > or
> > > >
> > > > y = 1;
> > > > y = 1;
> > > >
> > > > to take place. So the first store to y is dependent on the x
> > > > load. This variation is more subtle:
> > > >
> > > > r1 = x.load_raw();
> > > >
> > > > if( r1 )
> > > > {
> > > >     y = 1;
> > > > }
> > > >
> > > > if( !r1 )
> > > > {
> > > >     y = 1;
> > > > }
> > > >
> > > > assuming that the two ifs can be hidden away in function
> > > > calls. The store to y is not dependent.
> > > >
> > > > The refcounting example doesn't have this problem; the
> > > > actions that are taken when the count drops to zero are
> > > > provably dependent since the result of the atomic is not
> > > > available to other portions of the code and can't cancel the
> > > > dependence.
> > > That's not completely clear to me.  If one of the zero refcount actions
> > > is to clear a field in the object, and that same field happens to be
> > > cleared in the other case, could the compiler combine them and move them
> > > up to before the decrement?
> > >
> > > The argument would have to rely on other intervening synchronization
> > > operations, I think.
> > >
> > > > This relies on atomic loads being somewhat "volatile"
> > > > as far as data flow analysis is concerned, though. That is,
> > > > it relies on:
> > > >
> > > > if( x.load_raw() )
> > > > {
> > > >     y = 1;
> > > > }
> > > >
> > > > if( !x.load_raw() )
> > > > {
> > > >     y = 1;
> > > > }
> > > >
> > > > not being equivalent to the above example, even if the
> > > > compiler can prove that x is not modified during the
> > > > execution of this snippet.
> > > >
> > > > All this seems more trouble than it's worth. :-)
> > > Does that mean that you agree we should forget about dependency-based
> > > ordering and I should get back to figuring out how to prohibit just "out
> > > of thin air" results?
> > >



