[cpp-threads] Belated comments on dependency-based ordering proposal

Thu Sep 20 00:44:31 BST 2007

On Wed, Sep 19, 2007 at 11:29:57PM -0000, Boehm, Hans wrote:
>  
> 
> > -----Original Message-----
> > From: lawrence.crowl at gmail.com 
> > [mailto:lawrence.crowl at gmail.com] On Behalf Of Lawrence Crowl
> > Sent: Wednesday, September 19, 2007 3:38 PM
> > To: paulmck at linux.vnet.ibm.com; C++ threads standardisation
> > Cc: Boehm, Hans
> > Subject: Re: [cpp-threads] Belated comments on 
> > dependency-based ordering proposal
> > 
> > On 9/19/07, Paul E. McKenney <paulmck at linux.vnet.ibm.com> wrote:
> > > >       B) The compiler generates an acquire load to preserve the
> > > > ordering anyway.   I think Lawrence and I are arguing for 
> > this version.
> > >
> > > Ah!  I was still thinking of this as a trivial implementation as 
> > > opposed to a component of an alternative implementation 
> > strategy.  The 
> > > idea is as follows, correct?
> > >
> > > o       If no N2361 annotations, the compiler must emit 
> > appropriate
> > >         memory fences when control leaves the compilation 
> > unit.
> So long as a potential data dependency also leaves the compilation unit.
> If I have
> 
> {
>    int r1 = x.load(memory_order_dependency);
>    int r2 = foo();
>    int r3 = a[r1];
> }
> 
> I should be OK without a fence, even if foo is not annotated and
> compiled separately.

Good point, agreed.
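
To make that case concrete, here is a minimal sketch of it as a
self-contained function.  It assumes that the ordering spelled
memory_order_dependency in this thread can be written as
std::memory_order_consume, and the names x, a, foo(), and reader() are
purely illustrative.  The local foo() only stands in for an unannotated,
separately compiled function.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> x{0};          // index published by some other thread
int a[4] = {10, 20, 30, 40};    // data reached through the dependency

// Stand-in for an unannotated function compiled elsewhere.  It never
// sees r1, so no data dependency leaves the caller through this call.
int foo() { return 7; }

int reader() {
    // Assumption: memory_order_consume plays the role of the
    // memory_order_dependency ordering discussed above.
    int r1 = x.load(std::memory_order_consume);
    int r2 = foo();     // opaque call, but r1 is not passed to it
    int r3 = a[r1];     // dependency chain r1 -> a[r1] stays local
    return r2 + r3;
}
```

The chain from the load to the array access never crosses the call to
foo(), so nothing here should require a fence on account of that call.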

> > The code
> > >         in the other compilation unit will therefore work 
> > correctly even
> > >         in presence of local dependency-breaking optimizations.
> > 
> > I'm not comfortable with "leaves the compilation unit", for 
> > at least two reasons.  First, in many cases, the programmer 
> > has no idea which functions are in the current compilation 
> > unit.  Can we write it in terms of the function calls 
> > themselves?  Second, your statement implies a unit of 
> > analysis equal to the compilation unit, and I'd rather not 
> > restrict compilers from doing something different.  E.g. a 
> > compiler could, after analysis, decide to use 
> > dependence-breaking optimizations in half the compilation unit.
> > 
> > On the other hand, I do not want to force breaking the 
> > dependence at all function calls, because virtually, they're 
> > everywhere.
> > Perhaps the way to say this is with an "as if" rule.
>
> I think "compilation unit" should really be the region into which the
> compiler happens to have visibility.  We're not guaranteeing the
> introduction of fences anywhere.  We're just guaranteeing that
> dependency-based ordering will be preserved until the return from the
> function.  If the compiler can't see enough of the code to ensure that,
> it will need to add fences.  If you want fewer fences, you may need to
> tell it more with annotations.

OK.  Are you still in favor of having function returns break dependency
chains unless annotated, even when the compiler has visibility?  (With
the possible exception for implicit compiler-inserted functions.)

> > > o       If there are N2361 annotations, then the compiler can allow
> > >         the dependency chain to cross compilation-unit boundaries
> > >         via the annotated function arguments and return values.
>
> I hadn't thought about that, but that seems fine.

OK.

> > > o       There would then need to be some way to kill a 
> > dependency chain
> > >         in order to avoid gratuitous memory fences.  Your
> > >         ignore_dependency() template below would be one approach.
> > >         Another approach would be another set of annotations for
> > >         function arguments and return values.  The 
> > ignore_dependency()
> > >         approach seems to cover all the possibilities, so is where I
> > >         believe that we should start.
> > 
> > I think I agree here.  It can also be formulated as a 
> > type-generic macro for the C language.
> > 
> > > And this would need explicit compiler support -- the 
> > > kill_dependency_chain() called out in N2361 relied on lack 
> > of annotations to make this work.
> > >
> > > So, is it better to have a special template class that has special 
> > > semantics, or should we instead define a [[dependence_ignore]] 
> > > attribute from which ignore_dependency() can be constructed:
> > >
> > > template<class T> [[dependence_ignore]] T ignore_dependency(T x) { 
> > > return (x); }
> > 
> > That proposal makes sense to me.
>
> I suspect that the chances of getting an additional function template
> into the library at this stage are higher than those of getting new
> annotations in.  But that's just a guess.

Propose both and have two chances?  (Sorry, couldn't resist...)
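
For reference, a rough sketch of what the library-template flavor might
look like from the user's side.  The [[dependence_ignore]] attribute is
hypothetical, so it appears only in a comment, and without compiler
support the function below is just an identity function.  The helper
index_without_dependency() is likewise only an illustration.

```cpp
#include <cassert>

// Hypothetical: with compiler support, this template would carry the
// proposed [[dependence_ignore]] attribute, telling the compiler that
// the returned value is no longer part of any dependency chain.
// Without that support it is a plain identity function.
template<class T>
T ignore_dependency(T x) { return x; }

// Illustrative use: the index is deliberately dropped from the chain
// before the table lookup, so the compiler would be free to apply
// dependency-breaking optimizations to the lookup.
int index_without_dependency(const int* table, int idx) {
    return table[ignore_dependency(idx)];
}
```

Either way the caller writes the same thing, which is part of why the
template looks like a reasonable place to start.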

> > > Then we have the following cases:
> > >
> > > o       An unannotated argument that is a member of a 
> > dependency chain
> > >         causes the compiler to emit a memory fence.
> > >
> > > o       An argument annotated with N2361 [[dependence_propagate]]
> > >         causes the compiler to extend the dependency chain across
> > >         a compilation-unit boundary.
> > >
> > > o       An argument annotated with a new N2361 [[dependence_ignore]]
> > >         would neither emit a memory fence nor extend the dependency
> > >         chain across a compilation-unit boundary.
> > >
> > > o       An unannotated return value that is a member of a dependency
> > >         chain would neither emit a memory fence nor extend 
> > the dependency
> > >         chain across a compilation-unit boundary.  Note 
> > that the default
> > >         is different than for arguments -- in the default 
> > case discussed
> > >         above, the naively-written library function is 
> > protected by the
> > >         caller, so the function itself need do nothing upon return.
>
> This isn't quite what we want, I think.  The return from the function
> that included the dependency-ordered load terminates all dependency
> chains originating from that load.  The returns from called functions
> propagate dependencies.  They have to in order for the STL vector
> example to work.  We should probably think about annotations to extend
> these chains by a level, so you can encapsulate the dependency-ordered
> load itself in a function.

But in the case of an unannotated separately compiled called function, the 
memory barrier emitted prior to the call would make it unnecessary for the
return from that called function to do anything, right?
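
A sketch of what I have in mind, written out by hand.  The explicit
fence below is my guess at what the compiler would conceptually emit
before passing a dependency-carrying value to an unannotated function.
It again assumes std::memory_order_consume for the dependency-ordered
load, and Node, head, use_node(), and reader() are illustrative names.

```cpp
#include <atomic>
#include <cassert>

struct Node { int value; };

std::atomic<Node*> head{nullptr};   // published with a release store

// Stand-in for an unannotated, separately compiled function.  It may
// be compiled with dependency-breaking optimizations.
int use_node(Node* p) { return p->value + 1; }

int reader() {
    // Assumption: consume stands in for the dependency-ordered load.
    Node* p = head.load(std::memory_order_consume);
    if (p == nullptr)
        return -1;
    // The dependency is about to leave through an unannotated
    // argument, so ordering is preserved with a fence instead.
    std::atomic_thread_fence(std::memory_order_acquire);
    int r = use_node(p);
    // The fence already ordered everything use_node() did, so
    // nothing further is needed when it returns.
    return r;
}
```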

							Thanx, Paul

> > >         However, the other naive case is where the naive 
> > function calls
> > >         another function that explicitly pushes the dependency chain
> > >         out through its return value.  This could happen in 
> > cases where
> > >         there are wrapper functions that are invoked due to implicit
> > >         conversions (these still can happen in C++, right?).
> > 
> > Yes.
> > 
> > >         In this
> > >         case, the called function would have annotated its 
> > return value,
> > >         and it seems to me that the compiler would have to 
> > distinguish
> > >         between these two cases, terminating the dependency chain if
> > >         the head of the chain is an atomic load that is 
> > lexically within
> > >         the function, and propagating it (via explicit 
> > memory fence if
> > >         need be) if the local head of the chain is instead 
> > an annotated
> > >         return value from a called function.
> > 
> > Things got fuzzy on me here.  In f(h(a)) where f and h have 
> > annotated arguments and h has an annotated return, you are 
> > worried that there might be an implicit conversion, and the 
> > actual code is f(g(h(a))) where g has neither annotation.  
> > Why doesn't the code get protected by a fence before the call to g?
> > 
> > (I admit that we have a subtle performance implication here, 
> > but that is more the result of implicit conversions than of 
> > the dependences.)
> > 
> > >
> > > o       A return value annotated with N2361 [[dependence_propagate]]
> > >         causes the compiler to extend the dependency chain across
> > >         a compilation-unit boundary.
> > >
> > > o       A return value annotated with a new N2361 
> > [[dependence_ignore]]
> > >         would neither emit a memory fence nor extend the dependency
> > >         chain across a compilation-unit boundary.  This is the same
> > >         as the default.
> > >
> > > o       If the head of the dependency chain is an atomic 
> > load that is
> > >         lexically contained within the function in question, and if
> > >         the programmer wants to maintain ordering, but not 
> > to propagate
> > >         the dependency chain, then the programmer should insert an
> > >         explicit memory fence before the return.  (Or do we want
> > >         another annotation that causes the compiler to implicitly
> > >         place the memory fences, perhaps omitting it on code paths
> > >         that have other memory fences already in place?)
> > 
> > I'd like a situation where the compiler has a bit of freedom 
> > to decide whether or not to insert the fence, and the problem 
> > with manual fences is that the compiler usually won't know 
> > why they are there.
> > 
> > --
> > Lawrence Crowl
> >