[cpp-threads] Belated comments on dependency-based ordering proposal

Thu Sep 20 00:24:37 BST 2007

On Wed, Sep 19, 2007 at 03:38:10PM -0700, Lawrence Crowl wrote:
> On 9/19/07, Paul E. McKenney <paulmck at linux.vnet.ibm.com> wrote:
> > >       B) The compiler generates an acquire load to preserve the
> > > ordering anyway.   I think Lawrence and I are arguing for this version.
> >
> > Ah!  I was still thinking of this as a trivial implementation as opposed
> > to a component of an alternative implementation strategy.  The idea is
> > as follows, correct?
> >
> > o       If no N2361 annotations, the compiler must emit appropriate
> >         memory fences when control leaves the compilation unit.  The code
> >         in the other compilation unit will therefore work correctly even
> >         in presence of local dependency-breaking optimizations.
> 
> I'm not comfortable with "leaves the compilation unit", for at
> least two reasons.  First, in many cases, the programmer has no
> idea which functions are in the current compilation unit.  Can we
> write it in terms of the function calls themselves?  Second, your
> statement implies a unit of analysis equal to the compilation
> unit, and I'd rather not restrict compilers from doing something
> different.  E.g. a compiler could, after analysis, decide to use
> dependence-breaking optimizations in half the compilation unit.
> 
> On the other hand, I do not want to force breaking the dependence
> at all function calls, because virtually, they're everywhere.
> Perhaps the way to say this is with an "as if" rule.

How about if we made this be "leaves the function" rather than "leaves
the compilation unit"?  I believe we would still need to classify library
functions/wrappers/conversions, but at least it would be more definite
from the programmer's viewpoint.

> > o       If there are N2361 annotations, then the compiler can allow
> >         the dependency chain to cross compilation-unit boundaries
> >         via the annotated function arguments and return values.
> >
> > o       There would then need to be some way to kill a dependency chain
> >         in order to avoid gratuitous memory fences.  Your
> >         ignore_dependency() template below would be one approach.
> >         Another approach would be another set of annotations for
> >         function arguments and return values.  The ignore_dependency()
> >         approach seems to cover all the possibilities, so is where I
> >         believe that we should start.
> 
> I think I agree here.  It can also be formulated as a type-generic macro
> for the C language.

Sounds good!

> > And this would need explicit compiler support -- the kill_dependency_chain()
> > called out in N2361 relied on lack of annotations to make this work.
> >
> > So, is it better to have a special template class that has special
> > semantics, or should we instead define a [[dependence_ignore]] attribute
> > from which ignore_dependency() can be constructed:
> >
> > template<class T> [[dependence_ignore]]
> > T ignore_dependency(T x) { return (x); }
> 
> That proposal makes sense to me.

Sounds good as well!
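
A minimal sketch of how ignore_dependency() might be used, under stated
assumptions: [[dependence_ignore]] is not implemented by any compiler, so
it is stubbed out as an empty macro; std::memory_order_consume stands in
for the proposal's memory_order_dependency; and foo_t, table, head,
lookup(), publish(), reader() and run_example() are hypothetical names
introduced only for illustration.

```cpp
#include <atomic>
#include <cassert>

// Placeholder for the proposed [[dependence_ignore]] attribute. No compiler
// implements it, so this hypothetical macro expands to nothing.
#define DEPENDENCE_IGNORE

struct foo_t { int index; };

// The template from the discussion above, with the attribute stubbed out.
template<class T> DEPENDENCE_IGNORE
T ignore_dependency(T x) { return x; }

static const int table[4] = {10, 20, 30, 40};  // immutable, needs no ordering
static foo_t node = {0};
static std::atomic<foo_t*> head{nullptr};

// Unannotated callee: under the proposal, passing it a member of a
// dependency chain would cost a memory fence.
int lookup(int i) {
    assert(i >= 0 && i < 4);
    return table[i];
}

// Publisher side; in real use this would run on another thread.
void publish(int index) {
    node.index = index;
    head.store(&node, std::memory_order_release);
}

int reader() {
    // memory_order_consume stands in for memory_order_dependency.
    foo_t* p = head.load(std::memory_order_consume);
    if (p == nullptr) return -1;
    // p->index is part of the chain headed by the load. Wrapping it ends the
    // chain, so no fence would be needed before the call to lookup().
    return lookup(ignore_dependency(p->index));
}

int run_example() {
    publish(2);
    return reader();
}
```

The table is never modified, so the reader needs no ordering for the
lookup itself; that is the kind of case where killing the chain avoids a
gratuitous fence.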

> > Then we have the following cases:
> >
> > o       An unannotated argument that is a member of a dependency chain
> >         causes the compiler to emit a memory fence.
> >
> > o       An argument annotated with N2361 [[dependence_propagate]]
> >         causes the compiler to extend the dependency chain across
> >         a compilation-unit boundary.
> >
> > o       An argument annotated with a new N2361 [[dependence_ignore]]
> >         would neither emit a memory fence nor extend the dependency
> >         chain across a compilation-unit boundary.
> >
> > o       An unannotated return value that is a member of a dependency
> >         chain would neither emit a memory fence nor extend the dependency
> >         chain across a compilation-unit boundary.  Note that the default
> >         is different than for arguments -- in the default case discussed
> >         above, the naively-written library function is protected by the
> >         caller, so the function itself need do nothing upon return.
> >
> >         However, the other naive case is where the naive function calls
> >         another function that explicitly pushes the dependency chain
> >         out through its return value.  This could happen in cases where
> >         there are wrapper functions that are invoked due to implicit
> >         conversions (these still can happen in C++, right?).
> 
> Yes.

Good to know, thank you!

> >         In this
> >         case, the called function would have annotated its return value,
> >         and it seems to me that the compiler would have to distinguish
> >         between these two cases, terminating the dependency chain if
> >         the head of the chain is an atomic load that is lexically within
> >         the function, and propagating it (via explicit memory fence if
> >         need be) if the local head of the chain is instead an annotated
> >         return value from a called function.
> 
> Things got fuzzy on me here.  In f(h(a)) where f and h have annotated
> arguments and h has an annotated return, you are worried that there
> might be an implicit conversion, and the actual code is f(g(h(a))) where
> g has neither annotation.  Why doesn't the code get protected by a
> fence before the call to g?
> 
> (I admit that we have a subtle performance implication here, but that
> is more the result of implicit conversions than of the dependences.)

We are thinking of different cases.  Here is what I was considering:

	[[dependency_propagate]] foo_t *h(void)
	{
		return x.load(memory_order_dependency);
	}

	int f(void)
	{
		implicitly_converted_from_foo_t *p;

		p = h();
		return(p->data);
	}

The compiler would convert the assignment in f() to something like:

	p = implicit_conversion_from_foo_t(h());

If the conversion was a library function, then I am advocating that the
compiler do the following transformation instead, given that it can see
the return-value annotation:

	foo_t *temp1 = h();
	atomic_fence(memory_order_acquire);
	p = implicit_conversion_from_foo_t(temp1);

Of course, if the conversion function were annotated, there would
be no need for the fence, since the conversion function would then
be known to preserve dependency chains.

Seem reasonable?
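
The transformation above can be sketched as compilable code, under stated
assumptions: the [[dependency_propagate]] annotation on h() appears only
as a comment because no compiler implements it;
std::memory_order_consume and std::atomic_thread_fence stand in for
memory_order_dependency and atomic_fence; and base_t, storage, publish()
and run_example() are hypothetical names, with a derived-to-base pointer
conversion playing the role of the unannotated library conversion.

```cpp
#include <atomic>
#include <cassert>

struct base_t { int data; };   // hypothetical target type of the conversion
struct foo_t : base_t {};

static foo_t storage;
static std::atomic<foo_t*> x{nullptr};

// Would carry [[dependency_propagate]] on its return value in the proposal.
foo_t* h() {
    // memory_order_consume stands in for memory_order_dependency.
    return x.load(std::memory_order_consume);
}

// Unannotated conversion, standing in for a library function that may
// have been compiled with dependency-breaking optimizations.
base_t* implicit_conversion_from_foo_t(foo_t* p) { return p; }

// Publisher side; in real use this would run on another thread.
void publish(int value) {
    storage.data = value;
    x.store(&storage, std::memory_order_release);
}

int f() {
    foo_t* temp1 = h();
    // The chain arrives through an annotated return value and is about to
    // enter an unannotated function, so ordering is upgraded to acquire.
    std::atomic_thread_fence(std::memory_order_acquire);
    base_t* p = implicit_conversion_from_foo_t(temp1);
    assert(p != nullptr);      // this sketch assumes publish() already ran
    return p->data;
}

int run_example() {
    publish(42);
    return f();
}
```

After the acquire fence, the read of p->data is ordered after the load
in h() whether or not the conversion function preserves the dependency.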

> > o       A return value annotated with N2361 [[dependency_propagate]]
> >         causes the compiler to extend the dependency chain across
> >         a compilation-unit boundary.
> >
> > o       A return value annotated with a new N2361 [[dependence_ignore]]
> >         would neither emit a memory fence nor extend the dependency
> >         chain across a compilation-unit boundary.  This is the same
> >         as the default.
> >
> > o       If the head of the dependency chain is an atomic load that is
> >         lexically contained within the function in question, and if
> >         the programmer wants to maintain ordering, but not to propagate
> >         the dependency chain, then the programmer should insert an
> >         explicit memory fence before the return.  (Or do we want
> >         another annotation that causes the compiler to implicitly
> >         place the memory fence, perhaps omitting it on code paths
> >         that have other memory fences already in place?)
> 
> I'd like a situation where the compiler has a bit of freedom to decide
> whether or not to insert the fence, and the problem with manual fences
> is that the compiler usually won't know why they are there.

Makes sense -- so something like [[dependency_fence]]?
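
A rough sketch of what such an annotation might stand for, written with
the fence placed by hand. Everything here is an assumption for
illustration: [[dependency_fence]] is only a suggested name and is
stubbed out as an empty macro, std::memory_order_consume stands in for
memory_order_dependency, and foo_t, current, get_current(), publish(),
caller() and run_example() are hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Placeholder for the suggested [[dependency_fence]] annotation. It is not
// implemented anywhere, so this hypothetical macro expands to nothing.
#define DEPENDENCY_FENCE

struct foo_t { int data; };

static foo_t storage;
static std::atomic<foo_t*> current{nullptr};

// The head of the chain is an atomic load lexically inside this function.
// Ordering is to be maintained, but the chain is not propagated to callers.
DEPENDENCY_FENCE foo_t* get_current() {
    // memory_order_consume stands in for memory_order_dependency.
    foo_t* p = current.load(std::memory_order_consume);
    // Written by hand here. With an annotation, the compiler could place
    // this fence itself, or omit it where one is already in effect.
    std::atomic_thread_fence(std::memory_order_acquire);
    return p;
}

// Publisher side; in real use this would run on another thread.
void publish(int value) {
    storage.data = value;
    current.store(&storage, std::memory_order_release);
}

// Unannotated caller: relies on the fence inside get_current().
int caller() {
    foo_t* p = get_current();
    assert(p != nullptr);      // this sketch assumes publish() already ran
    return p->data;
}

int run_example() {
    publish(7);
    return caller();
}
```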

And I presume that I should get new document numbers from Clark and
revise to suit?

							Thanx, Paul