[cpp-threads] Belated comments on dependency-based orderingproposal

Tue Sep 18 22:46:10 BST 2007

> From: Paul E. McKenney
> Sent: Friday, September 14, 2007 10:19 PM
> On Fri, Sep 14, 2007 at 09:02:24PM -0700, Hans Boehm wrote:
> > You're describing what I thought we were going to do.  But N2360
> > states:
> > 
> > "Next, we define "dependency tree" (or, perhaps more accurately, 
> > attempt to do so):
> > 
> > An evaluation H that performs a dependency operation on an object M 
> > heads a dependency tree, and is also a member of that 
> dependency tree. 
> > Any subsequent evaluation that consumes the result of a prior 
> > evaluation that is a member of H's dependency tree is also 
> a member of 
> > H's dependency tree, unless that prior evaluation is an unannotated 
> > formal parameter or is a return value from a function whose 
> function 
> > prototype specifies an unannotated return value. [Note: dependency 
> > annotations are described in N2361.] [Note: a given 
> evaluation can be 
> > a member of multiple dependency trees.]"
> > 
> > I read this is saying that dependencies don't flow into unannotated 
> > function calls.  Perhaps Paul can explain the intent?
> 
> I was indeed drawing the line at function calls in this document.
> You are suggesting that the line instead be drawn at function 
> calls crossing compilation units?  It might be possible to 
> talk me into that -- my main concern was that this could get 
> sloppy if compilers get more aggressive about whole-program 
> compilation.
> 
> Thoughts?
I think the most fundamental differences here are:

1) Do dependency trees extend past the return of the function that
performed the original atomic load if, for example, a dependent value is
stored in the heap or returned.  I'd be inclined to say no.

2) What happens if the dependency tree extends into code regions the
compiler can't see?  If you assume the answer to the previous question
is no, then I think there are two plausible answers:

	A) The dependency tree is truncated there.  I think this is the
N2360 approach.
	B) The compiler generates an acquire load to preserve the
ordering anyway.   I think Lawrence and I are arguing for this version.

Approach (B) has the huge advantage that you don't have to know whether
the operator[] you're using is built-in or a library call.  They still
behave the same.  The semantics are are moderately clear.  The only
surprise factor is the possible introduction of unexpected fences.  But
that could be handled with warnings on platforms for which it matters.

I think approach (B) doesn't work if the answer to the first question is
yes, since a lot of function results will depend on a load performed in
the function.  But that's probably OK.  We may eventually need an
annotation stating that a function call should be treated as though it
were a dependency-ordered load, so that it becomes possible to do such
loads in function calls, rather than directly or in macros.

In either case, we may want

template<class T> T ignore_dependency(T x) { return (x); }

which behaves like the identity function, but is guaranteed not to
propagate dependencies.

> 
> > > >> I thought we had tentatively concluded in Toronto that all
> > "data
> > >> dependent" loads occurring as a result of the execution of f() 
> > >> should be in the dependency tree anyway, no matter whether they 
> > >> occurred as a result of a nested function call or not.  If the 
> > >> value flows into a separately compiled function, the 
> compiler has 
> > >> to back off and treat memory_order_dependency as 
> > >> memory_order_acquire.  This may add unexpected overhead in some 
> > >> cases.  But examples like the above will be guaranteed 
> to work as expected.
> > >>
> > >> But I think this also isn't reflected here?
> > >>
> > >> My guess is that the N2360 approach (if I understood it 
> correctly) 
> > >> might be viable for C, where function calls are obvious in the 
> > >> source.  But too many things in C++ are a function call 
> behind the 
> > >> scenes.  I'm not very comfortable with an approach that 
> treats v[i] 
> > >> differently depending on whether v is a C int[10] or a 
> C++ std::array<int, 10>.
> > >
> > > If the std::array index is inlined, the compiler has full 
> knowledge 
> > > of the dependences, and one gets a performance win.  If 
> the index is 
> > > not inlined, the performance drops.  But that is true 
> anyway.  I am 
> > > willing to live with a bit of performance uncertainty in 
> C++ codes, 
> > > in part because we already have it.  More importantly, I 
> think, is 
> > > that programmers are unlikely to go to dependence-based ordering 
> > > without serious experimental evidence and lots of other tuning, 
> > > which may very well include careful management of the inlining 
> > > process.
> > Again, I thought that was the plan, but I don't read the current 
> > proposal that way.
> > 
> > >
> > >> I think we'll encounter many more problems along these 
> lines, e.g 
> > >> also with loads in destructors.
> > >
> > > Can you elaborate?
> > >
> > >> I suspect we need more work defining what it means to 
> "consume the 
> > >> result of a prior evaluation".  But someone from the C++ 
> standards 
> > >> committee core group will have to help there.
> > >
> > > Hm.  The pragmatic meaning is clear, which is that the 
> C++ lowered 
> > > to C uses the value.  The standard definition is probably 
> a bit fuzzier.
> >
> > There are no doubt borderline cases, especially with 
> library functions.
> > Does strlen propagate a dependency from one of the characters?
> > Does a long long comparison depend on all parts of the 
> numbers even if 
> > the high halves are different?
> 
> My approach on strlen() would be to say "No, because strlen 
> has neither its arguments nor its return value annotated."  
> If one open-coded strlen() inline, then yes, the dependency 
> would propagate through the characters
> -- but for this to make any sense, there would have to have 
> been a dependency chain (with its head explicitly marked) 
> that propagated to one of the characters, for example (taking 
> care to avoid the Vector
> class):
> 
> 	*(cp + 1) = '\0';
> 	i = 0;
> 	r1 = x.load(memory_order_dependency);
> 	*cp = r1;
> 	for (cp1 = cp; *cp1 != '\0'; cp1++, i++)
> 		continue;
> 	r2 = *(y + i);
> 
> The load into r2 will be ordered after the load into r1 by 
> dependency ordering.
But the dependency chain would still involve a control dependency,
right?  Thus under the current proposal, there would be no ordering?

We clearly have to revisit this with approach (B) above.  If we stick
with the data dependency-only approach, my take on it would be to say
that only explicitly listed library functions (incl. most or all of the
operators) propagate dependencies.  In most other cases, we really have
no idea whether the library implementation uses data or control
dependencies.

> 
> In the long-long case, the most interesting situation is on a 
> 32-bit machine.  For this to be interesting, the earlier code 
> must have had a marked dependency head that propagated to the 
> upper half of the long long, but not to the lower half.  So 
> you are thinking of something like the following, where ll is 
> a long long that has been promoted to a register or something?
> 
> 	r1 = x.load(memory_order_dependency);
> 	ll += ((long long)r1) << 32;
> 	r2 = *(y + (ll & ((1LL << 32) - 1)))
> 
> If the instances of "32" were instead "31", there would 
> clearly be a dependency.  For the sake of the programmer's 
> sanity, I would argue for the above also carrying a 
> dependency, so that the load into
> r2 would be dependency-ordered after the load into r1.
> 
> Seem reasonable?
Maybe.  We can change the example to to

 	r1 = x.load(memory_order_dependency);
 	ll += ((0x1 << 32) & r1);
 	r2 = (ll != 0);  // Comparison checks lowest half first; lowest
half is nonzero
	r3 = y[r2];

Now I suspect that some compilers would generate code that in fact has
no data dependency from r1 to r2, though I haven't verified that.

Hans
> 
> 						Thanx, Paul
> 
> --
> cpp-threads mailing list
> cpp-threads at decadentplace.org.uk
> http://www.decadentplace.org.uk/cgi-bin/mailman/listinfo/cpp-threads
>