[cpp-threads] Belated comments on dependency-based ordering proposal

Sat Sep 15 06:18:36 BST 2007

On Fri, Sep 14, 2007 at 09:02:24PM -0700, Hans Boehm wrote:
> On Fri, 14 Sep 2007, Lawrence Crowl wrote:
> 
> > On 9/14/07, Boehm, Hans <hans.boehm at hp.com> wrote:
> >> Consider example 2:
> >>
> >> atomic<int> x;
> >> vector<int> v;
> >>
> >> void f() {
> >>         int i = x.load(memory_order_dependency);
> >>         int j = v[i];
> >>         ...
> >> }
> >>
> >> I don't think we can reasonably annotate the standard vector class.
> >> Thus I don't know how to enforce the dependency ordering here, since
> >> operator[] is elsewhere and not annotated.  If you don't enforce
> >> ordering here, it seems to me the code becomes brittle in that if you
> >> change a C array to a C++ library one, things are allowed to break.
> >
> > What do you mean by break?  As I understand the proposal, all that
> > happens is that the dependency ordering gets strengthened to acquire
> > ordering before the call.  Even in the case that strengthening happens,
> > there is still a potential performance advantage in less visited paths.
> >
> >    int i = x.load( memory_order_dependency );
> >    if ( i > 0 )
> >        int j = v[i];
> >
> > becomes (internally to the compiler)
> >
> >    int i = x.load( memory_order_dependency );
> >    if ( i > 0 ) {
> >        x.fence( memory_order_acquire );
> >        int j = v[i];
> >    }
> >
> > when v[i] is an unannotated function call.  If x is usually zero, you
> > avoid the acquire cost.
> You're describing what I thought we were going to do.  But N2360
> states:
> 
> "Next, we define "dependency tree" (or, perhaps more accurately, attempt 
> to do so):
> 
> An evaluation H that performs a dependency operation on an object M heads 
> a dependency tree, and is also a member of that dependency tree. Any 
> subsequent evaluation that consumes the result of a prior evaluation that 
> is a member of H's dependency tree is also a member of H's dependency 
> tree, unless that prior evaluation is an unannotated formal parameter or 
> is a return value from a function whose function prototype specifies an 
> unannotated return value. [Note: dependency annotations are described in 
> N2361.] [Note: a given evaluation can be a member of multiple dependency 
> trees.]"
> 
> I read this is saying that dependencies don't flow into unannotated
> function calls.  Perhaps Paul can explain the intent?

I was indeed drawing the line at function calls in this document.
You are suggesting that the line instead be drawn at function calls
crossing compilation units?  It might be possible to talk me into
that -- my main concern was that this could get sloppy if compilers
get more aggressive about whole-program compilation.

Thoughts?

> > >> I thought we had tentatively concluded in Toronto that all 
> "data
> >> dependent" loads occurring as a result of the execution of f() should be
> >> in the dependency tree anyway, no matter whether they occurred as a
> >> result of a nested function call or not.  If the value flows into a
> >> separately compiled function, the compiler has to back off and treat
> >> memory_order_dependency as memory_order_acquire.  This may add
> >> unexpected overhead in some cases.  But examples like the above will be
> >> guaranteed to work as expected.
> >>
> >> But I think this also isn't reflected here?
> >>
> >> My guess is that the N2360 approach (if I understood it correctly) might
> >> be viable for C, where function calls are obvious in the source.  But
> >> too many things in C++ are a function call behind the scenes.  I'm not
> >> very comfortable with an approach that treats v[i] differently depending
> >> on whether v is a C int[10] or a C++ std::array<int, 10>.
> >
> > If the std::array index is inlined, the compiler has full knowledge
> > of the dependences, and one gets a performance win.  If the index
> > is not inlined, the performance drops.  But that is true anyway.  I
> > am willing to live with a bit of performance uncertainty in C++
> > codes, in part because we already have it.  More importantly, I
> > think, is that programmers are unlikely to go to dependence-based
> > ordering without serious experimental evidence and lots of other
> > tuning, which may very well include careful management of the
> > inlining process.
> Again, I thought that was the plan, but I don't read the current proposal
> that way.
> 
> >
> >> I think we'll encounter many more problems along these lines,
> >> e.g also with loads in destructors.
> >
> > Can you elaborate?
> >
> >> I suspect we need more work defining what it means to "consume the result
> >> of a prior evaluation".  But someone from the C++ standards committee
> >> core group will have to help there.
> >
> > Hm.  The pragmatic meaning is clear, which is that the C++ lowered
> > to C uses the value.  The standard definition is probably a bit fuzzier.
>
> There are no doubt borderline cases, especially with library functions.
> Does strlen propagate a dependency from one of the characters?
> Does a long long comparison depend on all parts of the numbers even
> if the high halves are different?

My approach on strlen() would be to say "No, because strlen has neither
its arguments nor its return value annotated."  If one open-coded strlen()
inline, then yes, the dependency would propagate through the characters
-- but for this to make any sense, there would have to have been a
dependency chain (with its head explicitly marked) that propagated to
one of the characters, for example (taking care to avoid the Vector
class):

	*(cp + 1) = '\0';
	i = 0;
	r1 = x.load(memory_order_dependency);
	*cp = r1;
	for (cp1 = cp; *cp1 != '\0'; cp1++, i++)
		continue;
	r2 = *(y + i);

The load into r2 will be ordered after the load into r1 by dependency
ordering.

In the long-long case, the most interesting situation is on a 32-bit
machine.  For this to be interesting, the earlier code must have
had a marked dependency head that propagated to the upper half of
the long long, but not to the lower half.  So you are thinking of
something like the following, where ll is a long long that has
been promoted to a register or something?

	r1 = x.load(memory_order_dependency);
	ll += ((long long)r1) << 32;
	r2 = *(y + (ll & ((1LL << 32) - 1)))

If the instances of "32" were instead "31", there would clearly be a
dependency.  For the sake of the programmer's sanity, I would argue
for the above also carrying a dependency, so that the load into
r2 would be dependency-ordered after the load into r1.

Seem reasonable?

						Thanx, Paul