[cpp-threads] Re: Dependency-based ordering

Thu Jun 21 02:08:46 BST 2007

On Wed, Jun 20, 2007 at 09:58:55PM -0000, Boehm, Hans wrote:
> Paul -
> 
> Doug and I were discussing another possible approach to dealing with
> dependency-based ordering at FCRC.  This approach seems cleaner to us.
> But there may be some danger that we cleaned it up to the point of being
> less useful in the process.
> 
> Here's a version with placeholder syntax that clearly needs a lot of
> work.  But I think the syntax is secondary.
> 
> We would write something like:
> 
> with_load_dependent_acquire r1 = x {
> 	// Any memory references in this block that depend on r1 are
> ordered with respect to the original load of x
>       y = *r1;	// ordered
> 	y = *x;  	// not ordered; not dependent on r1
> 	r2 = r1;
> }
> // Any memory references after the block are no longer ordered with
> respect to the load of x, whether dependent or not.
> Y = *r2;	// Not ordered; out of scope

I certainly applaud the spirit of the proposal -- in the Linux
kernel, the scope of the equivalent to "with_load_dependent_acquire"
(rcu_dereference()) is in fact limited by the outermost enclosing
rcu_read_lock()/rcu_read_unlock() pair.

On the syntax, there are a number of uses of RCU where the first
rcu_dereference() cannot mark the beginning of the scope.  A common
pattern goes something like the following:

	rcu_read_lock();
	for (i = 0; i < N_ELEMENTS; i++) {
		p = rcu_dereference(a[i]);
		if (p == NULL)
			continue;
		if (this_is_the_one(p))
			break;
	}
	if (i >= N_ELEMENTS) {
		rcu_read_unlock();
		return;
	}
	list_for_each_rcu(q, &p->head)
		do_something_with(q);
	rcu_read_unlock();

Beginning the scope of the RCU read-side critical section with the
first rcu_dereference() would be pretty unnatural in this case, as
there would need to be N_ELEMENTS scopes to cover the possible number
of calls to this_is_the_one().

See the unregister_intr_pda() function in the Linux kernel for a
more-elaborate example of this pattern.  There are others.

> On something like PowerPC, this would compile as:
> 
> 1) If the compiler can see all memory references in the block that might
> be dependent on r1, it would preserve those dependencies and use no
> fences.
> 
> 2) If the block contains non-inlined function calls that might perform
> memory operations dependent on r1, it would be compiled as a
> load_acquire (as it would be on X86, etc.)
> 
> This seems to have the big advantage that it might often be useful
> without further annotations.  Since it applies only in a specific scope,
> we don't need to worry about dependent memory references later in the
> program.
> 
> What do you think?

Some of the rcu_dereference() uses in the Linux kernel do in fact have
a limited scope, but quite a few of them do call external functions,
passing pointers that require dependency ordering into them.  This need to
call external functions was what lead me to propose decorating function
arguments and return values in N2260 (see the "Multiple Functions"
section).

So I am quite concerned about leaving out the part that allows dependency
chains to cross compilation-unit boundaries.

							Thanx, Paul