[cpp-threads] Review comments on N2176 WRT dependency ordering

Tue Apr 17 21:54:48 BST 2007

On Tue, Apr 17, 2007 at 02:38:29PM +0300, Peter Dimov wrote:
> Paul E. McKenney wrote:
> >On Mon, Apr 16, 2007 at 06:12:14PM +0300, Peter Dimov wrote:
> 
> >>N2195 proposes the following minimalistic approach:
> >>
> >>template< class T > inline T * atomic_load_address( T * const * p );
> >>template< class T > inline T * atomic_load_address( T * const volatile * p );
> >
> >I do like the template approach!
> >
> >>Returns: *p.
> >>
> >>Constraint: _Acquire only with respect to **p.
> >
> >OK -- but what about p->some_field?  Or does **p indicate any
> >dereference of p?  I had been assuming the more restrictive mode.
> 
> **p denotes the entire object that *p refers to, including all of its 
> fields.

OK.
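Below is a rough analogue of the proposed primitive in later C++ atomics. It is a sketch under assumptions, not N2195's API:

- The published pointer lives in a std::atomic<T*> rather than a plain T*.
- The name load_address_sketch and the Widget type are hypothetical.
- memory_order_acquire stands in for the narrower "acquire only with respect to **p" constraint. C++11's memory_order_consume was intended to express that narrower ordering, but mainstream compilers implement it as acquire.

Both fields of the published object are read, matching the reading of **p as the whole object:

```cpp
#include <atomic>

// Hypothetical analogue of N2195's atomic_load_address(): returns *p.
// Assumption: the location is a std::atomic<T*>, not a plain T* const*.
// Acquire is stronger than "acquire only with respect to **p", so it is
// a conservative stand-in for the proposed constraint.
template <class T>
inline T* load_address_sketch(const std::atomic<T*>* p) {
    return p->load(std::memory_order_acquire);
}

struct Widget {
    int a;
    int b;
};

std::atomic<Widget*> published{nullptr};

// Publisher: initialize every field of *w, then make w visible.
void publish(Widget* w) {
    w->a = 1;
    w->b = 2;
    published.store(w, std::memory_order_release);
}
```

A reader that obtains the pointer through load_address_sketch(&published) sees both a and b as initialized, because both stores precede the release store.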

> In the dependency discussion paper, you write that a single level of 
> indirection is not enough for the Linux kernel, but I don't see how you 
> could make a multi-level primitive work on an Alpha.

This depends on how the multi-level list was created.  If it was
created by a pair of threads as follows:

-	Thread 1:

	p = new typeof(*p);
	p->a = 1;
	p->next = NULL;
	release_fence();
	head = p;

-	Thread 2:

	q = new typeof(*q);
	q->a = 2;
	q->next = NULL;
	release_fence();
	head->next = q;

Then, yes, an atomic_load_address() would be needed for each stage, as
in:

	p = atomic_load_address(&head);
	q = atomic_load_address(&p->next);
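Here is a sketch of the incrementally published case in C++11 atomics. It rests on a few assumptions:

- head and each next field are std::atomic<Node*>.
- Node, thread1, thread2 and read_second are hypothetical names.
- memory_order_acquire stands in for atomic_load_address().

The two links are published by separate release stores from different threads, so the reader needs an ordered load at each stage:

```cpp
#include <atomic>
#include <thread>

// Hypothetical node type standing in for typeof(*p) in the example above.
struct Node {
    int a;
    std::atomic<Node*> next{nullptr};
};

std::atomic<Node*> head{nullptr};

// Thread 1: build the first node, then publish it via head.
void thread1() {
    Node* p = new Node;
    p->a = 1;
    head.store(p, std::memory_order_release);
}

// Thread 2: build a second node, then publish it via head->next.
// It spins until Thread 1 has published head.
void thread2() {
    Node* q = new Node;
    q->a = 2;
    Node* h;
    while ((h = head.load(std::memory_order_acquire)) == nullptr) {
    }
    h->next.store(q, std::memory_order_release);
}

// Reader: each link was published by its own release store, so each
// stage needs its own ordered load.
int read_second() {
    Node* p;
    while ((p = head.load(std::memory_order_acquire)) == nullptr) {
    }
    Node* q;
    while ((q = p->next.load(std::memory_order_acquire)) == nullptr) {
    }
    return p->a * 10 + q->a;  // 12 once both nodes are visible
}
```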

However, if a single thread created both elements and published them
in one shot as follows:

	p = new typeof(*p);
	p->a = 1;
	q = new typeof(*q);
	q->a = 2;
	q->next = NULL;
	p->next = q;
	release_fence();
	head = p;

Then atomic_load_address() would be needed only for the initial fetch
from head, as in:

	p = atomic_load_address(&head);
	q = p->next;

Unlike the earlier example, this multilinked structure is essentially
a single element.  This pattern is quite common: for example, each
element of a list may have other data structures linked to it that are
owned by that element.
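Here is a sketch of the one-shot case, under similar assumptions:

- Elem, list_head, publish_pair and read_pair are hypothetical names.
- memory_order_acquire stands in for atomic_load_address().

Because both elements are initialized before a single release store of the head pointer, only the initial fetch is an ordered load. The following p->next is a plain load that depends on p:

```cpp
#include <atomic>

// Hypothetical node type; next is a plain pointer because it is written
// only before the structure is published.
struct Elem {
    int a;
    Elem* next;
};

std::atomic<Elem*> list_head{nullptr};

// A single thread builds both elements, then publishes them with one
// release store, so the pair behaves as a single published element.
void publish_pair() {
    Elem* q = new Elem{2, nullptr};
    Elem* p = new Elem{1, q};
    list_head.store(p, std::memory_order_release);
}

// Only the fetch from list_head is ordered; p->next is a plain load.
int read_pair() {
    Elem* p = list_head.load(std::memory_order_acquire);
    if (p == nullptr) return -1;  // nothing published yet
    Elem* q = p->next;
    return p->a * 10 + q->a;
}
```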

>                                                      In:
> 
> r1 = x;
> // rmb
> r2 = r1->a;
> // rmb
> r3 = r2->b;
> 
> I believe that you need two rmb fences.
> 
> r1 = atomic_load_address( &x );
> r2 = atomic_load_address( &r1->a );
> r3 = r2->b;
> 
> can insert the two fences for you, but if the second atomic_load_address is 
> an ordinary operation (somewhere in a translation unit far, far away), 
> there's no way to insert a rmb after it.

Again, it depends on how the structure was created; in effect, on whether
the linked structure was published incrementally (in which case multiple
atomic_load_address() calls are needed) or in one shot (in which case
only a single atomic_load_address() is required).

> >>Index-based dependencies are not supported.
> >
> >Why not?
> 
> Are they important enough?

I believe that they are -- they do show up fairly frequently.

>			     We can add a minimalistic index-based primitive 
> along the lines of:
> 
> template<class T, class U> inline U* atomic_load_index( T const * pi, U * base );
> 
> Requires: T shall be integral.
> Returns: base + *pi.
> Constraint: _Acquire only with respect to base[ *pi ].
> 
> but it doesn't cover the complex cases where you want to index more than 
> one array.
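Here is a sketch of the suggested index primitive in later C++ atomics. It assumes the following:

- The index is held in a std::atomic<T>.
- load_index_sketch, table, slot and fill_and_publish are hypothetical names.
- memory_order_acquire stands in for "acquire only with respect to base[ *pi ]".

It enforces the "T shall be integral" requirement with a static_assert:

```cpp
#include <atomic>
#include <type_traits>

// Hypothetical analogue of the atomic_load_index() suggested above:
// returns base + *pi. Assumption: the index lives in a std::atomic<T>;
// acquire stands in for "acquire only with respect to base[*pi]".
template <class T, class U>
inline U* load_index_sketch(const std::atomic<T>* pi, U* base) {
    static_assert(std::is_integral<T>::value, "T shall be integral");
    return base + pi->load(std::memory_order_acquire);
}

int table[4];
std::atomic<int> slot{0};

// Writer: fill in an element, then publish its index.
void fill_and_publish(int i, int value) {
    table[i] = value;
    slot.store(i, std::memory_order_release);
}
```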

Why not just apply atomic_load_address() to the index as well?  Then,
given a static array whose indexes are filled in dynamically:

	i = atomic_load_address(&myindex);
	r1 = myarray[i];

A dynamic array would require that both the pointer and the index be
treated this way:

	i = atomic_load_address(&myindex);
	p = atomic_load_address(&myarray);
	r1 = p[i];
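Here is a sketch contrasting the static and dynamic array cases in later C++ atomics. It rests on these assumptions:

- All names are hypothetical.
- memory_order_acquire stands in for atomic_load_address().
- The dynamic array is published once, before its index.

For a static array only the index needs an ordered load. For a dynamic array the pointer needs one too:

```cpp
#include <atomic>
#include <cstddef>

// Static array: storage exists before any thread runs; only the index
// is published, so only the index needs an ordered load.
int myarray_static[8];
std::atomic<std::size_t> myindex{0};

// Dynamic array: the storage itself is published too, so both the
// pointer and the index need ordered loads.
std::atomic<int*> myarray{nullptr};
std::atomic<std::size_t> dyn_index{0};

void publish_static(std::size_t i, int value) {
    myarray_static[i] = value;
    myindex.store(i, std::memory_order_release);
}

int read_static() {
    std::size_t i = myindex.load(std::memory_order_acquire);
    return myarray_static[i];
}

// Assumes a single publication: fill the array, publish the pointer,
// then publish the index into it.
void publish_dynamic(std::size_t n, std::size_t i, int value) {
    int* a = new int[n]();
    a[i] = value;
    myarray.store(a, std::memory_order_release);
    dyn_index.store(i, std::memory_order_release);
}

int read_dynamic() {
    std::size_t i = dyn_index.load(std::memory_order_acquire);
    int* p = myarray.load(std::memory_order_acquire);
    if (p == nullptr) return -1;  // nothing published yet
    return p[i];
}
```

Publishing the pointer before the index means that a reader who sees the new index is also guaranteed to see the new array pointer.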

						Thanx, Paul