[cpp-threads] Review comments on N2176 WRT dependency ordering

Wed Apr 18 00:17:15 BST 2007

Paul E. McKenney wrote:
> On Wed, Apr 18, 2007 at 12:32:02AM +0300, Peter Dimov wrote:
>> Paul E. McKenney wrote:
>>> On Tue, Apr 17, 2007 at 02:38:29PM +0300, Peter Dimov wrote:
>>
>>>> In the dependency discussion paper, you write that a single level
>>>> of indirection is not enough for the Linux kernel, but I don't see
>>>> how you could make a multi-level primitive work on an Alpha.
>>>
>>> This depends on how the multi-level list was created. [...]
>>
>> Yes, you are right. atomic_load_address will probably always work in
>> practice for such dependencies, but I'm not sure that I can specify
>> it so that it will also work in theory.
>
> I would like to help here, but need to understand what is broken in
> theory.  The dependency chains seem to me to be pretty well defined,
> particularly given that their heads are marked and that any passage of
> the chain to a different compilation unit is also marked in both the
> function prototype and the function definition.
>
> So, what am I missing here?

Actually, on third reading, it will work if the dependencies are tagged and 
tracked as you suggest in your paper. I've no idea how hard this would be to 
implement. :-)

I also noticed that you don't seem to distinguish between data and code 
dependencies... is that on purpose? My understanding is that PowerPC 
enforces data dependencies for both loads and stores but does not enforce 
code dependencies for loads?
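
To make the distinction concrete, here is a minimal sketch of the two cases as I understand them, written with today's std::atomic spelling rather than the primitives under discussion. Node, ptr, ready and payload are made-up names for illustration. In the first reader the address of the second load is computed from the first load (a data dependency); in the second reader the first load only decides whether the second load executes (what I call a code dependency, usually called a control dependency).

```cpp
#include <atomic>

struct Node { int value; };

Node empty{-1};                       // placeholder node, visible before publication
Node full{42};                        // node that a writer would publish
std::atomic<Node*> ptr{&empty};

int payload = 0;
std::atomic<bool> ready{false};

// Data dependency: the second load's address comes from the first load.
int read_data_dependent() {
    Node* p = ptr.load(std::memory_order_relaxed);
    return p->value;                  // address depends on p
}

// Code (control) dependency: the first load only guards the second load.
int read_control_dependent() {
    if (ready.load(std::memory_order_relaxed)) {
        return payload;               // address does not depend on the loaded value
    }
    return -1;                        // nothing published yet
}
```

The sketch only shows the shape of the two dependencies. With relaxed loads the C++ language itself promises no cross-thread ordering for either reader; whether the hardware orders them is exactly the question above.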

>> The explicit per-object fence can handle these scenarios - in
>> principle - since the user specifies the dependent object explicitly.
>>
>> i = atomic_load_relaxed( myindex );
>>
>> dependency_fence( &myarray[ i ] );
>> dependency_fence( &myarray2[ i * 2 ] );
>
> OK, I'll bite -- what are the above dependency_fence() calls doing,
> either in theory or in practice?

The hypothetical dependency_fence( p ) calls mark *p as a dependent object. 
I'm not sure that they simplify the optimizer implementation compared to 
your suggestion, though. Maybe not.

>> r1 = myarray[ i ].foo;
>> r2 = myarray[ i ].bar;
>> r3 = myarray2[ i * 2 ].baz;

There's also the option of providing just a plain dependency_fence(), with no 
argument, which maps to atomic_compiler_fence( __acquire ) on everything 
except Alpha, where it additionally inserts an rmb.
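
A rough sketch of that last option, under several assumptions: std::atomic_signal_fence stands in for atomic_compiler_fence( __acquire ), std::atomic_thread_fence stands in for the Alpha rmb, and GCC's __alpha__ macro selects the Alpha case. Item, publish() and consume_sum() are invented for the example.

```cpp
#include <atomic>

// Hypothetical argument-less dependency_fence(), as described above.
inline void dependency_fence() {
    // Compiler-only barrier: keeps later loads from being hoisted above it.
    std::atomic_signal_fence(std::memory_order_acquire);
#if defined(__alpha__)
    // Alpha does not order dependent loads, so add a real read barrier.
    std::atomic_thread_fence(std::memory_order_acquire);
#endif
}

struct Item { int foo; int bar; };

Item myarray[4];
std::atomic<int> myindex{-1};         // -1 means "nothing published yet"

// Writer: fill in the element, then publish its index.
void publish(int i, int foo, int bar) {
    myarray[i].foo = foo;
    myarray[i].bar = bar;
    myindex.store(i, std::memory_order_release);
}

// Reader: relaxed load of the index, fence, then the dependent loads.
int consume_sum() {
    int i = myindex.load(std::memory_order_relaxed);
    if (i < 0) {
        return -1;                    // nothing published yet
    }
    dependency_fence();
    return myarray[i].foo + myarray[i].bar;
}
```

This relies on the hardware honouring the dependency on non-Alpha machines. The ISO C++ memory model does not guarantee the ordering for a relaxed load followed by a signal fence, so treat it as an illustration of the mapping, not as portable code.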