[cpp-threads] Web site updated

Wed Feb 14 00:36:52 GMT 2007

> -----Original Message-----
> From: Paul E. McKenney [mailto:paulmck at linux.vnet.ibm.com] 
> Sent: Tuesday, February 13, 2007 11:13 AM
> To: Boehm, Hans
> Cc: C++ threads standardisation
> Subject: Re: [cpp-threads] Web site updated
> 
> On Mon, Feb 12, 2007 at 09:40:47PM -0800, Hans Boehm wrote:
> > On Mon, 12 Feb 2007, Paul E. McKenney wrote:
> > 
> > > On Tue, Feb 13, 2007 at 01:36:07AM -0000, Boehm, Hans wrote:
> > > > > From:  Paul E. McKenney
> > > > Thanks for the comments.
> > > > >
> > > > > >     * Why do we not guarantee that dependencies enforce
> > > > > >       memory ordering?
> > > > >
> > > > > In the suggestion at the end of the document, wouldn't it be 
> > > > > reasonable to disallow application of profiling-based 
> > > > > optimizations to quantities that derive from load_raw()?
> > > > > Should be a matter of tagging, correct?
> > > > > This might well force a global view during profile-based 
> > > > > optimizations, but such a global view might well enable much 
> > > > > more powerful optimizations.
> > > > I suspect that's still hard, though we probably want to discuss
> > > > this.
> > > > Consider
> > > > inside f(x,y,z):
> > > >
> > > > r2 = *x;
> > > > r3 = r2 -> a;
> > > >
> > > > transformed to
> > > >
> > > > r2 = *x;
> > > > r3 = r1 -> a;
> > > > if (r1 != r2) r3 = r2 -> a;
> > > >
> > > > as before, but without the load_raw.  This is fine unless I call
> > > > it as
> > > >
> > > > tmp = x.load_raw(); f(&tmp, y, z);
> > >
> > > But in this case, the value of x would be the address of the
> > > temporary, -not- of the shared variable x.  Therefore, the
> > > comparison should not be a problem.
> >
> > I think it still is.  Assume that r1 is a value that is coincidentally
> > usually equal to *x (the result of the load_raw) in the function, but
> > not dependent on it (in the hardware sense).  Then, as far as the
> > hardware is concerned, in the "optimized" code, r1 -> a is no longer
> > dependent on the load_raw and hence is no longer ordered with respect
> > to it.
> > 
> > The comparison checks the value returned by the load_raw (== tmp
> > == *x) against the guess r1.  If r1 and r2 are equal, I don't see why
> > there would be any guarantee that the load into r3 and the load_raw
> > remain ordered, even though they appear dependent at the source level.
> 
> The fact that the store into tmp happened earlier should 
> force ordering on any reasonable hardware implementation, but 
> point taken.
> 
> So, what is the benefit of the above optimization?  Both r1 
> and r2 are registers, so there should be no difference in 
> access to either of them.
> If they both point to the same place, there should be no 
> difference in accessing via either (x86's strange and 
> restricted register set aside for the moment).
> 
> So how does this optimization help?
> 
Conceivably by shortening dependency chains, allowing the code to
execute faster on out-of-order architectures.  But even if it doesn't,
we would have to tell compiler writers that certain (probably
unprofitable) transformations on non-atomic code are no longer legal.
That would at least require a clear definition of those transformations
in terms of program behavior, which sounds hard.
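
To make the example from earlier in the thread concrete, here is a rough
C++ sketch of the two forms of f and of the problematic call site.  It
rests on several assumptions: load_raw() is not part of standard C++, so
a relaxed std::atomic load stands in for it; Node, f_original,
f_speculated, and call_via_temporary are hypothetical names; and the
unused y and z parameters are dropped.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical node type; only the field 'a' matters here.
struct Node { int a; };

// Original form: the address used to load r2->a is computed from the
// value loaded from *x, so the second load depends on the first.
int f_original(Node* const* x) {
    Node* r2 = *x;
    int r3 = r2->a;
    return r3;
}

// Speculated form: r1 is a guess (e.g. from profiling) that usually
// equals *x.  The load of r1->a does not use the value loaded from *x,
// so at the hardware level it no longer depends on that load.
// Assumption: r1 always points to a valid Node.
int f_speculated(Node* const* x, Node* r1) {
    Node* r2 = *x;
    int r3 = r1->a;             // address comes from the guess
    if (r1 != r2) r3 = r2->a;   // fall back when the guess was wrong
    return r3;
}

// The call site from the thread: the raw load happens in the caller,
// and f only ever sees the address of an ordinary temporary.
int call_via_temporary(std::atomic<Node*>& shared, Node* guess) {
    // Assumption: a relaxed load stands in for the proposed load_raw().
    Node* tmp = shared.load(std::memory_order_relaxed);
    return f_speculated(&tmp, guess);
}
```

In a single thread both forms return the same value for any valid guess,
which is why the transformation is fine for non-atomic code.  The
difference only shows up in the ordering another thread may rely on:
inside f there is nothing left to tell the compiler that *x came from a
raw load.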

Having said that, I'm also still thinking about alternatives here.

Hans