[cpp-threads] Web site updated

Tue Feb 13 19:12:42 GMT 2007

On Mon, Feb 12, 2007 at 09:40:47PM -0800, Hans Boehm wrote:
> On Mon, 12 Feb 2007, Paul E. McKenney wrote:
> 
> > On Tue, Feb 13, 2007 at 01:36:07AM -0000, Boehm, Hans wrote:
> > > > From:  Paul E. McKenney
> > > Thanks for the comments.
> > > >
> > > > >     * Why do we not guarantee that dependencies enforce
> > > > >       memory ordering?
> > > >
> > > > In the suggestion at the end of the document, wouldn't it be
> > > > reasonable to disallow application of profiling-based
> > > > optimizations to quantities that derive from load_raw()?
> > > > Should be a matter of tagging, correct?
> > > > This might well force a global view during profile-based
> > > > optimizations, but such a global view might well enable much
> > > > more powerful optimizations.
> > > I suspect that's still hard, though we probably want to discuss this.
> > > Consider
> > > inside f(x,y,z):
> > >
> > > r2 = *x;
> > > r3 = r2 -> a;
> > >
> > > transformed to
> > >
> > > r2 = *x;
> > > r3 = r1 -> a;
> > > if (r1 != r2) r3 = r2 -> a;
> > >
> > > as before, but without the load_raw.  This is fine unless I call it as
> > >
> > > tmp = x.load_raw(); f(&tmp, y, z);
> >
> > But in this case, the value of x would be the address of the temporary,
> > -not- of the shared variable x.  Therefore, the comparison should not
> > be a problem.
>
> I think it still is.  Assume that r1 is a value that is coincidentally
> usually equal to *x (the result of the load_raw) in the function, but
> not dependent on it (in the hardware sense).  Then, as far as the
> hardware is concerned, in the "optimized" code, r1 -> a is no longer
> dependent on the load_raw and hence is no longer ordered with respect
> to it.
> 
> The comparison is comparing the value returned by the load_raw
> (== tmp == *x) to the guess r1.  If r1 and r2 are equal, I don't
> see why there would be any guarantee that the load into r3 and the
> load_raw remain ordered, even though they appear dependent at the
> source level.

The fact that the store into tmp happened earlier should force ordering
on any reasonable hardware implementation, but point taken.

So, what is the benefit of the above optimization?  Both r1 and r2 are
registers, so there should be no difference in access to either of them.
If they both point to the same place, there should be no difference in
accessing via either (x86's strange and restricted register set aside
for the moment).

So how does this optimization help?

							Thanx, Paul