[cpp-threads] Web site updated

Tue Feb 13 02:29:30 GMT 2007

On Tue, Feb 13, 2007 at 01:36:07AM -0000, Boehm, Hans wrote:
> > From:  Paul E. McKenney
> Thanks for the comments.
> > 
> > > In particular, I finally added the promised rationale documents
> > > under the seventh bullet:
> > > 
> > >     * Why do we not guarantee that dependencies enforce memory
> > >       ordering?
> > 
> > In the suggestion at the end of the document, wouldn't it be 
> > reasonable to disallow application of profiling-based 
> > optimizations to quantities that derive from load_raw()?  
> > Should be a matter of tagging, correct?
> > This might well force a global view during profile-based 
> > optimizations, but such a global view might well enable much 
> > more powerful optimizations.
> I suspect that's still hard, though we probably want to discuss this.
> Consider, inside f(x,y,z):
> 
> r2 = *x;
> r3 = r2 -> a;
> 
> transformed to
> 
> r2 = *x;
> r3 = r1 -> a;
> if (r1 != r2) r3 = r2 -> a;
> 
> as before, but without the load_raw.  This is fine unless I call it as
> 
> tmp = x.load_raw(); f(&tmp, y, z);

But in this case, the value of x would be the address of the temporary,
-not- of the shared variable x.  Therefore, the comparison should not
be a problem.

Now, if the structure pointed to by x is shared and is subject to
concurrent modification by other CPUs, then f() has to be written to
handle that, just as in (say) Java.  But both versions of the above code
snippet from f() would be potentially unsafe in this case.
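
To make the call pattern above concrete, here is a minimal C++ sketch.
It assumes that load_raw() behaves like a relaxed atomic load (the helper
below is a hypothetical stand-in, not a proposed API), drops the unused
y and z parameters, and passes the profile-predicted pointer r1 in
explicitly as "guess".  The names Node, f_speculated(), and
call_via_temporary() are likewise invented for illustration.

```cpp
#include <atomic>
#include <cassert>

struct Node { int a; };

// Hypothetical stand-in for load_raw(): assumed here to be a relaxed load.
template <typename T>
T load_raw(const std::atomic<T>& v) {
    return v.load(std::memory_order_relaxed);
}

// Value-speculated body of f(): guess plays the role of r1.
int f_speculated(Node** x, Node* guess) {
    assert(x != nullptr);
    Node* r2 = *x;                 // r2 = *x;
    int r3 = guess->a;             // r3 = r1 -> a;  (speculative)
    if (guess != r2) r3 = r2->a;   // if (r1 != r2) r3 = r2 -> a;
    return r3;
}

// tmp = x.load_raw(); f(&tmp, y, z);
int call_via_temporary(const std::atomic<Node*>& shared, Node* guess) {
    Node* tmp = load_raw(shared);     // private copy of the shared pointer
    return f_speculated(&tmp, guess); // f() sees &tmp, not &shared
}
```

Inside f_speculated(), x is the address of tmp, so the comparison
examines a private copy that no other thread can change.  Whether the
Node it points to is safe to read concurrently is a separate question,
and one that applies equally to the untransformed code.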

> I think you're arguing that I would need the whole program available
> before I optimize f(), so that I can detect such cases.  But what if f()
> is in a dynamic library that I don't have source to, and that is
> optimized by the vendor? 

Unless I am missing something, the above example doesn't cause a problem.
(At least not a problem not already present in the original unoptimized
code.)

Given some other example that did cause a problem, such vendors would
have optimized themselves out of the multi-core market.

> > Now, about a store_raw() -- you guys OK with this, for 
> > example, for split per-thread counters?  No implicit barriers 
> > or additional dependency checking, and no atomics except as 
> > needed by complex or unaligned data items.
>
> This still operates on data declared as atomic, but otherwise, yes.
> That should be fine.

Good!
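
For reference, the split per-thread counter use case might look roughly
like the following C++ sketch.  It assumes that load_raw() and
store_raw() correspond to relaxed atomic loads and stores on data
declared atomic; SplitCounter, kMaxThreads, and the slot layout are
hypothetical names chosen for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

constexpr std::size_t kMaxThreads = 4;  // hypothetical fixed thread count

struct SplitCounter {
    std::atomic<unsigned long> slot[kMaxThreads]{};

    // Only the owning thread writes slot[id], so a relaxed load followed
    // by a relaxed store is enough; no atomic read-modify-write and no
    // barrier is requested.
    void inc(std::size_t id) {
        assert(id < kMaxThreads);
        unsigned long v = slot[id].load(std::memory_order_relaxed);
        slot[id].store(v + 1, std::memory_order_relaxed);
    }

    // Readers sum all slots; the result is approximate while updates
    // are in flight.
    unsigned long read() const {
        unsigned long sum = 0;
        for (const auto& s : slot)
            sum += s.load(std::memory_order_relaxed);
        return sum;
    }
};
```

Each thread would call inc() with its own index, and any thread may
call read() to obtain a possibly slightly stale total.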

> There is currently no statement in the specification to ensure that
> "raw" atomic accesses become visible elsewhere "promptly", for example
> that read accesses should not be cached in a register.  I expect it
> would be hard to define that precisely, but it may be good to add a note
> to that effect in the documentation for the atomics library.

From the load_raw() side, my hope would be that the compiler would be
required to "forget" where the value came from, thus being unable to
recognize that the following:

	x = a.load_raw();
	y = a.load_raw();

might be cached -- in other words, there must be a separate load from "a"
for both x and y.  Or am I missing your point?

From the store_raw() side, a similar "forgetfulness" rule should prevent
the store combining that the optimizer might otherwise be tempted to engage
in.  We certainly would not want the compiler to slide a store_raw()
past a long (or worse, infinite) loop.  Would it be reasonable to use
some of the rules that are often used for volatile?
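
One way to picture borrowing the volatile rules is the C++ sketch
below.  The load_raw() and store_raw() helpers are hypothetical, and
routing the access through a volatile-qualified reference is only an
assumption about how an implementation might be discouraged from
fusing or eliding the accesses; nothing here is claimed to be what the
specification will require.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: relaxed load performed through a volatile view,
// in the hope that each call results in a separate load.
template <typename T>
T load_raw(const std::atomic<T>& a) {
    const volatile std::atomic<T>& va = a;
    return va.load(std::memory_order_relaxed);
}

// Hypothetical helper: relaxed store performed through a volatile view,
// in the hope that stores are neither combined nor delayed indefinitely.
template <typename T>
void store_raw(std::atomic<T>& a, T v) {
    volatile std::atomic<T>& va = a;
    va.store(v, std::memory_order_relaxed);
}

// x = a.load_raw(); y = a.load_raw();
// The intent is two distinct loads from "a", not one cached value.
int sum_two_loads(const std::atomic<int>& a) {
    int x = load_raw(a);
    int y = load_raw(a);
    return x + y;
}
```

In a single-threaded run the two loads return the same value; the
point of the sketch is only that the compiler is given no license to
assume so.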

							Thanx, Paul

> > Then explicit memory barriers, though not necessarily as a 
> > replacement for those that you are proposing in atomic accesses.
> > 
> > >     * Why do our ordering constraints not distinguish between loads 
> > > and stores, when many architectures provide fences that do?
> > 
> > This example certainly underscores my estimate of the value 
> > of hardware-enforced ordering based on data dependency.  This 
> > leaves Alpha, which does not enforce such ordering.  However, 
> > it also ends up being OK with this example, but by accident, 
> > because it has no memory barrier that orders only loads.  Any 
> > attempt to order only loads on Alpha gets you the full memory barrier.
> > 
> > Cute example, though!
> > 
> > 						Thanx, Paul
> > 
> > > Hans