[cpp-threads] Re: Review comments on N2176 WRT dependency ordering
Paul E. McKenney
paulmck at linux.vnet.ibm.com
Tue Apr 10 12:39:39 BST 2007
On Mon, Apr 09, 2007 at 05:28:26PM -0700, Hans Boehm wrote:
> On Sun, 8 Apr 2007, Paul E. McKenney wrote:
>
> > Hello again!
> >
> > I once again thank Hans for his careful description of a number of
> > interesting optimization situations relating to dependency-based
> > ordering. The following text discusses some possible approaches
> > to resolving these situations.
> >
> > Thoughts?
> >
> > Thanx, Paul
> >
>
> Thanks for posting this. It would also be good to place a copy on the Oxford
> wiki. This is an issue that I think we really need to resolve asap.
Will do once I get to a high-bandwidth connection.
> My concern with all of these proposals is that they seem to require major
> compiler and standards changes to accommodate what I think will be perceived
> as a fairly narrow problem. The problem will be to get both the people who
> would need to write the standardese (nontrivial) and the people who will
> need to implement this (harder) to buy into this sufficiently.
Understood.
> If we do want to address this, I would certainly advocate a formulation that
> allows dependency-based ordering to be dropped at the cost of replacing
> load_relaxed with load_acquire. Thus I think the implementation overhead
> on X86 would be near zero; any extra syntax could basically be ignored, at
> the expense of some compiler reordering constraints for load_relaxed.
This approach would not be unreasonable for x86.
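For concreteness, the fallback Hans describes might look something like
the sketch below. Everything here is an assumption made for illustration:
the HAS_DEPENDENCY_ORDERING switch, the Config type, and the helper names
are hypothetical, and the std::atomic spelling stands in for the
load_relaxed/load_acquire syntax under discussion. The relaxed branch is
only meaningful on an implementation that promises to preserve dependency
chains, which is exactly the guarantee being debated, so the sketch
defaults to the acquire branch.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical build-time switch: 1 if the toolchain promises to preserve
// dependency chains, 0 if it prefers to drop them and pay for an acquire.
#ifndef HAS_DEPENDENCY_ORDERING
#define HAS_DEPENDENCY_ORDERING 0
#endif

struct Config { int limit; };   // hypothetical payload type

// Load a pointer that the caller is about to dereference.
inline const Config* load_for_deref(const std::atomic<const Config*>& src) {
#if HAS_DEPENDENCY_ORDERING
    // Assumed guarantee: the dependency from this load to the later
    // dereference orders the two accesses.
    return src.load(std::memory_order_relaxed);
#else
    // Fallback: no dependency tracking, so use an acquire load instead.
    return src.load(std::memory_order_acquire);
#endif
}

inline int read_limit(const std::atomic<const Config*>& src) {
    const Config* c = load_for_deref(src);
    assert(c != nullptr);   // the sketch assumes the pointer was published
    return c->limit;        // dereference that depends on the load above
}
```

On x86 both branches should compile to an ordinary load, so the only
cost of the fallback there is the compiler reordering constraint.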
> Thus the implementation overhead would fall mostly on weakly ordered architectures
> like PowerPC. Are IBM's compiler groups willing to buy into any of these
> solutions?
I have not yet encountered much resistance.
> I bring up the standardese issue, since it is not at all clear to me that the
> notion of "dependency" is easily definable, and it seems to me that we would
> have to.
What is your reaction to the notion of dependency defined in the
Itanium and the POWER architecture documents? Page 384 of my copy
of volume 2 of the Itanium Architecture manual defines a dependency
from A to B as follows:
"A precedes B in program order and A produces a value that
B consumes."
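To make that definition concrete, here is a minimal sketch of a
dependency from A to B. The std::atomic spelling, the Node type, and the
names head, publish, and read_payload are all assumptions for
illustration; none of them come from N2176 or from the architecture
manuals. Whether memory_order_consume really relies on the dependency,
or is simply strengthened to an acquire load, is up to the
implementation.

```cpp
#include <atomic>
#include <cassert>

struct Node { int payload; };          // hypothetical payload type

std::atomic<Node*> head{nullptr};      // hypothetical shared pointer

// Publisher: initialize the node, then make it reachable.
void publish(Node* n, int value) {
    assert(n != nullptr);
    n->payload = value;
    head.store(n, std::memory_order_release);
}

// Reader: A is the load of head, B is the dereference of its result.
// A precedes B in program order and B consumes the value A produced,
// so there is a dependency from A to B in the sense quoted above.
int read_payload() {
    Node* p = head.load(std::memory_order_consume);   // A
    if (p == nullptr)
        return -1;                                    // nothing published yet
    return p->payload;                                // B
}
```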
> ...
> > N2176's third example shows that an innocent-seeming transformation
> > might convert a dependency chain that would be recognized by a given
> > system into a form that might not be:
> >
> >     r1 = x.load_relaxed();
> >     if (r1 == 0)
> >         r2 = *r1;
> >     else
> >         r2 = *(r1 + 1);
> >
> > The innocent transformation might result in the following:
> >
> >     r1 = x.load_relaxed();
> >     if (r1 == 0)
> >         r3 = r1;
> >     else
> >         r3 = r1 + 1;
> >     r2 = *r3;
>
> I think I really mangled that example in N2176. Sorry about that.
>
> For a better example of data to control dependence conversion, assume x
> has a value of 0 or 1, and I write
>
>     if (x) {
>         ...
>     } else {
>         ...
>     }
>     y = 42 * x / 13;
>
> The compiler could certainly convert this to
>
>     if (x) {
>         ...
>         y = 3;
>     } else {
>         ...
>         y = 0;
>     }
>
> I know of at least one major architecture for which control dependencies
> do not enforce ordering.
This would certainly be an argument for having the programmer mark
the important dependencies -- similar to the way in which atomics
require a programmer to mark the important variables. If "x" was
marked, for example, as follows:
    if (load_raw(x)) {
        ...
    } else {
        ...
    }
    y = 42 * x / 13;
then the compiler could either leave the dependency (assuming the
hardware respected it) or emit a memory barrier, for example, as
follows:
    if (x) {
        acquire_fence();
        ...
        y = 3;
    } else {
        acquire_fence();
        ...
        y = 0;
    }
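A compilable rendering of the fence variant might look like the sketch
below. It assumes that acquire_fence() corresponds to
std::atomic_thread_fence(std::memory_order_acquire) and that the marked
load is a relaxed atomic load; the writer, the data variable, and the
function names are hypothetical additions so that the fence has
something to order.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> x{0};   // marked variable, assumed to hold 0 or 1
int data = 0;            // hypothetical payload written before x is set

void writer() {
    data = 42;
    x.store(1, std::memory_order_release);
}

// Reader after the compiler has replaced the data dependency on x with a
// control dependency and emitted a fence in each arm to keep the ordering.
int reader() {
    int v = x.load(std::memory_order_relaxed);
    assert(v == 0 || v == 1);
    int y;
    if (v) {
        std::atomic_thread_fence(std::memory_order_acquire);
        // Accesses placed here are ordered after the load of x.
        y = 3;           // 42 * 1 / 13 in integer arithmetic
    } else {
        std::atomic_thread_fence(std::memory_order_acquire);
        y = 0;           // 42 * 0 / 13
    }
    return y;
}
```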
> There are other more subtle differences in dependency type. On Itanium,
>
>     if (x_init) {
>         y = x;
>     } else {
>         ... // initialize x;
>         y = x;
>     }
>
> (when naively compiled) is guaranteed to enforce the order between the loads
> of x_init and x, but the same is not true for
>
>     if (x_init) {
>         ;
>     } else {
>         ... // initialize x;
>     }
>     y = x;
Again, this seems to be another motivation for marking the dependencies
that matter, perhaps via the perObjectPostLoadFence() primitive.
Thanx, Paul