[cpp-threads] [Fwd: [Javamemorymodel-discussion] MSDN on Memory models]

Thu Sep 15 19:41:23 BST 2005

> From:  Vance Morrison
> 
> Below are some comments on Hans's questions
> 
> > > 1. The description of how SMP caches work is at least inconsistent
> > > with my understanding.  AFAIK, few hardware caches allow
> > > inconsistent values in different processor caches.
> 
> 
> It is my understanding from talking with some Intel folks 
> that the L1 cache of the Intel IA64 processor is not coherent 
> with the L2 cache (which is kept coherent with respect to 
> other processors).  However, the accuracy of this kind of 
> information is typically hard to ensure.  The main point, however, 
> is that it takes extra effort to ensure consistency, and this is 
> independent of the answer to what various processors actually do.

I'm not an expert here either.  But the L1 cache on Itanium 2s is
write-through.  Thus it seems to me that saying it may not be coherent
is really equivalent to saying that there is some write buffering going
on.

The only reason any of this matters, I think, is that it lets people
develop the right mental model for the cost of memory fences.  As far as
I know, memory fences are usually a local operation, though they may
involve waiting for some outstanding memory transactions.  They don't go
out and make caches consistent.  Hence the cost may be large, but it
doesn't usually grow a lot with larger processor counts (except that
waiting for outstanding memory transactions may take longer).

(At least I believe the above is correct.)

> > > 4.  At least the LazyInitClass2 example seems to be legal only with
> > > a very weird interpretation of the rules in which the ordering
> > > constraints apply to program text instead of dynamically executed
> > > actions.  In particular:
> > >
> > > "Reads can only move earlier in time, but never past a write to the
> > > same memory location from the same thread."
> > >
> > > seems to apply if the write is textually there, but conditional, and
> > > not executed?  The ordering of the reads of myValue and myValueInit
> > > matters even, or particularly, if the initialization code is not
> > > executed, and hence there are no intervening writes.
> > >
> > > As a result there seem to be seemingly equivalent but textually
> > > rearranged programs that are incorrect?  E.g. if I change the body
> > > of GetValue to an if-then-else with the return duplicated in both
> > > branches, then the correctness depends on which branch textually
> > > comes first?
> > >
> > > Either I'm misunderstanding this, or something is still broken here
> 
> 
> I am afraid that I don't understand the issue Hans is 
> bringing up here.
> 
> A more detailed description is in order.  Note that this last example
> was the one that there was the most internal debate about whether we
> could make rules that would actually allow it to work in the model.  It
> would not surprise me to find a problem here.  I would be interested in
> exploring the details.
> 
> Vance
> 

I think there is no practical way to make this example work, unless you
assume that

a) You are on an X86 processor, and
b) Newer X86 processors behave the way we all think they do, as opposed
to the way Intel documents them.

Even then, I think there would be a significant compiler impact.

Here's your example:

public class LazyInitClass2 { 
    private static bool myValueInit = false;
    private static int myValue;

    public static int GetValue() {
        if (!myValueInit) {
            myValue = ComputeMyInt();
            myValueInit = true;
        }
        return myValue;
    }
};

You argue that this cannot be transformed to:

public class MorphedLazyInit2 { 
    private static bool myValueInit = false;
    private static int myValue;

    public static int GetValue() {
        int temp = myValue;
        if (!myValueInit) {
            myValue = temp = ComputeMyInt();
            myValueInit = true;
        }
        return temp;
    }
};

I agree that's the crucial question.  The transformed version can fail
if an initialization takes place between the read of myValue and the
read of myValueInit, hence the if clause is not executed, and we see an
uninitialized myValue.
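
To make that interleaving concrete, here is a minimal single-threaded replay of it, sketched in Java rather than C# (the syntax is nearly identical).  The class name, the helper names, and the value 42 returned by computeMyInt are all made up for illustration; the "other thread" is simulated by calling the initialization at the point where the scheduler would have to run it, so the outcome is deterministic rather than a real race.

```java
// Hypothetical replay of the failing schedule; every name here is illustrative.
final class MorphedReplay {
    static boolean myValueInit;
    static int myValue;                 // 0 means "not yet initialized"

    static int computeMyInt() { return 42; }   // stand-in for ComputeMyInt()

    // What a second thread (call it B) does when it initializes the value.
    static void initializerThread() {
        myValue = computeMyInt();
        myValueInit = true;
    }

    // Thread A executing the transformed GetValue, with B's initialization
    // scheduled between A's read of myValue and A's read of myValueInit.
    static int replayMorphedGetValue() {
        myValue = 0;                    // reset so the replay is repeatable
        myValueInit = false;

        int temp = myValue;             // A: hoisted read, sees 0
        initializerThread();            // B: runs to completion here
        if (!myValueInit) {             // A: sees true, so the body is skipped
            myValue = temp = computeMyInt();
            myValueInit = true;
        }
        return temp;                    // A: returns the stale 0
    }

    public static void main(String[] args) {
        System.out.println(replayMorphedGetValue());   // prints 0, not 42
    }
}
```

Note that thread A performs no writes at all on this path; only the order of its two reads differs from the original.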

You argue that it's prevented by rule 5:

Reads can only move earlier in time, but never past a write to the same
memory location from the same thread.

But in the failing case there were no writes by the thread that failed.
The two reads occurred out of order, but that was the only thing that
changed.  Somehow rule 5 seems to be viewing the reordering as a syntactic
transformation, but that doesn't really make sense.

To make this clearer, consider the following equivalent of the first
program:

public class LazyInitClass2b { 
    private static bool myValueInit = false;
    private static int myValue;

    public static int GetValue() {
        if (myValueInit) return myValue;
        myValue = ComputeMyInt();
        myValueInit = true;
        return myValue;
    }
};

If I transform this to

public class MorphedLazyInitClass2b { 
    private static bool myValueInit = false;
    private static int myValue;

    public static int GetValue() {
        int temp = myValue;
        if (myValueInit) return temp;
        myValue = ComputeMyInt();
        myValueInit = true;
        return myValue;
    }
};

I've clearly still broken the code, but there is no write involved in
the transformed section of code.

Even assuming no compiler transformation, Itanium is not guaranteed to
enforce ordering between the two reads in the original LazyInitClass2.
(I think current implementations do.  This is admittedly a subtle
issue.)  Thus to make this example work on Itanium, you need to use
ld.acq on the load of myValueInit.  That's fine, but you're not giving
the compiler the information it needs to determine that; it can't really
distinguish that load from any other load, without some program
annotation.
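
As a rough sketch of what such an annotation could look like, here is the same class in Java with the flag declared volatile.  This is an illustration under the Java (JSR-133) rules, not a claim about the CLR model under discussion: there, a volatile read has acquire semantics and a volatile write has release semantics, so the read of myValue may not be moved ahead of the read of myValueInit, and a compiler targeting Itanium would be expected to use an acquiring load for the flag.  The class name and the body of computeMyInt are hypothetical.

```java
// Sketch only: Java's volatile stands in for the missing program annotation.
final class AnnotatedLazyInit {
    // volatile marks the flag as the variable used for thread communication.
    private static volatile boolean myValueInit = false;
    private static int myValue;

    static int computeMyInt() { return 42; }   // hypothetical stand-in

    static int getValue() {
        if (!myValueInit) {             // volatile read (acquire)
            myValue = computeMyInt();
            myValueInit = true;         // volatile write (release)
        }
        return myValue;                 // may not be hoisted above the volatile read
    }

    public static void main(String[] args) {
        System.out.println(getValue());
    }
}
```

As in the original, several threads may still run computeMyInt concurrently; the annotation only guarantees that a thread which sees myValueInit set also sees an initialized myValue.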

I think PowerPC will have very similar issues.

I also think you're not really doing programmers a favor here:
myValueInit is being used for thread communication.  That's a critical
piece of information when you're trying to read the code, or when some
race-detection tool is looking at the code.  It's not just the compiler
that doesn't have the information it needs ...

I also agree with Bill's concerns, but I think they're completely
orthogonal to this issue.

Hans