[cpp-threads] memory model

Boehm, Hans hans.boehm at hp.com
Sat Apr 30 02:27:01 BST 2005



> -----Original Message-----
> From: Alexander Terekhov [mailto:alexander.terekhov at gmail.com] 
> > 1) What do we do if the hardware cannot provide the requested 
> > operation?
> 
> Such as?
> 
> > Do we emulate it with locks?
> 
> Makes no sense to me. I can do better with locks.
No doubt.  But what if you just want your code to also run on some
legacy platform, and you don't really care about its performance?

Or it's a case where you're just counting events, and it was easier
to throw in a fetch-and-add than a lock, even though performance
really doesn't matter?

I think you're right in that we do need people to test whether
the implementation is real or some stupid emulation.  This is a
little tricky, since the slowest "real" implementation is probably
slower than the fastest "stupid emulation".
> > Are there any architectures on which a load and a 
> data-dependent store 
> > are not ordered?
> 
> On Alpha, I suppose.
I know that data-dependent loads are not ordered there.  I don't
remember about stores.  But I don't think we should make decisions
based on Alpha anyway.

> > I'm actually not sure what to do about the ordering constraints in 
> > both proposals that mention data-dependence or control-dependence. 
> > Thinking about this a bit more, I'm not sure those really 
> make sense 
> > at this level for the same reasons we avoided mentioning 
> them in the 
> > Java spec.  The compiler can easily convert between them, or 
> > optimize/pessimize away the dependency.
> 
More details:

Compilers convert between control and data dependence all the
time, by converting conditionals to predicated instructions.
Many modern architectures have at least conditional move instructions.
Itanium allows almost all instructions to be executed conditionally
based on the contents of a predicate register.  The reverse is
also done occasionally.

Even more importantly, compilers can optimize away conditionals
if they can deduce the value of the condition.  This makes it
really difficult to say what it means that some ordering should be
respected for control dependent operations.  If I have

	r1 = load_acquire_for_control_dependent_loads(&x);
	if (r1 == r1)
		r2 = y

are the loads of x and y ordered?  You know there won't be a
branch in the object code.  (Well, you hope that anyway.)

I'm not sure if load_acquire_for_data_dependent_loads makes
sense either.  If I have

	r3 = ...
	r1 = load_acquire_for_data_dependent_loads(x)
	r2 = *(r3 + r1 - r1)

The second load also isn't really data-dependent on the first,
by the time you look at the assembly code.  If anything you want

	
load_acquire_for_data_dependencies_that_the_compiler_really_cant_elimina
te_no_matter_how_hard_it_tries

I'm not sure that's a well-defined concept.

I think there's a good case for the following order constraints:

	acquire+release
	acquire	(load_acquire is cheap on IA64, very cheap on X86, SPARC
TSO, and clearly useful)
	release	(similarly for store_release)
	acquire_for_later_loads (I think. If it makes a difference
anywhere important.)

Everything else seems tricky to me.

Hans




More information about the cpp-threads mailing list