[cpp-threads] Alternatives to SC (prohibiting data races, etc.)

Boehm, Hans hans.boehm at hp.com
Tue Jan 30 22:16:26 GMT 2007


> From: Paul E. McKenney [mailto:paulmck at linux.vnet.ibm.com] 
> 
> If load_raw() can be used as follows:
> 
>  	r1 = load_raw(x);
>  	if (r1 < 2) {
>  	    ...
>  	    switch(r1) {
>  	      case 0:
>  	      case 1: ...
>  	    }
>  	}
> 
> and prevent x from being reloaded without issuing extra locks 
> or memory barriers, I -think- that it does at least part of 
> what it needs to.  It is essentially making the compiler 
> forget where it got r1 from, correct?
Right. And that was always the intent.  With load_raw, the wild branch
is no longer possible.  And I think we've imposed considerably fewer
constraints on the compiler than barrier() does.  And the code remains annotated
with a warning that there is an "intentional race" here (though we no
longer call it a data race, since it involves atomics).

> 
> > > Yep, Linux mandates that hardware atomically load and store
> > > properly aligned ints and pointers.
> >
> > And for Linux that's perfectly reasonable.  For us, I suspect it
> > isn't, though I don't have great data.
> 
> Well, a good part of this discussion has indeed circled 
> around exactly what requirements we should impose on the 
> underlying hardware.  ;-)
> 
> But even the old 8080 would meet this requirement -- you 
> would use the lhld instruction to get an atomic 16-bit load 
> of a pointer, which suffices for this 8-bit CPU.  The 8080 
> would -not- be able to atomically load a 32-bit quantity -- 
> the memories are quite faded, but I vaguely remember longs 
> being 32 bits in Whitesmiths C.  Nor would it be able to 
> atomically fetch a composite pointer that might be used to 
> bank-switch more than 64K of memory.  However, in the project 
> I was working on, it was the caller's responsibility to make 
> sure that the correct memory was mapped before dereferencing 
> the pointer -- data was segregated by type of structure.  So 
> an atomic 16-bit fetch/store would suffice.  Other projects 
> might well have made different choices.
> 
> So here are the choices I can see:
> 
> 1.	Pure least-common-denominator.  This will likely result in
> 	atomics being restricted to char.  I believe that this would
> 	be unacceptable.
> 
> 2.	The machine must be able to load and store all basic types
> 	atomically, otherwise it cannot claim support of atomics.
> 	Some older machines might need to fetch double-precision
> 	floating point in two pieces -- but then again, there are
> 	machines that simply refuse to support floating point in
> 	the first place.  I have heard rumors about some sort of
> 	new decimal floating point type, but know very little about it.
> 
> 3.	Define different levels of support.  All machines support
> 	atomic char.  All the popular machines I am aware of support
> 	atomic integer types.  All the mainstream machines I am aware
> 	of support atomic binary floating point.  The usual tricks
> 	could be used to support atomic structures that fit into one of
> 	these basic types, e.g., bitfields.  Very few machines support
> 	large atomic aggregates (there are some research prototypes that
> 	support transactional memory, but this does not seem ready for
> 	standardization).
> 
> 	This is the de facto situation today with floating point.
> 	Most machines and environments support it, but things work in
> 	environments that do not.  For example, many operating system
> 	kernels forbid use of floating point in order to optimize
> 	context-switch latency.  Similarly, many embedded CPUs don't
> 	support floating point in any real way (yes, there might be
> 	library functions, but forget about having any memory left over
> 	should you be so foolish as to cause them to be included in
> 	your program).
> 
> 4.	Require that "atomic" be applicable to -all- data structures.
> 	To make this work, the compiler has to know quite a bit about
> 	the environment.  Does it emit cli/sti instructions around
> 	access to an atomic aggregate?	spin_lock()/spin_unlock()?
> 	pthread_mutex_lock()/ pthread_mutex_unlock()?  And so on...
> 
> 	And what constitutes an "access"?  A single load or store
> 	to a field in an atomic structure?  Loading and storing the
> 	aggregate in toto?  A member function in C++?  If the latter,
> 	what about mutually recursive member functions in different C++
> 	structures/classes?
> 
> 	I don't believe that this last is workable.  If the transactional
> 	memory guys are on the right path, then perhaps this can appear
> 	in the future.	However, I do have some questions for those guys
> 	about member functions that do RPCs to other transactional-memory
> 	machines -- and that is assuming that I am willing to cut them
> 	a break on programs accessing MMIO device registers!
> 
> Any other options?  Refinements of the above options?  At 
> this point, my guess is that #3 is the only workable option.
> 
That's essentially what N2145 does.  It promises to provide atomic_xyz
for various (currently non-fp) builtin types xyz, and then provides a
way to ask whether the available implementation is actually lock-free,
or emulated with locks.  It is understood that the emulated version is
not universally useful, but I think that for portable user-level code,
it's often sufficient to make the code work.  The generic atomic<T>
behaves the same way, though it's more likely to be emulated with locks
on a particular platform.
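
As a rough illustration of the idea, and not of N2145's exact syntax, the
sketch below uses the std::atomic spelling and its is_lock_free() query.
The names Pair, describe, and bump are hypothetical. The point is that the
same code works in either case, and the query only tells you which
implementation you got:

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical small aggregate for the generic atomic<T> case.
struct Pair {
    std::int16_t a;
    std::int16_t b;
};

// Reports which kind of implementation backs this atomic object.
template <typename T>
const char* describe(const std::atomic<T>& v) {
    return v.is_lock_free() ? "lock-free" : "emulated with locks";
}

// Atomic increment; correct whether or not the type is lock-free.
// Returns the new value.
int bump(std::atomic<int>& counter) {
    return counter.fetch_add(1) + 1;
}

// Atomically replaces the stored pair and returns the previous one.
Pair replace(std::atomic<Pair>& p, Pair next) {
    return p.exchange(next);
}
```

Code that cannot tolerate the lock-based fallback (a signal handler, say)
would check is_lock_free() first and take a different path.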

Hans


