[cpp-threads] Alternatives to SC (prohibiting data races, etc.)
Boehm, Hans
hans.boehm at hp.com
Tue Jan 30 22:16:26 GMT 2007
> From: Paul E. McKenney [mailto:paulmck at linux.vnet.ibm.com]
>
> If load_raw() can be used as follows:
>
> r1 = load_raw(x);
> if (r1 < 2) {
> ...
> switch(r1) {
> case 0:
> case 1: ...
> }
> }
>
> and prevent x from being reloaded without issuing extra locks
> or memory barriers, I -think- that it does at least part of
> what it needs to. It is essentially making the compiler
> forget where it got r1 from, correct?
Right. And that was always the intent. With load_raw the wild branch is
no longer possible. And I think we've imposed considerably fewer
compiler constraints than barrier(). And the code remains annotated
with a warning that there is an "intentional race" here (though we no
longer call it a data race, since it involves atomics).
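
To make the point concrete, here is a minimal sketch of the quoted fragment. It assumes load_raw() behaves roughly like a relaxed atomic load as later standardized in C++11 (std::memory_order_relaxed); the load_raw() wrapper, the classify() function and its return values are hypothetical and exist only for illustration. Because x is atomic and read exactly once into r1, the compiler may not re-read x between the range check and the switch, so the switch index cannot change after the check has passed.

```cpp
#include <atomic>
#include <cassert>

// Shared variable; the name x follows the quoted example.
std::atomic<unsigned> x{0};

// Hypothetical stand-in for load_raw(): a relaxed atomic load, with no
// lock and no memory barrier implied.
inline unsigned load_raw(const std::atomic<unsigned>& v) {
    return v.load(std::memory_order_relaxed);
}

// Illustrative only: returns 10 or 11 for the values 0 and 1, -1 otherwise.
int classify() {
    unsigned r1 = load_raw(x);  // x is read exactly once
    if (r1 < 2) {
        // The range check and the switch both use the local copy r1,
        // so a concurrent store to x cannot invalidate the check.
        switch (r1) {
        case 0: return 10;
        case 1: return 11;
        }
    }
    return -1;
}
```

A concurrent writer can still change x at any time, which is the "intentional race"; the sketch only guarantees that classify() acts on one consistent value.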
>
> > > Yep, Linux mandates that hardware atomically load and store
> > > properly aligned ints and pointers.
> >
> > And for Linux that's perfectly reasonable. For us, I suspect it
> > isn't, though I don't have great data.
>
> Well, a good part of this discussion has indeed circled
> around exactly what requirements we should impose on the
> underlying hardware. ;-)
>
> But even the old 8080 would meet this requirement -- you
> would use the lhld instruction to get an atomic 16-bit load
> of a pointer, which suffices for this 8-bit CPU. The 8080
> would -not- be able to atomically load a 32-bit quantity --
> the memories are quite faded, but I vaguely remember longs
> being 32 bits in Whitesmiths C. Nor would it be able to
> atomically fetch a composite pointer that might be used to
> bank-switch more than 64K of memory. However, in the project
> I was working on, it was the caller's responsibility to make
> sure that the correct memory was mapped before dereferencing
> the pointer -- data was segregated by type of structure. So
> an atomic 16-bit fetch/store would suffice. Other projects
> might well have made different choices.
>
> So here are the choices I can see:
>
> 1. Pure least-common-denominator. This will likely result in
> atomics being restricted to char. I believe that this would
> be unacceptable.
>
> 2. The machine must be able to load and store all basic types
> atomically, otherwise it cannot claim support of atomics.
> Some older machines might need to fetch double-precision
> floating point in two pieces -- but then again, there are
> machines that simply refuse to support floating point in
> the first place. I have heard rumors about some sort of
> new decimal floating point type, but know very little about it.
>
> 3. Define different levels of support. All machines support
> atomic char. All the popular machines I am aware of support
> atomic integer types. All the mainstream machines I am aware
> of support atomic binary floating point. The usual tricks
> could be used to support atomic structures that fit into one of
> these basic types, e.g., bitfields. Very few machines support
> large atomic aggregates (there are some research prototypes that
> support transactional memory, but this does not seem ready for
> standardization).
>
> This is the de facto situation today with floating point.
> Most machines and environments support it, but things work in
> environments that do not. For example, many operating system
> kernels forbid use of floating point in order to optimize
> context-switch latency. Similarly, many embedded CPUs don't
> support floating point in any real way (yes, there might be
> library functions, but forget about having any memory left over
> should you be so foolish as to cause them to be included in
> your program).
>
> 4. Require that "atomic" be applicable to -all- data structures.
> To make this work, the compiler has to know quite a bit about
> the environment. Does it emit cli/sti instructions around
> access to an atomic aggregate? spin_lock()/spin_unlock()?
> pthread_mutex_lock()/ pthread_mutex_unlock()? And so on...
>
> And what constitutes an "access"? A single load or store
> to a field in an atomic structure? Loading and storing the
> aggregate in toto? A member function in C++? If the latter,
> what about mutually recursive member functions in different C++
> structures/classes?
>
> I don't believe that this last is workable. If the transactional
> memory guys are on the right path, then perhaps this can appear
> in the future. However, I do have some questions for those guys
> about member functions that do RPCs to other transactional-memory
> machines -- and that is assuming that I am willing to cut them
> a break on programs accessing MMIO device registers!
>
> Any other options? Refinements of the above options? At
> this point, my guess is that #3 is the only workable option.
>
That's essentially what N2145 does. It promises to provide atomic_xyz
for various (currently non-fp) builtin types xyz, and then provides a
way to ask whether the available implementation is actually lock-free,
or emulated with locks. It is understood that the emulated version is
not universally useful, but I think that for portable user level code,
it's often sufficient to make the code work. The generic atomic<T>
behaves the same way, though it's more likely to be emulated with locks
on a particular platform.
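
As a rough illustration, the sketch below uses the std::atomic interface as it was eventually standardized in C++11. That is an assumption on my part: the names and the exact query mechanism in N2145 may differ. The counter, bump() and implementation_of() are hypothetical names. The increment is correct whether the implementation is lock-free or emulated with locks, and is_lock_free() lets code that cares find out which it got.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shared counter.
std::atomic<int> counter{0};

// Works the same whether atomic<int> is lock-free or emulated with locks.
// Returns the value of the counter after the increment.
int bump() {
    return counter.fetch_add(1) + 1;
}

// Reports how a given atomic object is implemented on this platform.
template <typename T>
const char* implementation_of(const std::atomic<T>& a) {
    return a.is_lock_free() ? "lock-free" : "emulated with locks";
}
```

Code that must not block (a signal handler, say) would check is_lock_free() before relying on the atomic; ordinary portable user-level code can simply call bump() and accept either implementation.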
Hans