[cpp-threads] Alternatives to SC (prohibiting data races, etc.)
Paul E. McKenney
paulmck at linux.vnet.ibm.com
Wed Jan 31 17:54:33 GMT 2007
On Tue, Jan 30, 2007 at 04:16:26PM -0600, Boehm, Hans wrote:
> > From: Paul E. McKenney [mailto:paulmck at linux.vnet.ibm.com]
> >
> > If load_raw() can be used as follows:
> >
> > r1 = load_raw(x);
> > if (r1 < 2) {
> > ...
> > switch(r1) {
> > case 0:
> > case 1: ...
> > }
> > }
> >
> > and prevent x from being reloaded without issuing extra locks
> > or memory barriers, I -think- that it does at least part of
> > what it needs to. It is essentially making the compiler
> > forget where it got r1 from, correct?
>
> Right. And that was always the intent. With load_raw the wild branch is
> no longer possible. And I think we've imposed considerably fewer
> compiler constraints than barrier(). And the code remains annotated
> with a warning that there is an "intentional race" here (though we no
> longer call it a data race, since it involves atomics).
OK -- but I didn't see load_raw() in N2145. This is something you
are proposing adding -- or did I get the wrong document?
If I understand it correctly, your load_raw() would allow developers
to implement per-thread counters, which is good. However, it would
not suffice for rcu_dereference() on platforms like Alpha. Adding
the explicit memory barrier required for Alpha penalizes other platforms.
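To make the per-thread-counter case concrete, here is a minimal sketch
written against the std::atomic interface that C++11 later standardized,
with a relaxed load standing in for load_raw(). The names kMaxThreads,
Slot, count_event() and read_total(), the fixed thread count, and the
64-byte cache-line size are all assumptions for illustration, not
anything taken from N2145.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical fixed number of threads, each owning one slot.
constexpr std::size_t kMaxThreads = 4;

// One counter per thread. alignas(64) keeps each counter on its own
// cache line, assuming 64-byte lines.
struct alignas(64) Slot {
    std::atomic<unsigned long> count{0};
};

Slot slots[kMaxThreads];

// Called only by the thread that owns slot `self`. Because there is a
// single writer, a relaxed load followed by a relaxed store suffices;
// no atomic read-modify-write and no memory barrier is issued.
void count_event(std::size_t self) {
    assert(self < kMaxThreads);
    std::atomic<unsigned long>& c = slots[self].count;
    c.store(c.load(std::memory_order_relaxed) + 1,
            std::memory_order_relaxed);
}

// Called by any thread. Each relaxed load yields one untorn value per
// slot; the sum is a possibly slightly stale total.
unsigned long read_total() {
    unsigned long sum = 0;
    for (const Slot& s : slots) {
        sum += s.count.load(std::memory_order_relaxed);
    }
    return sum;
}
```

Since only the owning thread ever writes its slot, the increment stays
cheap, and readers get an approximate total, which is usually all that
statistical counters need.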
One possibility would be load_raw_dep() or some such that causes the
compiler to "forget" where the argument came from (same as load_raw()),
and also to issue an rmb() (LoadLoad) on platforms such as Alpha that
do not respect ordering of dependent loads.
Does this seem reasonable?
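A rough sketch of the distinction, again using the std::atomic interface
that C++11 later standardized. Both load_raw() and load_raw_dep() are
hypothetical names from this thread, and Node, global_ptr, publish() and
read_payload_or() are invented for illustration. Note the assumption
baked in here: the C++ standard does not formally promise ordering
between a relaxed load and a dereference that depends on it, so this
relies on the hardware honoring dependency ordering and on the compiler
preserving the dependency, much as Linux's rcu_dereference() does.

```cpp
#include <atomic>
#include <cassert>

// load_raw() (hypothetical): a relaxed atomic load. The result is a
// single snapshot; the compiler may not re-read x in its place.
template <typename T>
T load_raw(const std::atomic<T>& x) {
    return x.load(std::memory_order_relaxed);
}

// load_raw_dep() (hypothetical): same as load_raw(), plus a fence only
// on architectures that do not order dependent loads. __alpha__ is the
// macro GCC predefines when targeting Alpha.
template <typename T>
T load_raw_dep(const std::atomic<T>& x) {
    T v = x.load(std::memory_order_relaxed);
#if defined(__alpha__)
    std::atomic_thread_fence(std::memory_order_acquire);
#endif
    return v;
}

struct Node {
    int payload;
};

std::atomic<Node*> global_ptr{nullptr};

// Publisher: the release store orders initialization of *n before the
// pointer becomes visible.
void publish(Node* n) {
    global_ptr.store(n, std::memory_order_release);
}

// Reader, in the style of rcu_dereference(): the dereference of p
// depends on the loaded pointer value.
int read_payload_or(int fallback) {
    Node* p = load_raw_dep(global_ptr);
    return p ? p->payload : fallback;
}
```

On platforms other than Alpha the two functions compile to the same
plain load, so the extra ordering costs nothing there.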
> > > > Yep, Linux mandates that hardware atomically load and store properly
> > > > aligned ints and pointers.
> > >
> > > And for Linux that's perfectly reasonable. For us, I suspect it isn't,
> > > though I don't have great data.
> >
> > Well, a good part of this discussion has indeed circled
> > around exactly what requirements we should impose on the
> > underlying hardware. ;-)
> >
> > But even the old 8080 would meet this requirement -- you
> > would use the lhld instruction to get an atomic 16-bit load
> > of a pointer, which suffices for this 8-bit CPU. The 8080
> > would -not- be able to atomically load a 32-bit quantity --
> > the memories are quite faded, but I vaguely remember longs
> > being 32 bits in Whitesmiths C. Nor would it be able to
> > atomically fetch a composite pointer that might be used to
> > bank-switch more than 64K of memory. However, in the project
> > I was working on, it was the caller's responsibility to make
> > sure that the correct memory was mapped before dereferencing
> > the pointer -- data was segregated by type of structure. So
> > an atomic 16-bit fetch/store would suffice. Other projects
> > might well have made different choices.
> >
> > So here are the choices I can see:
> >
> > 1. Pure least-common-denominator. This will likely result in
> > atomics being restricted to char. I believe that this would
> > be unacceptable.
> >
> > 2. The machine must be able to load and store all basic types
> > atomically, otherwise it cannot claim support of atomics.
> > Some older machines might need to fetch double-precision
> > floating point in two pieces -- but then again, there are
> > machines that simply refuse to support floating point in
> > the first place. I have heard rumors about some sort of
> > new decimal floating point type, but know very little about it.
> >
> > 3. Define different levels of support. All machines support
> > atomic char. All the popular machines I am aware of support
> > atomic integer types. All the mainstream machines I am aware
> > of support atomic binary floating point. The usual tricks
> > could be used to support atomic structures that fit into one of
> > these basic types, e.g., bitfields. Very few machines support
> > large atomic aggregates (there are some research prototypes that
> > support transactional memory, but this does not seem ready for
> > standardization).
> >
> > This is the de facto situation today with floating point.
> > Most machines and environments support it, but things work in
> > environments that do not. For example, many operating system
> > kernels forbid use of floating point in order to optimize
> > context-switch latency. Similarly, many embedded CPUs don't
> > support floating point in any real way (yes, there might be
> > library functions, but forget about having any memory left over
> > should you be so foolish as to cause them to be included in
> > your program).
> >
> > 4. Require that "atomic" be applicable to -all- data structures.
> > To make this work, the compiler has to know quite a bit about
> > the environment. Does it emit cli/sti instructions around
> > access to an atomic aggregate? spin_lock()/spin_unlock()?
> > pthread_mutex_lock()/ pthread_mutex_unlock()? And so on...
> >
> > And what constitutes an "access"? A single load or store
> > to a field in an atomic structure? Loading and storing the
> > aggregate in toto? A member function in C++? If the latter,
> > what about mutually recursive member functions in different C++
> > structures/classes?
> >
> > I don't believe that this last is workable. If the transactional
> > memory guys are on the right path, then perhaps this can appear
> > in the future. However, I do have some questions for those guys
> > about member functions that do RPCs to other transactional-memory
> > machines -- and that is assuming that I am willing to cut them
> > a break on programs accessing MMIO device registers!
> >
> > Any other options? Refinements of the above options? At
> > this point, my guess is that #3 is the only workable option.
> >
> That's essentially what N2145 does. It promises to provide atomic_xyz
> for various (currently non-fp) builtin types xyz, and then provides a
> way to ask whether the available implementation is actually lock-free,
> or emulated with locks. It is understood that the emulated version is
> not universally useful, but I think that for portable user level code,
> it's often sufficient to make the code work. The generic atomic<T>
> behaves the same way, though it's more likely to be emulated with locks
> on a particular platform.
So atomic<char>::lock_free() would normally (always?) return true, while
atomic<struct foo>::lock_free() would normally return false, right?
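For comparison, the interface that C++ eventually standardized spells
this query is_lock_free() (per object, C++11) and is_always_lock_free
(compile-time constant, C++17). The sketch below uses the latter. The
helper always_lock_free(), struct foo and its 64-byte size are
assumptions for illustration, and the answers remain platform-dependent.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical aggregate, deliberately wider than any native atomic
// access on mainstream hardware.
struct foo {
    char bytes[64];
};

// True only if std::atomic<T> is lock-free for every object of type T
// on this platform. Requires C++17.
template <typename T>
constexpr bool always_lock_free() {
    return std::atomic<T>::is_always_lock_free;
}
```

On mainstream 64-bit platforms one would expect always_lock_free<char>()
to be true and always_lock_free<foo>() to be false, but the standard
guarantees neither result.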
Thanx, Paul