[cpp-threads] Alternatives to SC
Joseph Seigh
jseigh_cp00 at xemaps.com
Sun Jan 14 16:47:15 GMT 2007
Peter Dimov wrote:
>
> Doug Lea wrote:
>
>> Atomics can be used to create SC constructions like locks.
>> But atomics can also be used in other ways that are not necessarily
>> SC. As it turns out, all usages on cache-coherent causally consistent
>> machines are at least nearly SC.
>
>
> Isn't it true that most usages are PC? The one exception that I know
> of is hazard pointers. Do we really need the #StoreLoads? :-)
>
> Speaking of hazard pointers... as an aside, the two main problems with
> turning an academic paper into a real C++ implementations are that (1)
> most papers assume SC when giving the pseudocode, (2) most papers
> either assume garbage collection, or don't mention the issue of memory
> management at all. Even given SC atomics, the second problem remains.
> Yet the discussion doesn't mention it at all. Do the proponents of "SC
> atomics for the masses" also assume that they will only be usable in a
> GC environment? :-) I simply don't see how a non-expert can reasonably
> handle the memory management of a typical lock-free container, no
> matter how sequentially consistent the atomics are.
>
Hazard pointers are a form of PDR which is a term I've been trying to
coin. It stands for PCOW
(Partial Copy On Write) Deferred Reclamation to refer to the memory
management schemes
necessary for lock-free algorithms, though some will argue that
lock-free itself is a debased
concurrency.
But there are multiple levels of usage. One is the implementation of
the PDR scheme. The next level
would be the implementation of a lock-free data structure using PDR.
And the next one after that
would be the using the api for that lock-free data structure if the
application programmers are up
to that. Some programmers aren't even up to using a lock based data
structure so you could argue
as many do that we shouldn't have multi-threading at all.
There are lock-free containers already in public use, e.g
java.util.atomic.ConcurrentLinkedList in Java.
Somebody had to implement a thread-safe PDR, i.e GC, and someone else
has to implement the
ConcurrentLinkedList class. And presumably there are at least some
programmers who can deal
with the concurrency semantics of this container.
Speaking of hazard pointers, you can do them without that #StoreLoad
membar and the release
membar required for the old copied over value of the hazard pointer if
it's anything other than
the null object which you shouldn't be trying to access anyway even if
the null object is never
deallocated. I was going to bring this up on the subject of composable
atomics, i.e. whether
you can come up with a set of atomics that will allow you to build any
higher level synchronization
mechanism that you can think of.
This is the gcc x86 hazard pointer load implementation which uses a
modified RCU algorithm
to eliminate the need for memory barriers. It runs effectively at the
same speed as a normal
pointer load, about 10x faster than a version w/ memory barriers.
__attribute__((always_inline))
static __inline__ void * smrload32(void **hptr, void **src) {
void * ret;
__asm__ __volatile__ (
"mov 0(%2), %%ecx ;\n" // load source pointer
"1:\t"
"mov %%ecx, %0 ;\n"
"mov %%ecx, 0(%1) ;\n" // store into hazard pointer
"mov 0(%2), %%ecx ;\n" // reload source pointer
"cmp %%ecx, %0 ;\n"
"jne 1b ;\n"
"mov %%ecx, 4(%1) ;\n" // store into hazard pointer[1]
: "=&r" (ret)
: "r" (hptr), "r" (src)
: "cc", "memory", "ecx"
);
return ret;
}
There's a second internal hazard pointer to make make data structure
traversal
safe, i.e. prevent nodes from being deallocated out from under you, as
it doesn't
allow explicit copying of hazard pointers. It's in that spot because
that's the
most efficient place to put it as it's still in a register at that point.
The point here is the hardware is allowed to reorder memory accesses.
In fact
you want it to do it to improve performance. But you don't want the
compiler to
reorder anything across the hazard pointer load logic. I suppose you
could do
it with unordered atomic ops with compiler pragmas to prevent the compiler
from reordering things but it's not clear pragmas like that will be in
the atomics
api. Can you create an atomics api that will anticipate all possible
usages of it?
If you can create an atomics api that will anticipate all possible
future hardware
memory models, it shouldn't be that hard I guess.
--
Joe Seigh
More information about the cpp-threads
mailing list