[cpp-threads] Alternative to dependent load

Thu Oct 7 13:04:30 BST 2010

On Thu, Oct 7, 2010 at 3:00 PM, <cpp-threads-request at decadent.org.uk> wrote:

>
>  Lock-free programming usually makes heavy use of dependent loads as an
> alternative to load w/ acquire.  For platforms without dependent load, Paul
> McKenney came up with a technique that delays storing of a pointer to an
> object that has just been initialized until after all processors have
> executed an acquire memory barrier or the equivalent.  The technique is
> basically RCU with the memory barriers as quiescent points and with some
> interprocessor signaling to speed things up.
>
> It might be useful to move RCU wait from the write phase, where it slows
> things down, to the heap phase.   Just keep freed memory on the heap and
> don't reallocate it until all other processors no longer have any references
> to it.  Then when you allocate memory for an object, you can initialize it
> and without delay, subsequently globally publish it without requiring other
> processors to do a load w/ acquire to properly see the initial state of the
> object.
>
> There are some caveats.  You'd have to be careful with architectures that allow
> speculative reads.  You probably couldn't allow in-use memory to share cache
> lines with unallocated heap memory.
>
> Systems like Java that have GC get most of this for free by the nature of
> GC.  You wouldn't have to do too much more to make it work.   I've seen some
> interest there for the ability to create immutable objects and publish them
> without having to use volatile.
>
>
Hi Joseph,

I don't follow your idea. There must still be a hardware-level quiescence
period (my term for your "RCU with the memory barriers") between object
initialization and publication. So how can a thread initialize an object and
publish it without delay?

I saw your recent post on the Java concurrency-interest list about unsafe
publication. But Java does not guarantee "costless safe unsafe publication";
it guarantees only that other threads will see either "zeroes" or properly
initialized fields. That can be achieved with RCU by: deref ->
synchronize_rcu -> zeroize -> synchronize_rcu -> reuse. But such a guarantee
is rather useless in C/C++ IMVHO (though perhaps it's useful for
"preserving basic safety" inside a JVM). Anyway, as far as I can see, you
are talking about something different here. So what am I missing?
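
To make the "zeroes or initialized" sequence above concrete, here is a rough
sketch of how I read it, with "deref" taken to mean "the object has already
been unlinked". ToyRcu, RecyclingPool and Node are hypothetical names made up
for illustration, and synchronize_rcu() here is only a counting stub, not a
real grace-period wait:

```cpp
#include <cassert>
#include <cstring>
#include <vector>

struct Node { int a; int b; };

// Hypothetical placeholder. A real synchronize_rcu() would block until every
// reader has passed a quiescent state; this stub only counts the calls.
struct ToyRcu {
    int grace_periods = 0;
    void synchronize_rcu() { ++grace_periods; }
};

class RecyclingPool {
public:
    explicit RecyclingPool(ToyRcu& rcu) : rcu_(rcu) {}

    // Precondition: the node is already unlinked (the "deref" step).
    void retire(Node* n) {
        rcu_.synchronize_rcu();         // wait: no reader still uses the old contents
        std::memset(n, 0, sizeof *n);   // zeroize
        rcu_.synchronize_rcu();         // wait: the zeroes have had a grace period to settle
        free_.push_back(n);             // only now is the block eligible for reuse
    }

    // Hands out a recycled (zeroed) node if one exists, else a fresh zeroed one.
    Node* allocate() {
        if (free_.empty()) return new Node{};
        Node* n = free_.back();
        free_.pop_back();
        return n;
    }

private:
    ToyRcu& rcu_;
    std::vector<Node*> free_;
};
```

Each retire() costs two grace periods, and a late reader can still observe
zeroes rather than the new contents, which is why I doubt the value of this
guarantee for C/C++.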

By the way, I think virtually zero-overhead publication can be achieved
with the original RCU proposal, i.e. by deferring algorithm steps rather than
just deferring memory reclamation:
allocate -> initialize -> call_rcu(hardware_level) -> publish ->
call_rcu(application_level) -> free old object
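
A minimal sketch of that chain, only to show the ordering of the steps.
ToyRcuDomain, Config, g_current and update() are hypothetical names, and the
"grace period" is triggered by hand in a single thread; a real implementation
would need the hardware-level grace period itself to supply the ordering that
makes the relaxed store safe:

```cpp
#include <atomic>
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical stand-in for an RCU domain: callbacks queued with call_rcu()
// run only when the caller declares that a grace period has elapsed.
class ToyRcuDomain {
public:
    void call_rcu(std::function<void()> cb) { pending_.push_back(std::move(cb)); }

    // Pretend every processor has passed a quiescent state.
    void grace_period_elapsed() {
        std::deque<std::function<void()>> ready;
        ready.swap(pending_);           // callbacks queued from now on wait for the next period
        for (auto& cb : ready) cb();
    }

private:
    std::deque<std::function<void()>> pending_;
};

struct Config { int value; };

std::atomic<Config*> g_current{nullptr};

// hw  : "hardware level" domain (all CPUs have executed a barrier or equivalent)
// app : "application level" domain (no reader still holds the old pointer)
inline void update(ToyRcuDomain& hw, ToyRcuDomain& app, int value) {
    Config* fresh = new Config{value};                  // allocate -> initialize
    hw.call_rcu([fresh, &app] {                         // call_rcu(hardware_level)
        // publish: a plain relaxed store, no release on this side
        Config* old = g_current.exchange(fresh, std::memory_order_relaxed);
        app.call_rcu([old] { delete old; });            // call_rcu(application_level) -> free old object
    });
}
```

The writer never blocks: update() returns immediately, and the new object
becomes visible one hardware-level grace period later.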

TIA
-- 
Dmitriy V'jukov

Relacy Race Detector: Make your synchronization correct!
http://groups.google.ru/group/relacy