<div class="gmail_quote">On Thu, Oct 7, 2010 at 3:00 PM,  <span dir="ltr">&lt;<a href="mailto:cpp-threads-request@decadent.org.uk">cpp-threads-request@decadent.org.uk</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">


<br>

  Lock-free programming usually makes heavy use of dependent loads as an alternative to loads with acquire semantics.  For platforms that do not order dependent loads, Paul McKenney came up with a technique that delays storing a pointer to a newly initialized object until after all processors have executed an acquire memory barrier or the equivalent.  The technique is basically RCU with the memory barriers as quiescent points and with some interprocessor signaling to speed things up.<br>


<br>

It might be useful to move the RCU wait from the write phase, where it slows things down, to the heap phase.   Just keep freed memory on the heap and don&#39;t reallocate it until no other processor holds a reference to it.  Then, when you allocate memory for an object, you can initialize it and publish it globally without delay, and other processors do not need a load with acquire semantics to see the object&#39;s initial state correctly.<br>


<br>

There are some caveats.  You&#39;d have to be careful with architectures that allow speculative reads.  You probably couldn&#39;t allow in-use memory to share cache lines with unallocated heap memory.<br>

<br>

Systems like Java that have GC get most of this for free by the nature of GC.  You wouldn&#39;t have to do much more to make it work.   I&#39;ve seen some interest there in the ability to create immutable objects and publish them without having to use volatile.<br>


<font color="#888888"><br>

</font></blockquote></div><br>Hi Joseph,<br><br>I don&#39;t get your idea. There must still be a hardware-level quiescence period (my term for your &quot;RCU with the memory barriers&quot;) between object initialization and publication. So how can a thread initialize an object and publish it without delay?<br>


<br>I saw your recent post on the Java concurrency-interest list about unsafe publication. But Java does not guarantee &quot;costless safe unsafe publication&quot;; it guarantees only that other threads will see either &quot;zeroes&quot; or properly initialized fields. That can be achieved with RCU by: deref -&gt; synchronize_rcu -&gt; zeroize -&gt; synchronize_rcu -&gt; reuse. But such a guarantee is rather useless in C/C++ IMVHO (though perhaps it&#39;s useful for &quot;preserving basic safety&quot; inside the JVM). Anyway, as far as I can see, you are talking about something different here. So what am I missing?<br>


<br>By the way, I think virtually zero-overhead publication can be achieved with the original RCU proposal, i.e. by deferring algorithm steps rather than just deferring memory reclamation:<br>allocate -&gt; initialize -&gt; call_rcu(hardware_level) -&gt; publish -&gt; call_rcu(application_level) -&gt; free old object<br>
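<br>To make the ordering of that chain concrete, here is a rough sketch as a toy, single-threaded Python model. It only illustrates which step is deferred behind which grace period; it does not model real memory barriers or real RCU. All names (GracePeriodDomain, Publisher, quiescent, the cpu0/cpu1 identifiers) are hypothetical, and the two domains simply stand in for the hardware-level and application-level grace periods.<br>

```python
# Toy model of: allocate, initialize, call_rcu(hardware_level), publish,
# call_rcu(application_level), free old object.
# Every name here is hypothetical; nothing below is a real RCU API.


class GracePeriodDomain:
    """Runs a queued callback once every known reader has reported quiescence."""

    def __init__(self, reader_ids):
        self.reader_ids = set(reader_ids)
        self.pending = []  # entries: [callback, readers that have not yet reported]

    def call_rcu(self, callback):
        # Defer the callback until all readers pass a quiescent point.
        self.pending.append([callback, set(self.reader_ids)])

    def quiescent(self, reader_id):
        # A reader reports a quiescent point. For the hardware-level domain this
        # stands for "executed a memory barrier"; for the application-level
        # domain it stands for "holds no references to shared objects".
        ready = []
        still_pending = []
        for entry in self.pending:
            entry[1].discard(reader_id)
            if entry[1]:
                still_pending.append(entry)
            else:
                ready.append(entry[0])
        # Swap the list before running callbacks, so callbacks that queue new
        # work start a fresh grace period.
        self.pending = still_pending
        for callback in ready:
            callback()


class Publisher:
    def __init__(self, hardware_level, application_level):
        self.hardware_level = hardware_level
        self.application_level = application_level
        self.shared = None  # stands for the globally visible pointer
        self.freed = []     # objects handed back to the heap

    def update(self, value):
        new_obj = {"value": value}  # allocate + initialize

        def publish():
            old_obj = self.shared
            self.shared = new_obj  # publish: a plain store in the real scheme
            if old_obj is not None:
                # Free the old object only after an application-level grace period.
                self.application_level.call_rcu(lambda: self.freed.append(old_obj))

        # Defer publication behind a hardware-level grace period.
        self.hardware_level.call_rcu(publish)
```

In this model the writer never blocks: update() returns immediately, and the publish and free steps run later as callbacks. The price is that publication itself is delayed by one hardware-level grace period.<br>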


<br>TIA<br>-- <br>Dmitriy V&#39;jukov<br><br>Relacy Race Detector: Make your synchronization correct!<br><a href="http://groups.google.ru/group/relacy">http://groups.google.ru/group/relacy</a><br>