[cpp-threads] C++ memory model

N.M. Maclaren nmm1 at cam.ac.uk
Tue Sep 22 21:18:35 BST 2009


On Sep 22 2009, Boehm, Hans wrote:
>> 
>4) Pushing the envelope a bit when it comes to implementations.
>
> I don't think there is a risk-free solution. And I'd argue that the last 
> one has the least potential for long-term damage, at least if it's done 
> after careful study. Admittedly in the Fortran case, timing may have 
> precluded the careful study.

I agree with the principle.  But it WASN'T timing, so much as that Fortran
is targeting a different mixture of architectures from C++.

In particular, my understanding is that C++ is targeting primarily
the 'cache-coherent shared memory' architectures of today.  While few
of them deliver sequential consistency at the hardware level, they do
deliver 'causal consistency', and you can build sequential consistency
on top of that without TOO much loss of scalability.
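
For illustration only, the classic store-buffering litmus test shows what
an implementation has to deliver: with the default (sequentially consistent)
atomic operations in the draft, the outcome r1 == r2 == 0 is forbidden, so
on hardware that gives only causal consistency the compiler has to insert
whatever fences or interlocked instructions are needed.

#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> x{0}, y{0};
int r1, r2;

void thread1() { x.store(1); r1 = y.load(); }   // both default to seq_cst
void thread2() { y.store(1); r2 = x.load(); }

int main() {
    std::thread a(thread1), b(thread2);
    a.join(); b.join();
    assert(r1 == 1 || r2 == 1);   // r1 == 0 && r2 == 0 must never happen
    return 0;
}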

However, distributed systems (i.e. ones using message passing) do not
guarantee even causal consistency UNLESS you also force synchronicity.
And that is NOT good news for scalability - asynchronicity rules, just
as it did 40 years back, in the days when we got phenomenal I/O speeds
out of slow hardware.
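
As a sketch of the problem (my own illustration; threads and mailboxes are
merely standing in for processes and channels): if the link from P1 to P2 is
slow, P2 can observe a message that causally follows P1's update before it
observes the update itself, unless P1 waits for acknowledgements - i.e.
sends synchronously.

#include <chrono>
#include <condition_variable>
#include <deque>
#include <iostream>
#include <mutex>
#include <string>
#include <thread>

struct Mailbox {                       // one per 'process'
    std::mutex m;
    std::condition_variable cv;
    std::deque<std::string> q;
    void post(std::string msg) {
        { std::lock_guard<std::mutex> lk(m); q.push_back(std::move(msg)); }
        cv.notify_one();
    }
    std::string receive() {
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [this]{ return !q.empty(); });
        std::string msg = q.front(); q.pop_front();
        return msg;
    }
};

int main() {
    Mailbox p2_box, p3_box;

    // P1 sends "x=1" to P2 (over a slow link) and then "x=1" to P3 (fast link).
    std::thread p1([&]{
        std::thread slow_link([&]{
            std::this_thread::sleep_for(std::chrono::milliseconds(100));
            p2_box.post("x=1");        // delivery delayed by the channel
        });
        p3_box.post("x=1");
        slow_link.join();
    });

    // P3 applies the update, then messages P2; "go" causally follows "x=1".
    std::thread p3([&]{ p3_box.receive(); p2_box.post("go"); });

    // P2 typically sees "go" before "x=1" - a violation of causal consistency.
    // Forcing P1 to wait for P2's acknowledgement before messaging P3
    // (i.e. a synchronous send) prevents it, at the cost of a round trip.
    std::thread p2([&]{
        std::cout << p2_box.receive() << '\n';
        std::cout << p2_box.receive() << '\n';
    });

    p1.join(); p3.join(); p2.join();
    return 0;
}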

>> Consider the following (reasonable) scenario.  There are N 
>> threads and N atomic variables, each of which is accessed 
>> more-or-less randomly by a random collection of M threads in 
>> any interval of time T, but by all threads over longer 
>> periods.  How can that be implemented scalably in, say, the 
>> case where M = T = sqrt(N)?
>
> Again, I don't see a fundamental issue, but I'm not an expert on the 
> appropriate protocols. Fundamentally, you need to ensure that every 
> reader of an atomic sees at least the memory updates that were available 
> to the writer, and that you ensure a total order among the atomic 
> operations themselves. I think those can all be done with communication 
> among the M active threads. The effective cost of the individual atomic 
> operations will no doubt increase with M, and hence this won't scale 
> perfectly, but usable alternatives seem to have at least similar issues.

We are agreed there.
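
For concreteness, here is a scaled-down sketch of the scenario quoted above
(nothing more than an illustration, with N = 256 and M = sqrt(N) = 16):
every fetch_add below has to be given a place in a single total order, and
the set of threads contending for any particular variable keeps changing
from one interval to the next.

#include <atomic>
#include <cstddef>
#include <random>
#include <thread>
#include <vector>

int main() {
    const std::size_t N = 256;    // threads and atomic variables (scaled down)
    const std::size_t M = 16;     // ~sqrt(N) accesses per 'interval'
    std::vector<std::atomic<long>> vars(N);
    for (auto& v : vars) v.store(0);

    std::vector<std::thread> threads;
    for (std::size_t t = 0; t < N; ++t) {
        threads.emplace_back([&, t] {
            std::mt19937_64 rng(t);
            std::uniform_int_distribution<std::size_t> pick(0, N - 1);
            for (int interval = 0; interval < 1000; ++interval)
                for (std::size_t i = 0; i < M; ++i)
                    vars[pick(rng)].fetch_add(1);   // defaults to seq_cst
        });
    }
    for (auto& th : threads) th.join();
    return 0;
}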

> The one expert in distributed implementations whom I did ask seemed to be 
> largely in agreement with me.
>
>We agree that none of this is a proof ...

I regard myself as a second-rank expert (i.e. not up with Hoare, Lamport
etc., but following on), and I don't know of any such algorithm.  I have
spent quite a lot of time trying to invent one, and have failed, though
I have invented a few other useful and new parallel algorithms.

If you could pass on my scenario to your expert and ask him to send me
a reference to or draft of a scalable algorithm, I should be grateful.


Regards,
Nick Maclaren.




