[cpp-threads] Alternatives to SC
Paul E. McKenney
paulmck at linux.vnet.ibm.com
Fri Jan 12 03:36:23 GMT 2007
On Thu, Jan 11, 2007 at 08:01:27AM -0500, Doug Lea wrote:
> [I've been meaning to post this for a month or so, but have been
> diverted doing other things...]
[ snipping the preamble, which has seen some spirited discussion...]
> This models two properties that not only does everyone think are
> necessary, but also seem to be universally supported. It would be
> nice if the hardware specs would come out and explicitly say so.
>
> Cache coherence in turn has two parts:
> * Globally totally ordered writes
> * Fresh reads (never read a value that has been overwritten)
> As does causal consistency:
> * Causality:
> If p2 sees p1's write, and p3 sees p2's write, p3 sees p1's write
Violating this does indeed result in great pain, even for this
hard-core concurrent programmer!!!
> * Local acyclicity: Each thread sees an acyclic order of events that
> is consistent with program order.
This property holds only for specially marked variables, right?
Otherwise, it seems to me to outlaw most optimizations, both in the
compiler and in the CPU.
> (Since starting on this, I've found that this model has been around in
> various guises for over 20 years, but apparently never fleshed out enough
> to serve as a basis for language-level memory models.)
I have been groping in this direction as well, and am very happy to
see someone produce it!!!
> The current best formal version of the model is appended below.
> (Thanks mainly to Vijay Saraswat for help with this.)
> Like any memory model, it needs a bunch of notation to get across.
> But the main thing to note is that it matches SC for all cases except
> those involving independent reads of independent writes (IRIW --
> see the derivation of the most famous example of this below.)
This case is indeed the most painful for hardware, from what I
understand.
> If nothing else, I hope this focusses further discussion:
>
> Should sequential consistency be mandated for independent
> reads of independent writes?
>
> My view is no -- that CCCC would be a fine basis for defining strong
> volatiles/atomics. While I do appreciate the view
> that SC is simpler to express and has a better chance of being
> understood by programmers, constructions involving IRIW don't have any
> reasonable sequential analogs anyway, so there's no SC intuition that
> needs preserving. And as a working concurrent programmer, I think that
> the extra bit of concurrent readability CCCC allows hardware to
> exploit is a feature, not a bug. And I worry that if language-level
> models do require SC, then compliance checking is going to be
> problematic. It's not clear whether some processors can support it
> without resorting to awful things like using CAS or LL/SC for each
> load. And if processors are required to do so, then programmers will
> routinely find some cheaper workaround hacks that are unlikely to be as
> well-defined or correct as plain CCCC versions would be.
Agreed. For example, if one is accumulating a running sum of measurements
of concurrent events, there is absolutely no point in forcing an
artificial serialization on those measurements.
> (Note also that any program using locks (that are in turn created
> using CCCC-compliant primitives) is SC. It's only volatiles/atomics that
> allow concurrent independent reads.)
Agreed -- locking is a higher-level abstraction in the CCCC context.
> **** CCCC model snapshot ****
>
> NOTATION:
>
> G = (S,-->) is a set of nodes S and a binary (directed) relation --> on
> the nodes. Nodes are of type:
> e ranges over read and write events,
> r over read events and
> w over write events.
So an atomic RMW event would carry all three types, right?
> G is a partial order if --> is transitive and irreflexive.
> G is a total order if --> is a partial order and for any two
> distinct nodes a and b in S, either a --> b or b --> a.
> G | T is graph G restricted to subset T; that is, the graph with nodes
> in T and with a --> b in T iff a --> b in S.
> G* is the transitive closure of graph G.
> 'u' is set union
>
> p(e) is the processor which executed event e,
> E(p) is the set of events executed by processor p.
> W(x) is the set of all write events on variable x
> W is the set of all write events (across all variables)
> w(S) is set S together with all elements of W with an edge into or out of S.
In other words, w(S) is the set of events S plus all writes that deliver
values directly to a read event in S plus all writes that overwrite
values produced by write events in S.
Or did I miss a turn here?
> RULES:
>
> An execution is a graph G=(S, -->) over a set of events S satisfying:
>
> [Write Serialization]
> For each variable x, G | W(x) is a total order, with minimal
> element w0 setting x to 0.
>
> [Freshness]
> For every read r of x, min(W(x)) --> r and G | (W(x) u {r}) is a
> total order.
OK... "min(W(x))" is the write that produced the value consumed by r?
I was expecting something saying that a program-ordered pair of reads
to the same variable would not see that variable's values playing
backwards, but don't see how to obtain that interpretation from the
above formula. What am I missing? Is this piece to be provided by
the second part of "Local Consistency" below?
> [Causal Propagation]
> e --> e'' if there is an e' in E(p(e)) u E(p(e'')) st e --> e' --> e'' .
In other words, the existence of:
e' in E(p(e)) u E(p(e'')) st e --> e' --> e'' .
implies that:
e --> e''
Or am I missing something?
Also, p(e), p(e'), and p(e'') might well all be different CPUs, right?
> [Local Consistency]
> For all processors i, (G | w(E(i)))* is a partial order (i.e., is acyclic)
> Additionally, volatiles obey local program order:
> For all processors p, G | E(p) is a total order.
All the other relations apply to non-volatiles as well as to volatiles?
Not sure I would agree with that.
Also, if I understand correctly, what you are calling "volatiles" is
what some of the proposals in this forum have called "atomic" or
"interlocked" or maybe also "atomic_flag".
> EXAMPLES:
All variables in these examples are "volatiles"?
Thanx, Paul
> 1. Dekker (forbidden outcome)
>
> Given
>
> T1 T2
> [1] x = 1 [3] y = 1
> [2] r1 = y(0) [4] r2 = x(0)
>
> Program Order gives
> [1] --> [2], [3] --> [4],
> Write Serialization and Freshness give
> [2] --> [3], [4] --> [1].
> Propagation adds
> [3] --> [1], [1] --> [3] (and some other edges).
>
> This graph is not an execution because (G | {[1],[3],[4]})* has a
> cycle ([1] --> [3] --> [1])
>
> ======
>
> 2. IRIW (Independent reads of independent writes -- allowed outcome)
> Note that this is not SC.
>
>
> T1 T2 T3 T4
>
> [1] x = 1 [2] y = 1
>
> [3] r1 = x (1) [5] r3 = y (1)
> [4] r2 = y (0) [6] r4 = x (0)
>
> Program Order, Write Serialization, and Freshness give:
> [1] --> [3] --> [4] --> [2]
> [2] --> [5] --> [6] --> [1]
> Propagation adds
> [2] --> [6], [5] --> [1] and [1] --> [4], [3] --> [2].
>
> This graph satisfies all these conditions. (Note that [1] --> [2] is
> in (G | W u E(T2))* and [2] --> [1] is in (G | W u E(T1))*, but both
> these graphs are separately acyclic.)
>
> =======
>
> 3. CC (Causal consistency -- forbidden outcome)
>
> T1 T2 T3
> [1] x = 1
> [2] r1 = x (1) [4] r2 = y (1)
> [3] y = 1 [5] r3 = x (0)
>
> Program Order, Write Serialization, and Freshness give:
> [1] --> [2] --> [3] --> [4] --> [5] --> [1].
> This is not an execution because
> G | W u E(T3) = (G | {[1],[3],[4],[5]})*
> has the cycle [1] --> [3] --> [4] --> [5] --> [1].
>
> =======
>
> 4. DC (Another causal consistency example -- forbidden outcome)
>
> T1 T2 T3
> [1] y = 1
> [2] x = 1 [4] r3 = y (1)
> [3] r2 = y (0) [5] r4 = x (0)
>
> Program Order, Write Serialization, and Freshness give:
> [1] --> [4] --> [5] --> [2] --> [3] --> [1]
> Propagation adds
> [2] --> [1] (among others).
> This is not an execution because
> G | W u E(T3) = (G | {[1],[2],[4],[5]})*
> has the cycle [1] --> [4] --> [5] --> [2] --> [1].
> --
> cpp-threads mailing list
> cpp-threads at decadentplace.org.uk
> http://www.decadentplace.org.uk/cgi-bin/mailman/listinfo/cpp-threads