[cpp-threads] Alternatives to SC

Paul E. McKenney paulmck at linux.vnet.ibm.com
Fri Jan 12 03:36:23 GMT 2007


On Thu, Jan 11, 2007 at 08:01:27AM -0500, Doug Lea wrote:
> [I've been meaning to post this for a month or so, but have been
> diverted doing other things...]

[ snipping the preamble, which has seen some spirited discussion...]

> This models two properties that not only does everyone think are
> necessary, but also seem to be universally supported. It would be
> nice if the hardware specs would come out and explicitly say so.
> 
> Cache coherence  in turn has two parts:
>   * Globally totally ordered writes
>   * Fresh reads (never read a value that has been overwritten)
> As does causal consistency:
>   * Causality:
>      If p2 sees p1's write, and p3 sees p2's write, p3 sees p1's write

Violating this does indeed result in great pain, even for this
hard-core concurrent programmer!!!
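
To make sure I am reading the causality clause the same way you are,
here is a minimal sketch of the classic three-thread test it rules on.
(The std::atomic<>/std::thread spellings below are purely illustrative
stand-ins for whatever atomics syntax this group settles on, not a
claim about the eventual API.)

    #include <atomic>
    #include <cassert>
    #include <thread>

    // Illustrative syntax only.
    std::atomic<int> x{0}, y{0};

    void p1() { x.store(1); }               // p1's write

    void p2() {
        if (x.load() == 1)                  // p2 sees p1's write...
            y.store(1);                     // ...and then does its own write
    }

    void p3() {
        if (y.load() == 1)                  // p3 sees p2's write, so by
            assert(x.load() == 1);          // causality it must also see p1's
    }

    int main() {
        std::thread t1(p1), t2(p2), t3(p3);
        t1.join(); t2.join(); t3.join();
    }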

>   * Local acyclicity: Each thread sees an acyclic order of events that
>      is consistent with program order.

This property holds only for specially marked variables, right?
Otherwise, it seems to me to outlaw most optimizations, both in the
compiler and in the CPU.
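
For instance, a minimal sketch of the distinction I have in mind, again
using an illustrative std::atomic<> spelling for the marked variables:

    #include <atomic>

    int a, b;                       // ordinary variables
    std::atomic<int> c{0}, d{0};    // "marked" variables (illustrative syntax)

    void writer() {
        a = 1;          // the compiler or CPU may freely reorder, combine,
        b = 2;          // or hoist these two plain stores
        c.store(1);     // whereas these two marked stores must become
        d.store(2);     // visible to other threads in program order
    }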

> (Since starting on this, I've found that this model has been around in
> various guises for over 20 years, but apparently never fleshed out enough
> to serve as a basis for language-level memory models.)

I have been groping in this direction as well, and am very happy to
see someone produce it!!!

> The current best formal version of the model is appended below.
> (Thanks mainly to Vijay Saraswat for help with this.)
> Like any memory model, it needs a bunch of notation to get across.
> But the main thing to note is that it matches SC for all cases except
> those involving independent reads of independent writes (IRIW --
> see the derivation of the most famous example of this below.)

This case is indeed the most painful for hardware, from what I
understand.

> If nothing else, I hope this focusses further discussion:
> 
> Should sequential consistency be mandated for independent
> reads of independent writes?
> 
> My view is no -- that CCCC would be a fine basis for defining strong
> volatiles/atomics. While I do appreciate the view
> that SC is simpler to express and has a better chance of being
> understood by programmers, constructions involving IRIW don't have any
> reasonable sequential analogs anyway, so there's no SC intuition that
> needs preserving. And as a working concurrent programmer, I think that
> the extra bit of concurrent readability CCCC allows hardware to
> exploit is a feature, not a bug.  And I worry that if language-level
> models do require SC, then compliance checking is going to be
> problematic. It's not clear whether some processors can support it
> without resorting to awful things like using CAS or LL/SC for each
> load. And if processors are required to do so, then programmers will
> routinely find some cheaper workaround hacks that are unlikely to be as
> well-defined or correct as plain CCCC versions would be.

Agreed.  For example, if one is accumulating a running sum of measurements
of concurrent events, there is absolutely no point in forcing an
artificial serialization on those measurements.
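
Something like the following sketch (illustrative atomic syntax again):
the additions commute, so all the primitive needs to provide is
atomicity, not agreement on any particular interleaving of the
measurements.

    #include <atomic>

    std::atomic<long> total_ns{0};

    void record_measurement(long ns) {
        // Fold this thread's measurement into the running sum.  The final
        // total is the same no matter how the additions interleave, so
        // nothing is gained by imposing a globally agreed order on them.
        total_ns.fetch_add(ns);
    }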

> (Note also that any program using locks (that are in turn created
> using CCCC-compliant primitives) is SC. It's only volatiles/atomics that
> allow concurrent independent reads.)

Agreed -- locking is a higher-level abstraction in the CCCC context.
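
For example, a toy spinlock built from a single atomic flag
(illustrative syntax); any program that protects all of its shared data
with such locks sees only SC behavior, whatever the underlying model
says about unlocked atomics.

    #include <atomic>

    std::atomic_flag lock_word = ATOMIC_FLAG_INIT;   // illustrative syntax
    long shared_count;                               // ordinary variable, only
                                                     // touched under the lock

    void lock()   { while (lock_word.test_and_set()) { /* spin */ } }
    void unlock() { lock_word.clear(); }

    void increment() {
        lock();
        ++shared_count;
        unlock();
    }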

> **** CCCC model snapshot ****
> 
> NOTATION:
> 
> G = (S,-->) is a set of nodes S and a binary (directed) relation --> on
>    the nodes. Nodes are of type:
>      e ranges over read and write events,
>      r over read events and
>      w over write events.

So an atomic RMW event would carry all three types, right?
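
For concreteness, the sort of single event I have in mind (illustrative
syntax):

    #include <atomic>

    std::atomic<int> v{0};

    void bump() {
        int old = v.fetch_add(1);   // one event that both reads (observes old)
        (void)old;                  // and writes (installs old + 1) atomically
    }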

> G is a partial order if --> is transitive and irreflexive.
> G is a total order if --> is a partial order and for any two
>      distinct nodes a and b in S, either a --> b or b --> a.
> G | T is graph G restricted to subset T; that is, the graph with nodes
>      in T and with a --> b in T iff a --> b in S.
> G* is the transitive closure of graph G.
> 'u' is set union
> 
> p(e) is the processor which executed event e,
> E(p) is the set of events executed by processor p.
> W(x) is the set of all write events on variable x
> W    is the set of all write events (across all variables)
> w(S) is set S together with all elements of W with an edge into or out of S.

In other words, w(S) is the set of events S plus all writes that deliver
values directly to a read event in S plus all writes that overwrite
values produced by write events in S.

Or did I miss a turn here?
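
In symbols, the definition as literally stated seems to be:

    w(S) = S u { w in W : there exists e in S with w --> e or e --> w }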

> RULES:
> 
> An execution is a graph G=(S, -->) over a set of events S satisfying:
> 
> [Write Serialization]
>   For each variable x, G | W(x) is a total order, with minimal
>   element w0 setting x to 0.
> 
> [Freshness]
>   For every read r of x, min(W(x)) --> r and G | (W(x) u {r}) is a
>   total order.

OK...  "min(W(x))" is the write that produced the value consumed by r?

I was expecting something saying that a program-ordered pair of reads
to the same variable would not see that variable's values playing
backwards, but don't see how to obtain that interpretation from the
above formula.  What am I missing?  Is this piece to be provided by
the second part of "Local Consistency" below?
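
Concretely, the outcome I have in mind (illustrative atomic syntax):
once the first program-ordered read has seen the new value, the second
read must not see the old one.

    #include <atomic>
    #include <cassert>

    std::atomic<int> x{0};

    void writer() { x.store(1); }

    void reader() {
        int r1 = x.load();              // first read in program order
        int r2 = x.load();              // second read in program order
        assert(!(r1 == 1 && r2 == 0));  // x's values must not run backwards
    }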

> [Causal Propagation]
>   e --> e'' if there is an e' in E(p(e)) u E(p(e'')) st e --> e' --> e'' .

In other words, the existence of:

	e' in E(p(e)) u E(p(e'')) st e --> e' --> e'' .

implies that:

	e --> e''

Or am I missing something?

Also, p(e), p(e'), and p(e'') might well all be different CPUs, right?

> [Local Consistency]
>   For all processors i, (G | w(E(i)))* is a partial order (i.e., is acyclic)
>   Additionally, volatiles obey local program order:
>     For all processors p, G | E(p) is a total order.

All the other relations apply to non-volatiles as well as to volatiles?
Not sure I would agree with that.

Also, if I understand correctly, what you are calling "volatiles" is
what some of the proposals in this forum have called "atomic" or
"interlocked" or maybe also "atomic_flag".

> EXAMPLES:

All variables in these examples are "volatiles"?

							Thanx, Paul

> 1. Dekker (forbidden outcome)
> 
> Given
> 
> T1             T2
> [1] x = 1      [3] y = 1
> [2] r1 = y(0)  [4] r2 = x(0)
> 
> Program Order gives
>   [1] --> [2], [3] --> [4],
> Write Serialization and Freshness give
>   [2] --> [3], [4] --> [1].
> Propagation adds
>   [3] --> [1], [1] --> [3] (and some other edges).
> 
> This graph is not an execution because (G | {[1],[3],[4]})* has a
> cycle ([1] --> [3] --> [1])
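
(As a program, with the illustrative atomic syntax used earlier in this
reply; the derivation above says the outcome r1 == 0 && r2 == 0 cannot
happen.)

    #include <atomic>

    std::atomic<int> x{0}, y{0};
    int r1, r2;

    void t1() { x.store(1); r1 = y.load(); }   // [1] then [2]
    void t2() { y.store(1); r2 = x.load(); }   // [3] then [4]

    // Forbidden outcome: r1 == 0 && r2 == 0 -- at least one of the two
    // loads must observe the other thread's store.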
> 
> ======
> 
> 2. IRIW (Independent reads of independent write -- allowed outcome)
>    Note that this is not SC.
> 
> 
> T1          T2         T3                T4
> 
> [1] x = 1   [2] y = 1
> 
>                       [3] r1 = x (1)    [5] r3 = y (1)
>                       [4] r2 = y (0)    [6] r4 = x (0)
> 
> Program Order, Write Serialization, and Freshness give:
>   [1] --> [3] --> [4] --> [2]
>   [2] --> [5] --> [6] --> [1]
> Propagation adds
>   [2] --> [6], [5] --> [1] and [1] --> [4], [3] --> [2].
> 
> This graph satisfies all these conditions. (Note that [1] --> [2] is
> in (G | W u E(T2))* and [2] --> [1] is in (G | W u E(T1))*, but both
> these graphs are separately acyclic.)
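
(Again as a program, with the same illustrative syntax; this is the one
place where CCCC and SC part company.)

    #include <atomic>

    std::atomic<int> x{0}, y{0};
    int r1, r2, r3, r4;

    void t1() { x.store(1); }                     // [1]
    void t2() { y.store(1); }                     // [2]
    void t3() { r1 = x.load(); r2 = y.load(); }   // [3], [4]
    void t4() { r3 = y.load(); r4 = x.load(); }   // [5], [6]

    // Outcome in question: r1 == 1, r2 == 0, r3 == 1, r4 == 0.
    // SC forbids it (T3 and T4 would then disagree on the order of the
    // two independent writes); CCCC, per the derivation above, allows it.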
> 
> =======
> 
> 3. CC (Causal consistency -- forbidden outcome)
> 
> T1           T2               T3
> [1] x = 1
>             [2] r1 = x (1)   [4] r2 = y (1)
>             [3] y = 1        [5] r3 = x (0)
> 
> Program Order, Write Serialization, and Freshness give:
>  [1] --> [2] --> [3] --> [4] --> [5] --> [1].
> This is not an execution because
>  G | W u E(T3) = (G | {[1],[3],[4],[5]})*
> has cycle [1] --> [3] --> [4] --> [5] --> [1].
> 
> =======
> 
> 4. DC (Another causal consistency example -- forbidden outcome)
> 
> T1           T2               T3
> [1] y = 1
>             [2] x = 1        [4] r3 = y (1)
>             [3] r2 = y (0)   [5] r4 = x (0)
> 
> Program Order, Write Serialization, and Freshness give:
>   [1] --> [4] --> [5] --> [2] --> [3] --> [1]
> Propagation adds
>   [2] --> [1] (among others).
> This is not an execution because
>   G | W u E(T3) = G | {[1],[2],[4],[5]}
> has the cycle [1] --> [4] --> [5] --> [2] --> [1].