[cpp-threads] Alternatives to SC

Paul E. McKenney paulmck at linux.vnet.ibm.com
Sat Jan 13 01:53:07 GMT 2007


On Fri, Jan 12, 2007 at 01:39:38PM -0500, Doug Lea wrote:
> Paul E. McKenney wrote:
> >
> >
> >Anyway, given your clarification, the overall effect seems to be:
> >
> >	For every read r of x, there is some write to x that produces
> >	the value read, and the read r occurs in a unique place within
> >	the totally-ordered series of writes to x.
> 
> Yes. Note that two reads at the same position are not ordered
> wrt each other though.

Got it, thank you!

> >>>>[Causal Propagation]
> >>>> e --> e'' if there is an e' in E(p(e)) u E(p(e'')) st e --> e' --> e'' 
> >>>> .
> >>>
> >OK.  I am a bit confused by this
> 
> Here's the intuition. Suppose you have:
>    T1                T2
>    e; wx{x=1}        rx{r1=x}
> Where e is any other read or write, and rx sees the write wx.
> in which case
>    e  --> wx  (program order)
>    wx --> rx  (freshness)
> Now, e and wx are in T1, rx is in T2.
> So we can apply causal propagation to get e --> rx.
> 
> And similarly if we had:
>    T1                T2
>    wx{x=1}         rx{r1=x}; e
> We'd get wx --> e.
> 
> And further, if we had
>    T1                T2                  T3
>    e; wx{x=1}       rx{r1=x}; wy{y=1}    ry{r2=y}
> we'd eventually get e --> ry

(and I understand the "eventually" from your next message)

> But the rule doesn't allow arbitrary inference-at-a-distance
> propagation through third parties. That's the "causal" aspect
> of this rule.

So, does the following formalism describe this?

	e --> e'' if exists(e') in E(p(e)) u E(p(e'')) st
			e --> e' --> e'' and
			(p(e) == p(e') or p(e') == p(e''))

Or is the restriction on the processors implied in some other way that
I missed?

Ah!!!  I finally see it -- the fact that e' is selected from the
executions of the (up to) two processors executing e and e'' means
that e' cannot possibly be executing on a third processor.  So my
restriction after the "and" above is indeed redundant.

So this formalism supports what I have been calling "pairwise memory
barriers" -- a pair of CPUs can reliably communicate ordered changes
with memory barriers (or with volatiles, atomics, or whatever for which
the compiler supplies the needed memory barriers).
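As a sanity check on this pairwise-barrier reading, the message-passing idiom
can be replayed mechanically.  The Python sketch below is my own encoding, not
part of the proposal: the event numbering, the cyclic_projection helper, and
the reading of "G | S" as "keep only edges with both endpoints in S, then take
the transitive closure" are all assumptions on my part.

```python
# Message passing between a pair of tasks:
#     T1                  T2
#     [1] x = 1           [3] r1 = y (1)
#     [2] y = 1           [4] r2 = x (?)

def closure(edges):
    """Transitive closure of a set of directed edges."""
    c = set(edges)
    while True:
        new = {(a, d) for (a, b) in c for (b2, d) in c if b == b2} - c
        if not new:
            return c
        c |= new

def cyclic_projection(edges, keep):
    """True if (G | keep)* has a cycle; G | keep keeps only the edges
    with both endpoints in keep (my reading of the projection)."""
    sub = {(a, b) for (a, b) in edges if a in keep and b in keep}
    return any(a == b for (a, b) in closure(sub))

G = {(1, 2), (3, 4),   # Program Order
     (2, 3),           # Freshness: [3] reads the value written by [2]
     (1, 3), (2, 4)}   # Causal Propagation (e' on T1 resp. T2)
W = {1, 2}             # write events
E_T2 = {3, 4}

# As written, the projection onto W u E(T2) is acyclic, so allowed:
assert not cyclic_projection(G, W | E_T2)

# But if [4] read the stale x == 0, Freshness would add [4] --> [1],
# and G | W u E(T2) would contain the cycle [1] --> [3] --> [4] --> [1]:
assert cyclic_projection(G | {(4, 1)}, W | E_T2)
```

So, under this reading, a pair of tasks does get the expected barrier-style
ordering out of the rule.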

Trying another example:

	T1		T2		T3
	[1] x = 1	[3] y++		[4] if (y == 2)
	[2] y = 1			[5]	assert(x == 1)

If we reach the assertion at [5], we know:
	[2] --> [3] --> [4] by Write Serialization and Freshness
	[1] --> [2] and [4] --> [5] by Program Order
Applying Causal Propagation with e=[1], e'=[2], and e''=[4], we have
	[1] --> [4].  This is permitted because [1] and [2] are both
	executed by T1 and [4] is executed by T3.
Applying Causal Propagation with e=[1], e'=[4], and e''=[5], we have
	[1] --> [5].  This is permitted because [1] is executed by T1
	and [4] and [5] are both executed by T3.

So the set G is the transitive closure of:

	{[1] --> [2] --> [3] --> [4] --> [5]}

The set W is {[1], [2], [3]}.

Can G legally contain [5] --> [1]?  No, it cannot, because if it did:

	G | W u E(T3)

would contain a cycle.  Therefore, the assertion, if reached, cannot fail.

Or am I missing something here?  (I believe that something like this
is in fact needed if the formalism is to be able to implement locking.)
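That argument can also be checked mechanically.  Here is a Python sketch of
my own encoding (the cyclic_projection helper and the restrict-then-close
reading of "G | S" are my assumptions, not part of the formalism):

```python
#     T1              T2          T3
#     [1] x = 1       [3] y++     [4] if (y == 2)
#     [2] y = 1                   [5]     assert(x == 1)

def closure(edges):
    """Transitive closure of a set of directed edges."""
    c = set(edges)
    while True:
        new = {(a, d) for (a, b) in c for (b2, d) in c if b == b2} - c
        if not new:
            return c
        c |= new

def cyclic_projection(edges, keep):
    """True if (G | keep)* has a cycle (edges restricted to keep)."""
    sub = {(a, b) for (a, b) in edges if a in keep and b in keep}
    return any(a == b for (a, b) in closure(sub))

G = {(1, 2), (4, 5),   # Program Order
     (2, 3), (3, 4)}   # Write Serialization and Freshness
W = {1, 2, 3}
E_T3 = {4, 5}

# W u E(T3) covers all five events, so the whole chain is visible to T3.
# The execution as given is acyclic, hence allowed:
assert not cyclic_projection(G, W | E_T3)

# Adding the hypothetical [5] --> [1] (the assertion seeing x == 0)
# creates a cycle in G | W u E(T3), so the assertion cannot fail:
assert cyclic_projection(G | {(5, 1)}, W | E_T3)
```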

> Hope this helps!

Well, let's see how I do on the examples.  ;-)

> EXAMPLES:
> 
> 1. Dekker (forbidden outcome)
> 
> Given
> 
> T1             T2
> [1] x = 1      [3] y = 1
> [2] r1 = y(0)  [4] r2 = x(0)
> 
> Program Order gives
>    [1] --> [2], [3] --> [4],
> Write Serialization and Freshness give
>    [2] --> [3], [4] --> [1].
> Propagation adds
>    [3] --> [1], [1] --> [3] (and some other edges).
> 
> This graph is not an execution because (G | {[1],[3],[4]})* has a
> cycle ([1] --> [3] --> [1])

This is using T2:

	(G | W u E(T2))*

expands to:

	(G | {[1],[3]} u {[3],[4]})*

which is the (G | {[1],[3],[4]})* above.
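Replaying Doug's Dekker graph the same way (again a Python sketch of my own;
event numbering and helper names are mine, and "G | S" is read as
restrict-then-close):

```python
#     T1             T2
#     [1] x = 1      [3] y = 1
#     [2] r1 = y(0)  [4] r2 = x(0)

def closure(edges):
    """Transitive closure of a set of directed edges."""
    c = set(edges)
    while True:
        new = {(a, d) for (a, b) in c for (b2, d) in c if b == b2} - c
        if not new:
            return c
        c |= new

def cyclic_projection(edges, keep):
    """True if (G | keep)* has a cycle (edges restricted to keep)."""
    sub = {(a, b) for (a, b) in edges if a in keep and b in keep}
    return any(a == b for (a, b) in closure(sub))

G = {(1, 2), (3, 4),   # Program Order
     (2, 3), (4, 1),   # Write Serialization and Freshness
     (3, 1), (1, 3)}   # Propagation
W = {1, 3}
E_T2 = {3, 4}

# G | W u E(T2) = G | {[1],[3],[4]} has the cycle [1] --> [3] --> [1],
# so this outcome is forbidden:
assert cyclic_projection(G, W | E_T2)
```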

> ======
> 
> 2. IRIW (Independent reads of independent write -- allowed outcome)
>     Note that this is not SC.
> 
> 
> T1          T2         T3                T4
> 
> [1] x = 1   [2] y = 1
> 
>                        [3] r1 = x (1)    [5] r3 = y (1)
>                        [4] r2 = y (0)    [6] r4 = x (0)
> 
> Program Order, Write Serialization, and Freshness give:
>    [1] --> [3] --> [4] --> [2]
>    [2] --> [5] --> [6] --> [1]
> Propagation adds
>    [2] --> [6], [5] --> [1] and [1] --> [4], [3] --> [2].
> 
> This graph satisfies all these conditions. (Note that [1] --> [2] is
> in (G | W u E(T2))* and [2] --> [1] is in (G | W u E(T1))*, but both
> these graphs are separately acyclic.)

The trick here is that neither T3 nor T4 writes to any volatiles,
and the only cycles in G involve both T3 and T4.  So projecting
onto any one task will mask away some portion of the cycle.

In contrast, in my earlier example, both T1 and T2 had -only- writes,
and an assertion failure produced a cycle involving T1, T2, and T3.
T3 was able to see all operations, and so the cycle was visible to T3,
and thus not permitted.
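The masking effect shows up directly if the IRIW graph is replayed in the
same mechanical style (my own Python encoding; the helper and the
restrict-then-close reading of "G | S" are assumptions on my part):

```python
#     T1          T2         T3                T4
#     [1] x = 1   [2] y = 1  [3] r1 = x (1)    [5] r3 = y (1)
#                            [4] r2 = y (0)    [6] r4 = x (0)

def closure(edges):
    """Transitive closure of a set of directed edges."""
    c = set(edges)
    while True:
        new = {(a, d) for (a, b) in c for (b2, d) in c if b == b2} - c
        if not new:
            return c
        c |= new

def cyclic_projection(edges, keep):
    """True if (G | keep)* has a cycle (edges restricted to keep)."""
    sub = {(a, b) for (a, b) in edges if a in keep and b in keep}
    return any(a == b for (a, b) in closure(sub))

G = {(3, 4), (5, 6),           # Program Order
     (1, 3), (4, 2),           # Freshness / Write Serialization
     (2, 5), (6, 1),
     (2, 6), (5, 1),           # Propagation
     (1, 4), (3, 2)}
W = {1, 2}
E = {"T1": {1}, "T2": {2}, "T3": {3, 4}, "T4": {5, 6}}

# The full graph does contain a cycle ([1]-->[3]-->[4]-->[2]-->[5]-->[6]-->[1]):
assert cyclic_projection(G, {1, 2, 3, 4, 5, 6})

# ...but every per-task projection masks part of it, so IRIW is allowed:
for events in E.values():
    assert not cyclic_projection(G, W | events)
```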

> =======
> 
> 3. CC (Causal consistency -- forbidden outcome)
> 
> T1           T2               T3
> [1] x = 1
>              [2] r1 = x (1)   [4] r2 = y (1)
>              [3] y = 1        [5] r3 = x (0)
> 
> Program Order, Write Serialization, and Freshness give:
>   [1] --> [2] --> [3] --> [4] --> [5] --> [1].
> This is not an execution because
>   G | E u E(T3) = G | {[1],[3],[4],[5]}*

The above line needs to be:

    G | W u E(T3) = G | {[1],[3],[4],[5]}*

Right?

> has cycle [1]-->[3]-->[4] --> [5] --> [1].

So we retain dependencies among writes, even if an intervening
read on the non-focus task was needed to establish that dependency?
So even though [2] is dropped from the local-consistency check,
the [1] --> [3] dependency that it established is retained, because
that dependency relates a pair of writes.  In contrast, the
dependencies established in example 2 by T3 and T4 involved only
reads, and thus were not retained.

Another way of putting this (I think) is that a read can establish
a dependency, but that dependency matters only if that read either:

1.	was executed by the focus task (T3 in this case), or

2.	forces a dependency involving a write event that was
	executed on the same processor as the read.

Is this correct?
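The "retained dependency" reading can be tested on this very example
(again a Python sketch of my own encoding; helper names, event numbering,
and the restrict-then-close reading of "G | S" are my assumptions):

```python
#     T1           T2               T3
#     [1] x = 1
#                  [2] r1 = x (1)   [4] r2 = y (1)
#                  [3] y = 1        [5] r3 = x (0)

def closure(edges):
    """Transitive closure of a set of directed edges."""
    c = set(edges)
    while True:
        new = {(a, d) for (a, b) in c for (b2, d) in c if b == b2} - c
        if not new:
            return c
        c |= new

def cyclic_projection(edges, keep):
    """True if (G | keep)* has a cycle (edges restricted to keep)."""
    sub = {(a, b) for (a, b) in edges if a in keep and b in keep}
    return any(a == b for (a, b) in closure(sub))

G = {(2, 3), (4, 5),   # Program Order
     (1, 2), (3, 4),   # Freshness
     (5, 1),           # Freshness: [5] reads x before write [1]
     (1, 3)}           # Propagation: via the read [2], on T2 with [3]
W = {1, 3}
E_T3 = {4, 5}

# With the retained [1] --> [3] edge, G | W u E(T3) is cyclic: forbidden.
assert cyclic_projection(G, W | E_T3)

# Dropping that one propagated edge would make the projection acyclic,
# which is exactly why retaining it matters:
assert not cyclic_projection(G - {(1, 3)}, W | E_T3)
```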

> =======
> 
> 4. DC (Another causal consistency example -- forbidden outcome)
> 
> T1           T2               T3
> [1] y = 1
>              [2] x = 1        [4] r3 = y (1)
>              [3] r2 = y (0)   [5] r4 = x (0)
> 
> Program Order, Write Serialization, and Freshness give:
>    [1] --> [4] --> [5] --> [2] --> [3] --> [1]
> Propagation adds
>    [2] --> [1] (among others).
> This is not an execution because
>    G | W u E(T3) = G | {[1],[2],[4],[5]}
> has the cycle [1] --> [4] --> [5] --> [2] --> [1].

The upside-down version of example 3.
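And it checks out the same way (one more Python sketch of my own encoding,
with the same caveats about helper names and the restrict-then-close reading
of "G | S"):

```python
#     T1           T2               T3
#     [1] y = 1
#                  [2] x = 1        [4] r3 = y (1)
#                  [3] r2 = y (0)   [5] r4 = x (0)

def closure(edges):
    """Transitive closure of a set of directed edges."""
    c = set(edges)
    while True:
        new = {(a, d) for (a, b) in c for (b2, d) in c if b == b2} - c
        if not new:
            return c
        c |= new

def cyclic_projection(edges, keep):
    """True if (G | keep)* has a cycle (edges restricted to keep)."""
    sub = {(a, b) for (a, b) in edges if a in keep and b in keep}
    return any(a == b for (a, b) in closure(sub))

G = {(2, 3), (4, 5),   # Program Order
     (1, 4), (5, 2),   # Freshness
     (3, 1),           # Freshness: [3] reads y before write [1]
     (2, 1)}           # Propagation
W = {1, 2}
E_T3 = {4, 5}

# G | W u E(T3) has the cycle [1] --> [4] --> [5] --> [2] --> [1]: forbidden.
assert cyclic_projection(G, W | E_T3)
```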

						Thanx, Paul


