[cpp-threads] Slightly revised memory model proposal (D2300)

Wed Jun 13 00:26:54 BST 2007

On Tue, Jun 12, 2007 at 09:35:29AM +0100, Nick Maclaren wrote:
> "Paul E. McKenney" <paulmck at linux.vnet.ibm.com> wrote:
> >
> > Eek!!!  In my mind, permitting stores to "flicker" into existence is
> > a cure that is far, far worse than any disease I can imagine.  Such
> > "flickering" would break quite a bit of code.
> 
> I am not sure that it would break much that isn't already broken,
> but I am dead sure that it would make it impossible for even an
> expert to work out whether or not such code was broken!
> 
> I agree.  That's curing chilblains by infecting the patient with
> malaria.  Please, no!
> 
> > How about instead -slightly- weakening the ordering requirements,
> > so that ordered reads can disagree about the order of unordered stores
> > to different variables?
> 
> Fine.  But let's see a proper, semi-mathematical specification.
> I can see that being viable, and I can see it being ambiguous.

Looks to me like Hans's latest revision to N2171 (AKA D2300) covers
this.  Analyzing Sarita's example (de-PPC-ized, as before):

        T1: [1] X=2
        T2: [2] r2=X(2) [2a] acquire_fence() [3] r3=Y(0)
        T3: [4] Y=1 [4a] release_fence()  [5] X=1
        T4: [6] r1=X(1) [6a] acquire_fence() [7] r2=X(2)

There are no synchronizes-with relations, since no read gives Y=1.

Happens-before relations:

	[2] -> [3]
	[4] -> [5]
	[6] -> [7]

Precedes relationships:

	[5] -p> [6]
	[1] -p> [2]
	[2] -p> [7]

The "precedes" graph is then:

                      [1] -> [2] -> [3]
                              |
                              V
        [4] -> [5] -> [6] -> [7]

No cycles, so consistent.

The thing is that T1's observation of X==2 followed by Y==0 does not imply
that the assignment X=2 came before the assignment Y=1.  This situation
is similar to IRIW, and so I believe we should accept it as a consequence
of weak ordering.  If one wanted this outcome to be excluded, then one
could simply use sequentially-consistent primitives.

In other words, if one read happens-before another read, those two
reads will see a pair of happens-before-related writes in order.
However, the converse is -not- true: just because a pair of ordered
reads see unrelated writes in a given order does -not- imply that
any other pair of ordered reads will necessarily agree on the
ordering of that pair of unrelated writes.

Fair enough?

> Personally, I think that there is a straight choice between making
> the abstract machine serial order a requirement, using only the
> actual store dependencies, and not specifying ANY dependency
> between different locations.
> 
> The last approach would be that store_relaxed would be totally
> unordered until there was a store-release; each receiving thread
> would see stores from each transmitting thread in a determined
> order, but that order might be different for every pair of
> transmitting and receiving threads.

If I understand this last correctly, I believe that it is reasonable.

> Yes, it's a hard line, but it's simple.  store_relaxed would now
> mean "I am setting up data for a later atomic export - look at it
> before it is exported at your peril."  But even that wouldn't allow
> flicker between a SINGLE pair of threads!

Yep.  There is always store_ordered (or store_seq_cst or whatever)
if one desires heavier-weight guarantees.

The only thing I see missing from D2300 is the behavior WRT atomic
operations.  Here is some possibilities:

1.	No ordering, even between successive operations to the same
	variable.

2.	"Precedes" ordering between the store portion of one operation
	and the load portion of the succeeding operation to a given
	variable, but not between the load and store portion of any
	single operation.  No ordering between operations on different
	variables.

3.	"Precedes" ordering between a pair of operations to the same
	variable, and also with a "precedes" relationship between
	the load and store portions of a given operation.  No ordering
	between operations on different variables.

4.	Synchronizes-with ordering between a _acquire() operation and
	a following _release() operation to the same variable, possibly
	combined with one of the other options.

5.	Happens-before ordering between any pair of operations to the
	same variable.  No ordering between operations on different
	variables.

6.	Happens-before ordering between any pair of operations, even
	if to different variables (sequential consistency).

I believe that fetch_add_relaxed() belongs at #2 above.

	#1 does not make sense, as even a simple load satisfied by a given
		simple store implies a "precedes" relationship.

	#3 prohibits some weak implementations involving shared
		store buffers or caches where a given atomic
		operation is applied to a store-buffer entry.
		Stronger options are even more restrictive to
		underlying implementations.

For fetch_add_acquire() and fetch_add_release(), #3 combined with #4.

	#2 is pointless -- there needs to be a memory barrier anyway,
		so it might as well enforce the ordering of the
		atomic operation's load and store while it is at it.
		(Anyone know of a machine where this causes trouble?)

	#5 might be OK, but not clear that it helps the programmer.

For fetch_add_seq_cst(), #6.

	Goes without saying, one hopes.

Combinations of different modes would use the weaker of the pair.
Note that a fetch_add_seq_cst() would synchronize with both a prior
fetch_add_release() and a subsequent fetch_add_acquire() to the same
variable.

Other atomic operations should have the same semantics as fetch_add_*().

Seem reasonable?

						Thanx, Paul