[cpp-threads] Slightly revised memory model proposal (D2300)
Paul E. McKenney
paulmck at linux.vnet.ibm.com
Wed Jun 13 00:26:54 BST 2007
On Tue, Jun 12, 2007 at 09:35:29AM +0100, Nick Maclaren wrote:
> "Paul E. McKenney" <paulmck at linux.vnet.ibm.com> wrote:
> >
> > Eek!!! In my mind, permitting stores to "flicker" into existence is
> > a cure that is far, far worse than any disease I can imagine. Such
> > "flickering" would break quite a bit of code.
>
> I am not sure that it would break much that isn't already broken,
> but I am dead sure that it would make it impossible for even an
> expert to work out whether or not such code was broken!
>
> I agree. That's curing chilblains by infecting the patient with
> malaria. Please, no!
>
> > How about instead -slightly- weakening the ordering requirements,
> > so that ordered reads can disagree about the order of unordered stores
> > to different variables?
>
> Fine. But let's see a proper, semi-mathematical specification.
> I can see that being viable, and I can see it being ambiguous.
Looks to me like Hans's latest revision to N2171 (AKA D2300) covers
this. Analyzing Sarita's example (de-PPC-ized, as before):
T1: [1] X=2
T2: [2] r2=X(2) [2a] acquire_fence() [3] r3=Y(0)
T3: [4] Y=1 [4a] release_fence() [5] X=1
T4: [6] r1=X(1) [6a] acquire_fence() [7] r2=X(2)
There are no synchronizes-with relations, since no read gives Y=1.
Happens-before relations:
[2] -> [3]
[4] -> [5]
[6] -> [7]
Precedes relationships:
[5] -p> [6]
[1] -p> [2]
[2] -p> [7]
The "precedes" graph is then:
[1] -> [2] -> [3]
|
V
[4] -> [5] -> [6] -> [7]
No cycles, so consistent.
The thing is that T1's observation of X==2 followed by Y==0 does not imply
that the assignment X=2 came before the assignment Y=1. This situation
is similar to IRIW, and so I believe we should accept it as a consequence
of weak ordering. If one wanted this outcome to be excluded, then one
could simply use sequentially-consistent primitives.
In other words, if one read happens-before another read, those two
reads will see a pair of happens-before-related writes in order.
However, the converse is -not- true: just because a pair of ordered
reads see unrelated writes in a given order does -not- imply that
any other pair of ordered reads will necessarily agree on the
ordering of that pair of unrelated writes.
Fair enough?
> Personally, I think that there is a straight choice between making
> the abstract machine serial order a requirement, using only the
> actual store dependencies, and not specifying ANY dependency
> between different locations.
>
> The last approach would be that store_relaxed would be totally
> unordered until there was a store-release; each receiving thread
> would see stores from each transmitting thread in a determined
> order, but that order might be different for every pair of
> transmitting and receiving threads.
If I understand this last correctly, I believe that it is reasonable.
> Yes, it's a hard line, but it's simple. store_relaxed would now
> mean "I am setting up data for a later atomic export - look at it
> before it is exported at your peril." But even that wouldn't allow
> flicker between a SINGLE pair of threads!
Yep. There is always store_ordered (or store_seq_cst or whatever)
if one desires heavier-weight guarantees.
The only thing I see missing from D2300 is the behavior WRT atomic
operations. Here is some possibilities:
1. No ordering, even between successive operations to the same
variable.
2. "Precedes" ordering between the store portion of one operation
and the load portion of the succeeding operation to a given
variable, but not between the load and store portion of any
single operation. No ordering between operations on different
variables.
3. "Precedes" ordering between a pair of operations to the same
variable, and also with a "precedes" relationship between
the load and store portions of a given operation. No ordering
between operations on different variables.
4. Synchronizes-with ordering between a _acquire() operation and
a following _release() operation to the same variable, possibly
combined with one of the other options.
5. Happens-before ordering between any pair of operations to the
same variable. No ordering between operations on different
variables.
6. Happens-before ordering between any pair of operations, even
if to different variables (sequential consistency).
I believe that fetch_add_relaxed() belongs at #2 above.
#1 does not make sense, as even a simple load satisfied by a given
simple store implies a "precedes" relationship.
#3 prohibits some weak implementations involving shared
store buffers or caches where a given atomic
operation is applied to a store-buffer entry.
Stronger options are even more restrictive to
underlying implementations.
For fetch_add_acquire() and fetch_add_release(), #3 combined with #4.
#2 is pointless -- there needs to be a memory barrier anyway,
so it might as well enforce the ordering of the
atomic operation's load and store while it is at it.
(Anyone know of a machine where this causes trouble?)
#5 might be OK, but not clear that it helps the programmer.
For fetch_add_seq_cst(), #6.
Goes without saying, one hopes.
Combinations of different modes would use the weaker of the pair.
Note that a fetch_add_seq_cst() would synchronize with both a prior
fetch_add_release() and a subsequent fetch_add_acquire() to the same
variable.
Other atomic operations should have the same semantics as fetch_add_*().
Seem reasonable?
Thanx, Paul
More information about the cpp-threads
mailing list