[cpp-threads] Yet another visibility question
Lawrence Crowl
Lawrence at Crowl.org
Tue Nov 21 23:33:49 GMT 2006
I'm not convinced we are talking about the same problem.
On 11/21/06, Boehm, Hans <hans.boehm at hp.com> wrote:
> I still don't fully agree with Lawrence's arguments, but the more I look
> at this, the more I'm inclined to agree with Lawrence's and Peter's
> conclusion. Details below:
>
> > -----Original Message-----
> > From: Lawrence Crowl
> > [Hans:]
> > > I don't really agree with Lawrence that this is unusable.
> > > The rules are fairly clear, and we all agree that non-wizards
> > > should use fetch_add_full here anyway.
> >
> > I fail to see the use of fetch_add_release if it does not
> > release its writes to all subsequent readers. If the
> > behavior of a third thread can prevent two threads from
> > communicating, we have a problem. In that case, one must
> > rely on other synchronization, and we could as well have used
> > a totally unsynchronized fetch_add.
> Since fetch_add both reads and writes a variable, the normal use of the
> different variants is:
>
> fetch_add_full: The default that should normally be used in examples
> such as the one we're discussing here.
>
> fetch_add_acquire, fetch_add_release: Makes sense if fetch_add is used
> to implement lock/unlock operations of some flavor. Each
> fetch_add_release value is read by a fetch_add_acquire, and thus each
> lock holder's memory operations become visible to the next lock holder.
> Everything behaves as expected. You can get away without the "full"
> variant because we're in a special case in which the operations occur in
> pairs.
Perhaps the issue is what the use cases are for non-fully-ordered
fetch_add operations. The following seems plausible, but I think it
depends on all release edges being live.
A reference-counted object uses fetch_add_acquire both to log the
new referrer and to obtain the object contents.
I'll have to ponder use-cases for fetch_add_release.
> fetch_add_raw: Useful for atomically updating counters that will only be
> read after the threads are joined.
>
> I think these uses are all fairly clear and make sense. We're really
> discussing a (admittedly important) borderline case in which you might
> be able to get away with fetch_add_release, but in my mind,
> fetch_add_full is the natural construct.
If the case is borderline, why should we burden the standard with
the complexity?
>
> In a multithreaded process, a third thread can always mess up
> communication between the other two if it has access to one of their
> shared variables. So that doesn't particularly bother me either.
> >
> > > It seems to be slightly harder to specify the version in which
> > > all the edges are introduced. For fetch_add, there is clearly a
> > > total order among all the operations on z. In general, we
> > > currently do not assume a total order on stores at this level,
> > > so the notion of "all previous" is not well-defined. But I
> > > think that's fixable, especially if someone can come up with a
> > > strong reason for doing so.
> >
> > In this case, no total order is required because the
> > (non-atomic) writes are to different variables. (That is,
> > there is no race.)
> There is no race, but you are assuming an order on the updates to the
> atomic z. And for fetch_add, that seems very reasonable. For ordinary
> stores it may well be more reasonable than the current proposal, but it
> is not what it currently says.
I don't think I was depending on the order. To recap, the example is
// x y z initially zero
// thread 1
x = 1; fetch_add_release( &z, +1 );
// thread 2
y = 1; fetch_add_release( &z, +1 );
// thread 3
if( load_acquire( &z ) == 2 ) assert( x == 1 && y == 1 );
For the if condition to be true, both releases must have occurred
and the new values of x and y should be visible. There is no
dependence on their order, only that they both occurred. Am I
missing something?
> The following example may shed light on
> both the difficulty, and why we might want to address it anyway:
>
> Consider (v atomic, x and y ordinary, all assignments to/from v are
> really load_acquire or store_release operations, everything initially
> zero):
>
> [These are really two different examples wrapped into one.]
>
> Thread 1:
> x = 1;
> v = 1;
>
> Thread 2:
> y = 1;
> v = 2;
>
> Thread 3:
> r1 = v; (sees 1)
> r2 = v; (sees 2)
>
> Thread 4:
> if (v == 2) assert (x == 1);
>
> If we look at threads 1 through 3, it appears that the assignment of 1
> to v (with a release store) occurs first. Thus we might be tempted to
> believe that a subsequent load_acquire of v as in thread 4 would ensure
> visibility of both the assignments to x and y.
>
> But in the current memory model proposal, this is actually nonsense.
> There is no order between the two assignments to v, and the assert is
> clearly allowed to fail.
I agree that reading v==2 does not imply that x==1 in this case. At
present, we do not have a total store order. The question is what is
different between the initial case and your alternate case.
>
> Thread 5:
> r4 = v; (sees 2)
> r5 = v; (sees 1)
> r6 = v; (sees 2)
> In fact, the outcomes in thread 5 are allowed,
> which makes it fairly clear that we can conclude nothing from thread 3.
> (The only happens-before edges are within each thread and from the
> stores in threads 1 and 2 to the matching reads. Since there are no
> loads in threads one and two, there is no opportunity for a precedes
> cycle. And clearly no store "happens between" the corresponding stores
> and loads.)
>
> Having said that, I am not at all convinced that we want to allow the
> thread 5 outcomes. In fact, I think I'm less comfortable with the
> thread 5 results here than I am with Peter's original example. As far
> as I know, all current hardware precludes the thread 5 outcomes in the
> absence of compiler reordering, though that would be good to verify.
>
> We could disallow the thread 5 behavior by explicitly requiring that
> stores to each single atomic object x occur in a single total order
> "store-order-for-x", and that all such orders be included in the
> "happens-before" relation. This would guarantee that the assert in
> threads 1 to 4 from the above example always succeeds, assuming the
> thread 3 outcomes.
I also am uncomfortable with thread 5 outcomes.
Would outcome 5 be prevented by a weaker constraint that each thread
have a sequentially consistent view of the values, even though those
views may differ from thread to thread?
> This would make it possible to define "previous store-releases", so that
> we could make the modification that Peter and Lawrence want.
I'd still like to hear what the hardware folks have to say about total
store order before committing to that, but I think there is something
intermediate between that and what you are saying about the current
document.
>
> I'd be inclined to do both, even though it does make the spec slightly
> longer and superficially more complicated.
>
> I think there are no compatibility issues with Prism, since that only
> defines much stronger atomics anyway. This effectively makes our weak
> atomics slightly stronger.
>
> Other opinions?
--
Lawrence Crowl