[cpp-threads] Yet another visibility question

Tue Nov 21 18:25:14 GMT 2006

On 11/20/06, Hans Boehm <Hans.Boehm at hp.com> wrote:
> I think that even under the official N2052 proposal, the proposed 1.10p7
> is fairly clear.  The thread 3 load_acquire reads only one of the values
> written to z by threads 1 and 2.  Hence there is a synchronizes-with
> relationship between only one of threads 1 and 2 and thread 3.
> Hence one of the checks in the assertion may fail.

Okay, let's assume that.

> I don't really agree with Lawrence that this is unusable.  The rules
> are fairly clear, and we all agree that non-wizards should use
> fetchadd_full here anyway.

I fail to see the use of fetchadd_release if it does not release its writes
to all subsequent readers.  If the behavior of a third thread can prevent
two threads from communicating, we have a problem.  In that case,
one must rely on other synchronization, and we might as well have used
a fully unsynchronized fetchadd.

> It seems to be slightly harder to specify the version in which all the
> edges are introduced.  For fetch_add, there is clearly a total order among
> all the operations on z.  In general, we currently do not assume a total
> order on stores at this level, so the notion of "all previous" is not
> well-defined.  But I think that's fixable, especially if someone can come
> up with a strong reason for doing so.

In this case, no total order is required because the (non-atomic) writes are
to different variables.  (That is, there is no race.)

> (I'm a little uncomfortable with the absence of a total order on stores to
> a single atomic variable, but I haven't convinced myself that it really
> matters.  We do allow, for raw atomics:
>
> Thread 1: x = 1; x = 2;
> Thread 2: r1 = x; r2 = x; r3 = x;
>
> r1 = r3 = 1 and r2 = 2
>
> but since the compiler can reorder the loads in thread 2, I don't think
> that's astonishing.  And weakly ordered atomics are weird and dangerous.)
>
> Currently I think it's a tradeoff between simplicity of the specification
> and reference counting performance.  I think current hardware gives you
> the stronger semantics anyway, though a software DSM might benefit from
> the weaker version.
>
> Java does give you the stronger semantics, but Java volatiles have
> stronger ordering properties anyway.
>
> Hans
>
> On Mon, 20 Nov 2006, Lawrence Crowl wrote:
>
> > On 11/17/06, Peter Dimov <pdimov at mmltd.net> wrote:
> > > Let me ask the same question again, using slightly different wording. I'm
> > > really not sure of the correct answer. "Correct" as in both "follows from
> > > the memory model", and "what we want".
> > >
> > > // x y z initially zero
> > >
> > > // thread 1
> > >
> > > x = 1;
> > > fetchadd_release( &z, +1 );
> > >
> > > // thread 2
> > >
> > > y = 1;
> > > fetchadd_release( &z, +1 );
> > >
> > > // thread 3
> > >
> > > if( load_acquire( &z ) == 2 )
> > > {
> > >     assert( x == 1 && y == 1 );
> > > }
> > >
> > > Will the assert pass?
> > >
> > > In other words, does the load-acquire from z in thread 3 introduce
> > > sync-with edges to all previous store-releases to z, or just to the
> > > last one?
> >
> > I'd need to look more carefully to be sure, but I think all edges.
> >
> > In any case, I think introducing only the last edge would produce
> > a programming model that is effectively unusable.
> >
> > --
> > Lawrence Crowl
> >
> > --
> > cpp-threads mailing list
> > cpp-threads at decadentplace.org.uk
> > http://www.decadentplace.org.uk/cgi-bin/mailman/listinfo/cpp-threads
> >
>

-- 
Lawrence Crowl