[cpp-threads] cpp-threads Digest, Vol 41, Issue 11

Mon Dec 6 13:46:29 GMT 2010

> But now consider this hybrid version:
>
>        Thread 0                        Thread 1
>        --------                        --------
>        a.store(1, mo_seq_cst);         b.store(1, mo_relaxed);
>                                        fence(mo_seq_cst);
>        b.store(2, mo_seq_cst);         a.store(2, mo_relaxed);
>
> Here the standard allows the assertion to fail.  Why is this?  Is it
> deliberate or is it simply an oversight?  If it is deliberate, what is
> the justification?
>
> Alan Stern

I don't understand how seq_cst fences are intended to be used, but I'd
like to find out. I checked each of your three examples against our
computerised mathematical version of the C++0x memory model, and each
behaves as you predicted.

Mark

On Sun, Nov 28, 2010 at 12:00 PM,  <cpp-threads-request at decadent.org.uk> wrote:
> Today's Topics:
>
>   1. Sequentially consistent atomic memory fences (Alan Stern)
>   2. Intra-thread synchronizes-with (Alan Stern)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 26 Nov 2010 22:23:47 -0500 (EST)
> From: Alan Stern <stern at rowland.harvard.edu>
> Subject: [cpp-threads] Sequentially consistent atomic memory fences
> To: cpp-threads list <cpp-threads at decadent.org.uk>
> Message-ID:
>        <Pine.LNX.4.44L0.1011262222380.9856-100000 at saphir.localdomain>
> Content-Type: TEXT/PLAIN; charset=US-ASCII
>
> Is there anything missing from 29.3.p3-6?  The specification of the
> properties of sequentially consistent fences appears to be incomplete.
>
> Consider the following schematic example:
>
>        atomic_int a, b;
>
>        a.store(0, mo_seq_cst);
>        b.store(0, mo_seq_cst);
>
>        Thread 0                        Thread 1
>        --------                        --------
>        a.store(1, mo_seq_cst);         b.store(1, mo_seq_cst);
>        b.store(2, mo_seq_cst);         a.store(2, mo_seq_cst);
>
>        join threads 0 and 1
>        assert(a==2 || b==2);
>
> The draft standard allows one to prove that the assertion will never
> fail.  Similarly for the analogous program where the thread routines
> are changed to use fences:
>
>        Thread 0                        Thread 1
>        --------                        --------
>        a.store(1, mo_relaxed);         b.store(1, mo_relaxed);
>        fence(mo_seq_cst);              fence(mo_seq_cst);
>        b.store(2, mo_relaxed);         a.store(2, mo_relaxed);
>
> Again, the standard is strong enough to show that the assertion will
> always hold.  But now consider this hybrid version:
>
>        Thread 0                        Thread 1
>        --------                        --------
>        a.store(1, mo_seq_cst);         b.store(1, mo_relaxed);
>                                        fence(mo_seq_cst);
>        b.store(2, mo_seq_cst);         a.store(2, mo_relaxed);
>
> Here the standard allows the assertion to fail.  Why is this?  Is it
> deliberate or is it simply an oversight?  If it is deliberate, what is
> the justification?
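
For anyone who wants to poke at the three variants on real hardware, here is a minimal sketch of a stress harness. It assumes std::atomic<int> for atomic_int and std::atomic_thread_fence for fence(); the names Variant, run_trial and count_violations are made up for illustration. It cannot prove anything about the memory model: a count of zero for the hybrid variant only says that this compiler and machine did not exhibit the failure in these runs.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Which of the three schematic programs to run (illustrative names).
enum class Variant { AllSeqCst, AllFences, Hybrid };

// Runs one trial and returns true if the final state satisfies a==2 || b==2.
bool run_trial(Variant v) {
    std::atomic<int> a{0}, b{0};
    constexpr auto sc = std::memory_order_seq_cst;
    constexpr auto rlx = std::memory_order_relaxed;

    std::thread t0([&] {
        if (v == Variant::AllFences) {
            a.store(1, rlx);
            std::atomic_thread_fence(sc);
            b.store(2, rlx);
        } else {  // AllSeqCst and Hybrid share the same Thread 0
            a.store(1, sc);
            b.store(2, sc);
        }
    });
    std::thread t1([&] {
        if (v == Variant::AllSeqCst) {
            b.store(1, sc);
            a.store(2, sc);
        } else {  // AllFences and Hybrid share the same Thread 1
            b.store(1, rlx);
            std::atomic_thread_fence(sc);
            a.store(2, rlx);
        }
    });
    t0.join();
    t1.join();
    return a.load() == 2 || b.load() == 2;
}

// Counts the trials in which the assertion would have failed.
int count_violations(Variant v, int trials) {
    int violations = 0;
    for (int i = 0; i < trials; ++i)
        if (!run_trial(v)) ++violations;
    return violations;
}
```

Calling count_violations(Variant::Hybrid, n) for a large n and printing the result is the interesting experiment; the other two variants should always report zero.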
>
> Alan Stern
>
>
>
>
> ------------------------------
>
> Message: 2
> Date: Fri, 26 Nov 2010 22:28:38 -0500 (EST)
> From: Alan Stern <stern at rowland.harvard.edu>
> Subject: [cpp-threads] Intra-thread synchronizes-with
> To: cpp-threads list <cpp-threads at decadent.org.uk>
> Message-ID:
>        <Pine.LNX.4.44L0.1011262223490.9856-100000 at saphir.localdomain>
> Content-Type: TEXT/PLAIN; charset=US-ASCII
>
> The draft standard specifically allows atomic stores and loads to have a
> synchronizes-with (and in the recently proposed D3196 update,
> dependency-ordered-before) relation only if they are in different
> threads (1.10.p7 and 1.10.p9).
>
> This gives rise to the observation that under some circumstances,
> moving operations into separate threads can _increase_ the
> restrictions on memory ordering, which is rather non-intuitive.
> Here's an example:
>
>        atomic_int a, b, *p;
>        atomic_pointer c;
>        int x, y;
>
>        a = 0;
>        b = 0;
>
>        Thread 0                Thread 1
>        --------                --------
>        a.store(2, mo_relaxed);
>        b.store(1, mo_release);
>                                x = b.load(mo_consume);
>                                c.store(&a, mo_release);
>                                p = c.load(mo_acquire);
>                                y = *p;
>                                assert(x!=1 || p!=&a || y==2);
>
> The standard allows this assertion to fail: The c.load does not
> synchronize with the c.store because they occur in the same thread,
> and consequently the a.store need not happen before the assignment to
> y; the b.load does not carry a dependency to y's assignment.
>
> A compiler-oriented explanation might run like this: The standard
> allows the compiler to optimize away the c.load expression, along with
> its attendant memory ordering properties, and simply assign &a
> directly to p.  There is then nothing to prevent the compiler from
> moving the assignments to p and y up before the b.load: Statements may
> be moved up before a release operation, and there is no data
> dependency from b to either p or y.
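
A compilable rendering of the two-thread program may help when experimenting. This is only a sketch under some assumptions: std::atomic<std::atomic<int>*> stands in for atomic_pointer, and the Outcome struct plus the names run_two_thread and assertion_holds are invented here. The assertion is evaluated after both threads are joined rather than inside Thread 1. Compilers that implement consume as acquire will most likely never show the failure, so an absence of failures says nothing about what the standard permits.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Outcome {
    int x;        // value Thread 1 read from b
    bool p_is_a;  // whether p ended up pointing at a
    int y;        // value read through p
};

// One run of the two-thread program; the checks are left to the caller.
Outcome run_two_thread() {
    std::atomic<int> a{0}, b{0};
    std::atomic<std::atomic<int>*> c{nullptr};  // stands in for atomic_pointer
    Outcome out{};

    std::thread t0([&] {
        a.store(2, std::memory_order_relaxed);
        b.store(1, std::memory_order_release);
    });
    std::thread t1([&] {
        out.x = b.load(std::memory_order_consume);
        c.store(&a, std::memory_order_release);
        // Same-thread load: it reads &a by ordinary coherence, but this
        // store/load pair creates no synchronizes-with edge.
        std::atomic<int>* p = c.load(std::memory_order_acquire);
        out.p_is_a = (p == &a);
        out.y = p->load(std::memory_order_relaxed);
    });
    t0.join();
    t1.join();
    return out;
}

// The assertion from the example, evaluated on a finished run.
bool assertion_holds(const Outcome& o) {
    return o.x != 1 || !o.p_is_a || o.y == 2;
}
```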
>
> But now consider what happens if some of the statements are moved into
> a third thread:
>
>        Thread 0                Thread 1        Thread 2
>        --------                --------        --------
>        a.store(2, mo_relaxed);
>        b.store(1, mo_release);
>                                x = b.load(mo_consume);
>                                c.store(&a, mo_release);
>                                                p = c.load(mo_acquire);
>                                                y = *p;
>                                                assert(x!=1 || p!=&a || y==2);
>
> Now the assertion _is_ guaranteed to hold.  If x==1 and p==&a then:
>
>        a.store         is sequenced before
>        b.store         is dependency-ordered before
>        b.load          is sequenced before
>        c.store         synchronizes with
>        c.load          is sequenced before
>        y = *p
>
> Therefore a.store happens before the dereference of p, so y must end
> up equal to 2.
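
The three-thread version can be sketched the same way. Assumptions again: std::atomic<std::atomic<int>*> replaces atomic_pointer, the function name run_three_thread is made up, and two changes are made so that the program is well defined on every run: Thread 2 only dereferences p when the load actually returned a pointer, and the assertion is evaluated after all three threads are joined so that reading x does not race with Thread 1.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// One run of the three-thread program. Returns true if
// x!=1 || p!=&a || y==2, evaluated after all threads have been joined.
bool run_three_thread() {
    std::atomic<int> a{0}, b{0};
    std::atomic<std::atomic<int>*> c{nullptr};  // stands in for atomic_pointer
    int x = 0, y = 0;
    std::atomic<int>* p = nullptr;

    std::thread t0([&] {
        a.store(2, std::memory_order_relaxed);
        b.store(1, std::memory_order_release);
    });
    std::thread t1([&] {
        x = b.load(std::memory_order_consume);
        c.store(&a, std::memory_order_release);
    });
    std::thread t2([&] {
        p = c.load(std::memory_order_acquire);
        // Guard added for this sketch: the load may run before Thread 1's store.
        if (p != nullptr) y = p->load(std::memory_order_relaxed);
    });
    t0.join();
    t1.join();
    t2.join();
    return x != 1 || p != &a || y == 2;
}
```

If the happens-before chain above is right, run_three_thread() should return true on every run, whatever the interleaving.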
>
> A similar compiler-oriented explanation might say that here the c.load
> cannot be optimized away, so the hardware effects of its
> memory-ordering properties must take place, which justifies the
> assertion.
>
> Still, is this actually the intended behavior?  Normally one expects
> that moving code into a new thread makes it _less_ ordered with
> respect to other events, not _better_ ordered.
>
> In the two-thread program, what if c had been declared volatile?
> Then the c.load could not be optimized away or moved before the
> c.store, so the assertion _would_ always hold even though the standard
> doesn't guarantee it.  Again, is this intended?
>
> Would it be better to change the standard to state that a store-release
> synchronizes with a load-acquire if the two operations take place in
> different threads or if the load-acquire is applied to a volatile
> object (and analogously for load-consume)?
>
> Alan Stern