[cpp-threads] Intra-thread synchronizes-with

Sat Nov 27 03:28:38 GMT 2010

The draft standard specifically allows atomic stores and loads to have a
synchronizes-with (and in the recently proposed D3196 update,
dependency-ordered-before) relation only if they are in different
threads (1.10.p7 and 1.10.p9).

This gives rise to the observation that under some circumstances,
moving operations into separate threads can _increase_ the
restrictions on memory ordering, which is rather non-intuitive.
Here's an example:

	atomic_int a, b, *p;
	atomic_pointer c;
	int x, y;

	a = 0;
	b = 0;

	Thread 0		Thread 1
	--------		--------
	a.store(2, mo_relaxed);
	b.store(1, mo_release);
				x = b.load(mo_consume);
				c.store(&a, mo_release);
				p = c.load(mo_acquire);
				y = *p;
				assert(x!=1 || p!=&a || y==2);

The standard allows this assertion to fail: The c.load does not
synchronize with the c.store because they occur in the same thread,
and consequently the a.store need not happen before the assignment to
y; the b.load does not carry a dependency to y's assignment.

A compiler-oriented explanation might run like this: The standard
allows the compiler to optimize away the c.load expression, along with
its attendant memory ordering properties, and simply assign &a
directly to p.  There is then nothing to prevent the compiler from
moving the assignments to p and y up before the b.load: Statements may
be moved up before a release operation, and there is no data
dependency from b to either p or y.
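
The reordering described above can be written out by hand.  What follows
is only a sketch in C++11 syntax, not a claim about what any particular
compiler does: the harness run_reordered_two_threads and the struct
Observed are hypothetical names, std::atomic<std::atomic<int>*> stands
in for atomic_pointer, and a relaxed load stands in for "y = *p".

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Observed { int x; int y; };

// Hypothetical harness: thread 1 is written in the reordered form that the
// explanation above says a compiler may produce from the two-thread program.
Observed run_reordered_two_threads() {
    std::atomic<int> a{0}, b{0};
    std::atomic<std::atomic<int>*> c{nullptr};
    Observed r{0, 0};

    std::thread t0([&] {
        a.store(2, std::memory_order_relaxed);
        b.store(1, std::memory_order_release);
    });
    std::thread t1([&] {
        std::atomic<int>* p = &a;                  // c.load folded into &a
        r.y = p->load(std::memory_order_relaxed);  // y = *p, hoisted above b.load
        r.x = b.load(std::memory_order_consume);
        c.store(&a, std::memory_order_release);
    });
    t0.join();
    t1.join();
    return r;  // read after both joins, so there is no race on r
}
```

In this form the outcome x == 1 with y == 0 is possible (thread 1 reads a,
then thread 0 runs to completion, then thread 1 reads b), which is exactly
the assertion failure in question.  A single run will usually not show it.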

But now consider what happens if some of the statements are moved into
a third thread:

	Thread 0		Thread 1	Thread 2
	--------		--------	--------
	a.store(2, mo_relaxed);
	b.store(1, mo_release);
				x = b.load(mo_consume);
				c.store(&a, mo_release);
						p = c.load(mo_acquire);
						y = *p;
						assert(x!=1 || p!=&a || y==2);

Now the assertion _is_ guaranteed to hold.  If x==1 and p==&a then:

	a.store		is sequenced before
	b.store		is dependency-ordered before
	b.load		is sequenced before
	c.store		synchronizes with
	c.load		is sequenced before
	y = *p

Therefore a.store happens before the dereference of p, so y must end
up equal to 2.
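
For anyone who wants to try the three-thread version, here is a minimal
sketch in C++11 syntax.  Several things in it are assumptions rather than
part of the example above: the harness name run_three_threads is
hypothetical, std::atomic<std::atomic<int>*> stands in for
atomic_pointer, and Thread 2 spins until the c.store is visible so that
its read of the plain int x is not a data race.  The check is expected
to pass on implementations that treat mo_consume as mo_acquire, which
is a further assumption about the toolchain.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical harness: runs the three-thread example once and returns
// whether the asserted condition held.
bool run_three_threads() {
    std::atomic<int> a{0}, b{0};
    std::atomic<std::atomic<int>*> c{nullptr};
    int x = 0;        // written by thread 1, read by thread 2 after its acquire
    bool ok = false;  // written by thread 2, read after join

    std::thread t0([&] {
        a.store(2, std::memory_order_relaxed);
        b.store(1, std::memory_order_release);
    });
    std::thread t1([&] {
        x = b.load(std::memory_order_consume);
        c.store(&a, std::memory_order_release);
    });
    std::thread t2([&] {
        std::atomic<int>* p;
        // Assumption: wait for the c.store so that reading x below is ordered
        // after thread 1's write to x.
        while ((p = c.load(std::memory_order_acquire)) == nullptr)
            std::this_thread::yield();
        int y = *p;  // implicit atomic load through the pointer
        ok = (x != 1 || p != &a || y == 2);
    });
    t0.join();
    t1.join();
    t2.join();
    return ok;
}
```

A passing run demonstrates nothing about the guarantee by itself; the
sketch is only meant to make the example concrete enough to compile.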

A similar compiler-oriented explanation might say that here the c.load
cannot be optimized away, so the hardware effects of its
memory-ordering properties must take place, which justifies the
assertion.

Still, is this actually the intended behavior?  Normally one expects
that moving code into a new thread makes it _less_ ordered with
respect to other events, not _better_ ordered.

In the two-thread program, what if c had been declared volatile?
Then the c.load could not be optimized away or moved before the
c.store, so the assertion _would_ always hold even though the standard
doesn't guarantee it.  Again, is this intended?

Would it be better to change the standard to state that a store-release
synchronizes with a load-acquire if the two operations take place in
different threads or if the load-acquire is applied to a volatile
object (and analogously for load-consume)?

Alan Stern