[cpp-threads] cpp-threads Digest, Vol 41, Issue 11

Tue Dec 7 17:17:36 GMT 2010

On Tue, 7 Dec 2010, Boehm, Hans wrote:

> > From: Mark Batty
> > Sent: Monday, December 06, 2010 5:46 AM
> > To: cpp-threads at decadent.org.uk
> > Subject: Re: [cpp-threads] cpp-threads Digest, Vol 41, Issue 11
> > 
> > > But now consider this hybrid version:
> > >
> > >        Thread 0                        Thread 1
> > >        --------                        --------
> > >        a.store(1, mo_seq_cst);         b.store(1, mo_relaxed);
> > >                                        fence(mo_seq_cst);
> > >        b.store(2, mo_seq_cst);         a.store(2, mo_relaxed);
> > >
> > > Here the standard allows the assertion to fail.  Why is this?  Is it
> > > deliberate or is it simply an oversight?  If it is deliberate, what is
> > > the justification?
> > >
> > > Alan Stern
> > 
> > I don't understand how seq_cst fences are intended to be used, but I'd
> > like to find out. I checked each of your three examples against our
> > computerised mathematical version of the C++0x memory model, and each
> > behaves as you predicted.
> > 
> > Mark
> > 
> I'm inclined to classify this as a low-priority bug.  I can't think of any implementation-based motivations for it, though I may be overlooking something.
> 
> Fences were added late in the game, for basically two reasons:
> 
> 1) To allow simple existing fence-based code to be moved forward with minimal effort.
> 
> 2) To support a few relatively rare cases in which fence-based code could actually provide noticeably better performance on existing hardware.
> 
> In the large majority of cases, I would prefer to discourage people from writing new fence-based code, especially if it relies on subtle properties of those fences.  Such code is error-prone, and it overconstrains the implementation by enforcing lots of unnecessary ordering constraints.
> 
> Fences in the current standard were a compromise to provide the most essential fence functionality without introducing too much complexity into the specification, and without introducing significant implementation problems, especially in light of the fact that hardware fences don't all have the same semantics.  We particularly want to avoid slowing down the rest of the atomics to support fences.  There is no claim that fences are specified as strongly as they possibly could be.
> 
> Nonetheless, if someone can suggest improvements that don't appreciably complicate the specification, I think most of us would be very interested.  I agree that this particular behavior is weird and unexpected.  I'm not sure how important it is in practice.
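For concreteness, here is a minimal sketch of the hybrid example above as a compilable C++11 program. It assumes that the shorthand `mo_seq_cst`, `mo_relaxed` and `fence(...)` stands for `std::memory_order_seq_cst`, `std::memory_order_relaxed` and `std::atomic_thread_fence(...)`. The helper name `run_hybrid_once` is hypothetical. The helper only reports the final values of `a` and `b` and does not reproduce the assertion.

```cpp
#include <atomic>
#include <thread>
#include <utility>

// Hypothetical helper: runs the hybrid example once and returns the final
// values {a, b}. Assumed mapping of the email's shorthand:
//   mo_seq_cst -> std::memory_order_seq_cst
//   mo_relaxed -> std::memory_order_relaxed
//   fence(...) -> std::atomic_thread_fence(...)
std::pair<int, int> run_hybrid_once() {
    std::atomic<int> a(0);
    std::atomic<int> b(0);

    // Thread 0: two seq_cst stores.
    std::thread t0([&] {
        a.store(1, std::memory_order_seq_cst);
        b.store(2, std::memory_order_seq_cst);
    });
    // Thread 1: relaxed store, seq_cst fence, relaxed store.
    std::thread t1([&] {
        b.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        a.store(2, std::memory_order_relaxed);
    });

    t0.join();
    t1.join();
    // join() makes every store happen before these loads, so each load
    // returns the last store in that object's modification order.
    return std::make_pair(a.load(std::memory_order_relaxed),
                          b.load(std::memory_order_relaxed));
}
```

Running this, even many times, only shows outcomes that a particular compiler and machine happen to produce. It cannot show which final states the standard permits.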

Here is a suggestion.  In 29.3, add the following between paragraphs 5 and 6:

	For an atomic operation B that modifies an atomic object M,
	if there is a memory_order_seq_cst fence X sequenced before B,
	then B occurs later in M's modification order than any 
	memory_order_seq_cst modification of M preceding X in the total 
	order S.

	For atomic operations A and B on an atomic object M, if there 
	is a memory_order_seq_cst fence X such that A is sequenced 
	before X and B follows X in S, then B occurs later than A in 
	the modification order of M.

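These two new paragraphs are the write-write analogs of 29.3.p3 and
29.3.p4, just as 29.3.p6 is the write-write analog of 29.3.p5.  Neither
paragraph can be derived from the current text, and together they give
the desired result for the hybrid example above.  S must order Thread 0's
a.store(1) before its b.store(2).  If the fence precedes a.store(1) in S,
it also precedes b.store(2), so the second paragraph puts b.store(2)
after b.store(1) in b's modification order.  Otherwise a.store(1)
precedes the fence, so the first paragraph puts a.store(2) after
a.store(1) in a's modification order.  Either way, the final state
a == 1, b == 1 (which no sequentially consistent interleaving can
produce) is ruled out.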
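The case analysis behind this claim can be checked mechanically. The sketch below is a hypothetical toy model, not an implementation of the full memory model. It enumerates the three positions the fence X can take in S relative to Thread 0's two seq_cst stores. For each position it enumerates both possible modification orders for `a` and for `b`, then discards any combination that violates the two proposed paragraphs. The model assumes that the outcome of interest is `a` and `b` both ending with the value 1. The names `Outcome`, `allowed_outcomes` and `outcome_allowed` are illustrative only.

```cpp
#include <initializer_list>
#include <vector>

// Hypothetical toy model of the hybrid example. It tracks only the order S
// of the three seq_cst events and the modification orders of a and b.
struct Outcome {
    int a;  // final value of a (last store in a's modification order)
    int b;  // final value of b (last store in b's modification order)
};

std::vector<Outcome> allowed_outcomes(bool apply_proposed_rules) {
    std::vector<Outcome> result;
    // S must place a.store(1) before b.store(2) (sequenced-before), so the
    // fence X has exactly three possible positions:
    //   fence_pos == 0: X, a.store(1), b.store(2)
    //   fence_pos == 1: a.store(1), X, b.store(2)
    //   fence_pos == 2: a.store(1), b.store(2), X
    for (int fence_pos = 0; fence_pos <= 2; ++fence_pos) {
        const bool a1_precedes_x = fence_pos >= 1;
        const bool x_precedes_b2 = fence_pos <= 1;
        // A final value of 2 means the store of 2 is later in modification order.
        for (int a_final : {1, 2}) {
            for (int b_final : {1, 2}) {
                if (apply_proposed_rules) {
                    // First new paragraph: X is sequenced before a.store(2), so
                    // a.store(2) must follow any seq_cst store to a preceding X.
                    if (a1_precedes_x && a_final != 2) continue;
                    // Second new paragraph: b.store(1) is sequenced before X, so
                    // b.store(2), if it follows X in S, must follow b.store(1).
                    if (x_precedes_b2 && b_final != 2) continue;
                }
                result.push_back({a_final, b_final});
            }
        }
    }
    return result;
}

// True if some (S, modification order) combination yields final state (a, b).
bool outcome_allowed(int a, int b, bool apply_proposed_rules) {
    for (const Outcome& o : allowed_outcomes(apply_proposed_rules)) {
        if (o.a == a && o.b == b) return true;
    }
    return false;
}
```

Without the rules, nothing in this toy model excludes (1, 1). With the rules applied, the surviving final states are (1, 2), (2, 1) and (2, 2), which are exactly the sequentially consistent ones.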

I don't know what implications they will have for implementations.

Alan Stern