[cpp-threads] Slightly revised memory model proposal (D2300)

Wed Jun 20 22:34:06 BST 2007

Here's the promised revision of D2300.  I think this is a far more
conventional, and hence hopefully safer, formulation.
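
As a rough illustration of what the "conventional" formulation buys us
(happens-before built from sequenced-before and synchronizes-with, as
discussed in the quoted messages below), here is a minimal sketch of a
release/acquire pair.  It assumes the std::atomic spelling rather than
the load_acquire/store_release names used in this thread, and the names
payload, ready, producer, consumer and run_message_passing are
hypothetical, chosen only for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                  // ordinary, non-atomic data
std::atomic<bool> ready{false};   // flag used for synchronization

void producer() {
    payload = 42;                                  // sequenced before the store below
    ready.store(true, std::memory_order_release);  // release half of the pair
}

int consumer() {
    // Acquire half of the pair: once this load reads the value written by
    // the release store, the store synchronizes with the load.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    // sequenced-before + synchronizes-with + sequenced-before gives
    // happens-before, so the write to payload is visible here.
    return payload;
}

// Hypothetical driver: runs both threads and returns what the consumer saw.
int run_message_passing() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;
    std::thread t1(producer);
    std::thread t2([&seen] { seen = consumer(); });
    t1.join();
    t2.join();
    return seen;
}
```

If either half of the pair were weakened to relaxed, there would be no
synchronizes-with edge and the read of payload would be a data race,
which is the sense in which acquire and release only help in pairs.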

It includes some fairly major changes, and hence clearly requires more
scrutiny.  But we might as well be scrutinizing it in parallel.

I do believe that it

a) Allows both outcomes for Sarita's example, and hence hopefully does
not overconstrain the PowerPC implementation.

b) Prohibits the sort of flickering we have been discussing, at the
expense of some (hopefully insignificant) compiler optimizations.

c) May actually be a bit simpler than the original, since we lost
another named relation.
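
To pin down what "flickering" means in (b): a single thread reading one
location sees a value, then a different one, then the first one again,
as in the (1) (2) (1) example quoted below.  The following is only a
sketch of that property as a check over an observed sequence.  The
helper name flickers is hypothetical, and it assumes every store to the
location writes a distinct value exactly once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns true if the values one thread read from a
// single location "flicker", i.e. the thread moves away from a value and
// later reads that same value again.
// Assumption: each store writes a distinct value, and writes it only once.
bool flickers(const std::vector<int>& observed) {
    std::vector<int> left_behind;  // values the reader has already moved past
    for (std::size_t i = 1; i < observed.size(); ++i) {
        if (observed[i] != observed[i - 1]) {
            // The reader moved on; the new value must not be one it
            // already moved past earlier.
            for (int v : left_behind) {
                if (v == observed[i]) {
                    return true;
                }
            }
            left_behind.push_back(observed[i - 1]);
        }
    }
    return false;
}
```

Under this check, the sequence 1, 2, 1 flickers, while 1, 1, 2 and
1, 2, 2 do not.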

It also resurrects the infinite loop issue.  Unfortunately, I've become
increasingly convinced that we cannot, in good conscience, dodge it.
And I think it belongs in this proposal.

Hans

> -----Original Message-----
> From: cpp-threads-bounces at decadentplace.org.uk 
> [mailto:cpp-threads-bounces at decadentplace.org.uk] On Behalf 
> Of Paul E. McKenney
> Sent: Tuesday, June 19, 2007 7:05 PM
> To: C++ threads standardisation
> Subject: Re: [cpp-threads] Slightly revised memory model 
> proposal (D2300)
> 
> On Wed, Jun 20, 2007 at 12:35:09AM -0000, Boehm, Hans wrote:
> > Thinking about this yet again, I think there may be another way out of
> > this, by outlawing the flickering in all cases, even with relaxed
> > operations on both sides.  That also avoids the synchronization
> > elimination issues.
> >  
> > This violates my earlier argument that
> >  
> > p = &x;
> > q = &x;
> > ...
> > while (...) {
> >   // Loop does not store to potentially shared variables.
> >   r1 = load_relaxed(p); // loads *p, i.e. x
> >   r2 = load_relaxed(q); // loads *q, i.e. x
> > }
> >  
> > should allow load_relaxed operations to be reordered.
> >  
> > Effectively we would be prohibiting compiler reordering of atomic
> > operations if the affected objects potentially alias, even if both
> > operations are loads.  Since the reordering is prevented only if the
> > same location is affected, and cache coherence guarantees that at the
> > hardware level anyway, I think this is purely a compiler reordering
> > restriction.  The only case I can think of in which this matters is
> > numerical methods that are oblivious to races on floating point data,
> > and hence use lots of atomic operations.  But I suspect the compiler's
> > alias analysis is typically quite good in those cases.
> >  
> > Opinions?
> 
> At first glance, this looks extremely good to me.  But before 
> passing judgement, I should make sure that I understand what 
> you are proposing.
> (Perish the thought!!!)
> 
> First, in the following case:
> 
>   r1 = load_relaxed(&x);
>   r2 = load_relaxed(&y);
> 
> the compiler still would be permitted to reorder the two 
> load_relaxed() statements, correct?
> 
> Second, if there are store_relaxed() statements in the mix, 
> but still to distinct variables:
> 
>   r1 = load_relaxed(&x);
>   r2 = load_relaxed(&y);
>   store_relaxed(&z, 3);
> 
> would the compiler be permitted to emit code for these 
> statements in any desired permutation?  (Presumably "yes", 
> given that the CPU would be permitted to reorder them anyway.)
> 
> Programmers communicating with signal handlers or interrupt 
> handlers would want a compiler barrier, of course.
> 
> 						Thanx, Paul
> 
> > Hans
> >  
> > 
> >  
> > 
> > ________________________________
> > 
> > 	From: cpp-threads-bounces at decadentplace.org.uk
> > 	[mailto:cpp-threads-bounces at decadentplace.org.uk] On Behalf Of Boehm, Hans
> > 	Sent: Tuesday, June 19, 2007 4:21 PM
> > 	To: C++ threads standardisation
> > 	Subject: RE: [cpp-threads] Slightly revised memory model proposal (D2300)
> > 	
> > 	
> > 
> > ________________________________
> > 
> > 		From:  Raul Silvera
> > 		
> > 		Hans wrote on 06/15/2007 12:09:39 PM:
> > 		
> > 		> Unfortunately, I think I posted some misinformation here, with respect
> > 		> to flickering.  I believe the version of the example that I posted:
> > 		> 
> > 		> > > Thread 1:
> > 		> > > store_relaxed(&x, 1);
> > 		> > >
> > 		> > > Thread 2:
> > 		> > > store_relaxed(&x, 2);
> > 		> > >
> > 		> > > Thread 3:
> > 		> > > r1 = load_acquire(&x); (1)
> > 		> > > r2 = load_acquire(&x); (2)
> > 		> > > r3 = load_acquire(&x); (1)
> > 		> > >
> > 		> is already allowed to flicker under the D2300 rules.  And looking back
> > 		> at Sarita's example, weakening this doesn't seem to help.  (The example
> > 		> that we should really have been discussing would have had release stores.
> > 		> That's the one that's currently constrained by the modification order
> > 		> rule.  And having that flicker does seem dubious.)
> > 		
> > 		I find this very troubling. From T3's point of view, it is just doing acquire
> > 		operations, and it is not expecting any flickering, regardless of which stores
> > 		are going to satisfy its loads.
> > 		 
> > 
> > 	I was almost going to agree with you, and try to change this.
> > 	But this again runs into synchronization elimination issues, which
> > 	seem central here.  If the "acquire"s in thread 3 mean anything
> > 	without a matching "release" then, for similar reasons,
> > 	 
> > 	r1 = load_relaxed(&x); r2 = load_relaxed(&x); r3 = load_relaxed(&x);
> > 	 
> > 	can allow different outcomes from
> > 	 
> > 	r1 = load_relaxed(&x); fetch_and_add_acq_rel(&dead1, 0);
> > 	r2 = load_relaxed(&x); fetch_and_add_acq_rel(&dead2, 0);
> > 	r3 = load_relaxed(&x);
> > 	 
> > 	which means that the dead fetch_and_adds can't be eliminated, which
> > 	is very unfortunate.  It also means that I can't ever eliminate locks
> > 	after thread inlining without understanding the whole program.
> > 	 
> > 	I'm more and more inclined to do what Sarita was advocating anyway,
> > 	which is to switch to a more conventional formulation of the memory
> > 	model in which happens-before is just the transitive closure of the
> > 	union of sequenced-before and synchronizes-with.  That makes it
> > 	clearer that acquire and release only provide any guarantees if they
> > 	occur in pairs.
> > 	 
> > 	(The last proposal has another similar synchronization elimination
> > 	issue with the "precedes" relation, which includes happens-before, but
> > 	not sequenced-before.  I think we can also get rid of that by moving
> > 	back to a more conventional, Java-like, happens-before model.)
> > 	 
> > 	My general feeling is that if we have a trade-off between
> > 	synchronization elimination and more expressive low-level atomics,
> > 	synchronization elimination should win, since it affects lock-based
> > 	user code, which is bound to make up a much larger body of code than
> > 	low-level atomics clients.
> > 	 
> > 	And although I also find this a bit troubling, I'm still having a lot
> > 	of trouble constructing a case in which this matters.
> > 	 
> > 	Hans
> > 
> 
> > --
> > cpp-threads mailing list
> > cpp-threads at decadentplace.org.uk
> > http://www.decadentplace.org.uk/cgi-bin/mailman/listinfo/cpp-threads
> 
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.decadentplace.org.uk/pipermail/cpp-threads/attachments/20070620/d176796a/D2300-0001.html