[cpp-threads] Slightly revised memory model proposal (D2300)

Wed Jun 20 03:05:18 BST 2007

On Wed, Jun 20, 2007 at 12:35:09AM -0000, Boehm, Hans wrote:
> Thinking about this yet again, I think there may be another way out of
> this, by outlawing the flickering in all cases, even with relaxed
> operations on both sides.  That also avoids the synchronization
> elimination issues.
>  
> This violates my earlier argument that
>  
> p = &x;
> q = &x;
> ...
> while (...) {
>   // Loop does not store to potentially shared variables.
>   r1 = load_relaxed(p); // loads *p, i.e. x
>   r2 = load_relaxed(q); // loads *q, i.e. x
> }
>  
> should allow load_relaxed operations to be reordered.
>  
> Effectively we would be prohibiting compiler reordering of atomic
> operations, if the affected objects potentially aliased, even if both
> operations were loads.  Since the reordering is prevented only if the
> same location is affected, and cache coherence guarantees that at the
> hardware level anyway, I think this is purely a compiler reordering
> restriction.  The only case in which I can think of this mattering is
> some numerical methods that are oblivious to races on floating-point
> data, and hence use lots of atomic operations.  But I suspect the
> compiler's alias analysis is typically quite good in those cases.
>  
> Opinions?

At first glance, this looks extremely good to me.  But before passing
judgement, I should make sure that I understand what you are proposing.
(Perish the thought!!!)
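
For concreteness, here is a rough sketch of the reader side of the
quoted loop, written with C++ std::atomic rather than the load_relaxed()
notation used in this thread.  The function name and the assumption of
a writer that only ever stores increasing values to x are hypothetical,
added for illustration.  Under the no-flicker rule being proposed, the
second load could never observe an older value than the first when p
and q alias:

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> x{0};

// Reader side only.  A writer in another thread (not shown) is assumed
// to store strictly increasing values to x.  p and q may both point at
// x, as in the quoted loop.
int count_backward_steps(std::atomic<int>* p, std::atomic<int>* q,
                         int iterations) {
    int backward = 0;
    for (int i = 0; i < iterations; ++i) {
        int r1 = p->load(std::memory_order_relaxed);  // loads *p
        int r2 = q->load(std::memory_order_relaxed);  // loads *q
        // If the compiler swapped the two loads while p == q, r2 could
        // be older than r1.  The proposal would forbid that swap.
        if (r2 < r1) ++backward;
    }
    return backward;
}
```

With no concurrent writer at all, the count is trivially zero.  The
interesting claim is that it would stay zero with one.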

First, in the following case:

  r1 = load_relaxed(&x);
  r2 = load_relaxed(&y);

the compiler still would be permitted to reorder the two load_relaxed()
statements, correct?
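
A minimal sketch of this first case, assuming std::atomic<int> as a
stand-in for the variables x and y (the function name read_both is made
up for illustration).  The two loads touch distinct locations, so
nothing in the aliasing restriction would order them:

```cpp
#include <atomic>
#include <cassert>
#include <utility>

// Hypothetical stand-ins for the x and y of the example.
std::atomic<int> x{0};
std::atomic<int> y{0};

// Two relaxed loads from distinct locations.  Nothing here orders the
// loads with respect to each other, so (on the reading being asked
// about) the compiler could emit them in either order.
std::pair<int, int> read_both() {
    int r1 = x.load(std::memory_order_relaxed);
    int r2 = y.load(std::memory_order_relaxed);
    return {r1, r2};
}
```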

Second, if there are store_relaxed() statements in the mix, but still
to distinct variables:

  r1 = load_relaxed(&x);
  r2 = load_relaxed(&y);
  store_relaxed(&z, 3);

would the compiler be permitted to emit code for these statements in
any desired permutation?  (Presumably "yes", given that the CPU would
be permitted to reorder them anyway.)
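
The same kind of sketch for this second case, again assuming
std::atomic<int> stand-ins and a made-up function name.  All three
accesses are relaxed and go to distinct locations:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-ins for x, y, and z.
std::atomic<int> x{0};
std::atomic<int> y{0};
std::atomic<int> z{0};

// Two relaxed loads and one relaxed store, all to different variables.
// Within a single thread the result is the same for every permutation
// of the three accesses; only other threads could tell them apart.
int mixed_accesses() {
    int r1 = x.load(std::memory_order_relaxed);
    int r2 = y.load(std::memory_order_relaxed);
    z.store(3, std::memory_order_relaxed);
    return r1 + r2;
}
```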

Programmers communicating with signal handlers or interrupt handlers
would want a compiler barrier, of course.
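
One way such a compiler barrier might look, sketched with
std::atomic_signal_fence from later C++ standards (an assumption; no
such interface is named in this thread).  The variable and function
names are hypothetical, and std::atomic<int> is assumed to be lock-free
so that it is usable from a handler:

```cpp
#include <atomic>
#include <cassert>
#include <csignal>

std::atomic<int> payload{0};
std::atomic<int> ready{0};
std::atomic<int> seen{-1};

// Handler runs on the same thread that raised the signal, so only
// compiler reordering needs to be prevented, not CPU reordering.
extern "C" void on_signal(int) {
    if (ready.load(std::memory_order_relaxed)) {
        std::atomic_signal_fence(std::memory_order_acquire);
        seen.store(payload.load(std::memory_order_relaxed),
                   std::memory_order_relaxed);
    }
}

int publish_and_signal() {
    std::signal(SIGINT, on_signal);
    payload.store(42, std::memory_order_relaxed);
    // Compiler-only barrier: keeps the payload store ahead of the flag.
    std::atomic_signal_fence(std::memory_order_release);
    ready.store(1, std::memory_order_relaxed);
    std::raise(SIGINT);  // handler runs before raise() returns
    return seen.load(std::memory_order_relaxed);
}
```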

						Thanx, Paul

> Hans 
>  
> 
>  
> 
> ________________________________
> 
> 	From: cpp-threads-bounces at decadentplace.org.uk
> 	[mailto:cpp-threads-bounces at decadentplace.org.uk] On Behalf Of Boehm, Hans
> 	Sent: Tuesday, June 19, 2007 4:21 PM
> 	To: C++ threads standardisation
> 	Subject: RE: [cpp-threads] Slightly revised memory model proposal (D2300)
> 	
> 	
> 
> ________________________________
> 
> 		From:  Raul Silvera
> 		
> 		Hans wrote on 06/15/2007 12:09:39 PM:
> 		
> 		> Unfortunately, I think I posted some misinformation here,
> 		> with respect to flickering.  I believe the version of the
> 		> example that I posted:
> 		> 
> 		> > > Thread 1:
> 		> > > store_relaxed(&x, 1);
> 		> > >
> 		> > > Thread 2:
> 		> > > store_relaxed(&x, 2);
> 		> > >
> 		> > > Thread 3:
> 		> > > r1 = load_acquire(&x); (1)
> 		> > > r2 = load_acquire(&x); (2)
> 		> > > r3 = load_acquire(&x); (1)
> 		> > >
> 		> is already allowed to flicker under the D2300 rules.  And
> 		> looking back at Sarita's example, weakening this doesn't
> 		> seem to help.  (The example that we should really have been
> 		> discussing would have had release stores.  That's the one
> 		> that's currently constrained by the modification order
> 		> rule.  And having that flicker does seem dubious.)
> 		
> 		I find this very troubling. From T3's point of view, it is
> 		just doing acquire operations, and it is not expecting any
> 		flickering, regardless of which stores are going to satisfy
> 		its loads.
> 		 
> 
> 	I was almost going to agree with you, and try to change this.
> But this again runs into synchronization elimination issues, which seem
> central here.  If the "acquire"s in thread 3 mean anything without a
> matching "release" then, by similar reasoning,
> 	 
> 	r1 = load_relaxed(&x); r2 = load_relaxed(&x); r3 = load_relaxed(&x);
> 	 
> 	can allow different outcomes from
> 	 
> 	r1 = load_relaxed(&x); fetch_and_add_acq_rel(&dead1, 0);
> 	r2 = load_relaxed(&x); fetch_and_add_acq_rel(&dead2, 0);
> 	r3 = load_relaxed(&x);
> 	 
> 	which means that the dead fetch_and_adds can't be eliminated,
> which is very unfortunate.  It also means that I can't ever eliminate
> locks after thread inlining without understanding the whole program.
> 	 
> 	I'm more and more inclined to do what Sarita was advocating
> anyway, which is to switch to a more conventional formulation of the
> memory model in which happens-before is just the transitive closure of
> the union of sequenced-before and synchronizes-with.  That makes it
> clearer that acquire and release only provide any guarantees if they
> occur in pairs.
> 	 
> 	(The last proposal has another similar synchronization
> elimination issue with the "precedes" relation, which includes
> happens-before, but not sequenced-before.  I think we can also get rid
> of that by moving back to a more conventional, Java-like, happens-before
> model.)
> 	 
> 	My general feeling is that if we have a trade-off between
> synchronization elimination and more expressive low-level atomics,
> synchronization elimination should win, since it affects lock-based user
> code, which is bound to make up a much larger body of code than
> low-level atomics clients.
> 	 
> 	And although I also find this a bit troubling, I'm still having
> a lot of trouble constructing a case in which this matters.
> 	 
> 	Hans
> 
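
The more conventional formulation quoted above, with happens-before as
the transitive closure of the union of sequenced-before and
synchronizes-with, can be sketched over a tiny fixed set of events.
The event numbering, the Relation type, and the function names are all
hypothetical, chosen only to show why an acquire with no matching
release contributes no edges:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t N = 4;
using Relation = std::array<std::array<bool, N>, N>;

// happens-before = transitive closure of
//                  (sequenced-before union synchronizes-with).
Relation happens_before(const Relation& sb, const Relation& sw) {
    Relation hb{};
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < N; ++j)
            hb[i][j] = sb[i][j] || sw[i][j];
    // Warshall's algorithm for the transitive closure.
    for (std::size_t k = 0; k < N; ++k)
        for (std::size_t i = 0; i < N; ++i)
            for (std::size_t j = 0; j < N; ++j)
                if (hb[i][k] && hb[k][j]) hb[i][j] = true;
    return hb;
}

// Hypothetical events:
//   0: thread 1 writes data      1: thread 1 stores the flag
//   2: thread 2 loads the flag   3: thread 2 reads data
// "paired" means the store is a release and the load is an acquire
// that reads from it, giving a synchronizes-with edge 1 -> 2.
Relation example_hb(bool paired) {
    Relation sb{}, sw{};
    sb[0][1] = true;
    sb[2][3] = true;
    if (paired) sw[1][2] = true;
    return happens_before(sb, sw);
}
```

With the pair in place, event 0 happens-before event 3.  Without it,
the two threads' events are unordered in this model.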
