[cpp-threads] Yet another visibility question

Boehm, Hans hans.boehm at hp.com
Wed Dec 20 00:14:19 GMT 2006


Another way out here might be to have dependence contribute to the N2052
"precedes" relation, but not happens-before.  Thus we might be able to
preclude "out of thin air" results by having "causal cycles" result in a
cyclic "precedes", without actually guaranteeing dependence-based
ordering.  That might let us address (2a) below without addressing
(2b), and without imposing optimization restrictions.

Since I'm increasingly convinced that (2b) is essentially incompatible
with preserving optimization in unrelated code, this is starting to
sound attractive ...

Hans

> -----Original Message-----
> From: cpp-threads-bounces at decadentplace.org.uk 
> [mailto:cpp-threads-bounces at decadentplace.org.uk] On Behalf 
> Of Boehm, Hans
> Sent: Tuesday, December 19, 2006 3:51 PM
> To: C++ threads standardisation
> Subject: RE: [cpp-threads] Yet another visibility question
> 
> Backing up a minute, and trying to reconstruct how we got 
> here in the first place:
> 
> 1) We want unordered atomic operations, since they are 
> cheaper in some contexts, and are occasionally sufficient.  
> They are, unfortunately, tricky to use correctly and, as we 
> keep discovering, even trickier to define correctly.
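> 
> For concreteness, a minimal sketch of the "occasionally sufficient"
> case, using std::atomic with memory_order_relaxed as a stand-in for
> the unordered operations under discussion (the interface here is an
> assumption, not part of the proposal):
> 
> // Sketch only: relaxed fetch_add gives atomicity without ordering;
> // the joins at the end provide the ordering needed for the final read.
> #include <atomic>
> #include <cstdio>
> #include <thread>
> #include <vector>
> 
> std::atomic<long> hits(0);
> 
> void worker() {
>     for (int i = 0; i < 1000000; ++i)
>         hits.fetch_add(1, std::memory_order_relaxed);
> }
> 
> int main() {
>     std::vector<std::thread> threads;
>     for (int i = 0; i < 4; ++i)
>         threads.emplace_back(worker);
>     for (auto &t : threads)
>         t.join();
>     std::printf("%ld\n", hits.load(std::memory_order_relaxed));
> }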
> 
> 2) Given that we have unordered atomics, there are two 
> distinct reasons for wanting to impose some ordering on them 
> anyway, in case of two clearly dependent operations:
> 
> (a) If we don't do so, we don't have a tractable way to 
> prohibit "out of thin air" results.  The canonical example 
> is (everything except the rN "registers" is an unordered atomic, 
> initially zero):
> 
> Thread 1: r1 = x; y = r1;
> Thread 2: r2 = y; x = r2;
> Questionable outcome: r1 = r2 = 42.
> 
> With simple happens-before consistency, this is allowed if 
> there is no ordering between the two assignments in each 
> thread, since each load can see the result of the store in 
> the other thread.  So long as we pick identical values for r1 
> and r2 we can construct an "execution" that justifies it.  
> Allowing this is somewhat nasty, since it makes it harder to 
> reason about programs.  Can I potentially generate a pointer 
> to a local variable whose address wasn't taken "out of thin 
> air" in this way?  (This probably doesn't really affect the 
> compiler, since real implementations will not allow this.  
> But it causes a problem for higher level tools, I suspect.  
> For Java, the problem is much worse for security reasons.)
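> 
> As a concrete rendering of the example (again with memory_order_relaxed
> standing in for the unordered atomics; the syntax is assumed, not
> proposed here):
> 
> // Sketch only: the canonical "out of thin air" shape.
> #include <atomic>
> 
> std::atomic<int> x(0), y(0);
> 
> void thread1() {
>     int r1 = x.load(std::memory_order_relaxed);
>     y.store(r1, std::memory_order_relaxed);   // y depends on the load of x
> }
> 
> void thread2() {
>     int r2 = y.load(std::memory_order_relaxed);
>     x.store(r2, std::memory_order_relaxed);   // x depends on the load of y
> }
> 
> // With only happens-before consistency and no dependence ordering,
> // nothing in the specification rules out r1 == r2 == 42, even though
> // no real implementation produces it.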
> 
> (b) Some ordering properties along these lines in fact seem 
> to be provided by all hardware.  It would be nice to take 
> advantage of those.
> Sometimes such dependency-based ordering can be used to avoid 
> the overhead of explicit ordering.  One such example is 
> Peter's attempt to minimize ordering constraints in reference 
> counting.  This was reduced to
> 
> >>>> // x y initially zero
> >>>>
> >>>> // threads 1 to N:
> >>>>
> >>>> assert( x == 0 );
> >>>>
> >>>> if( fetchadd_release( &y, 1 ) == N - 1 ) {
> >>>>    x = 1;
> >>>> }
> where we need the fetchadd_release to become visible before "x = 1;"
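> 
> In the same stand-in notation (fetch_add with memory_order_release for
> fetchadd_release; x an ordinary int), the reduced example is roughly:
> 
> // Sketch only.  Without dependence-based ordering, nothing here
> // orders the conditional "x = 1" after the read part of the
> // fetch_add, which is exactly the question at issue.
> #include <atomic>
> #include <cassert>
> 
> const int N = 4;
> int x = 0;                 // ordinary, non-atomic
> std::atomic<int> y(0);
> 
> void participant() {       // run by each of threads 1..N
>     assert(x == 0);
>     if (y.fetch_add(1, std::memory_order_release) == N - 1)
>         x = 1;             // wanted: visible only after the fetch_add
> }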
> 
> 3) Some of the (2b) examples, like Peter's here, require 
> dependence-based ordering at least between an unordered atomic 
> load (such as the one that's essentially part of 
> fetchadd_release) and an ordinary store.
> 
> My earlier proposal to deal with this was
> http://www.decadentplace.org.uk/pipermail/cpp-threads/2006-December/001211.html .
> Ben pointed out a relatively uncontroversial correction in
> http://www.decadentplace.org.uk/pipermail/cpp-threads/2006-December/001212.html
> and I think 1.10p7 can be further simplified by 
> relying on transitive ordering.
> 
> The most relevant changes were
> 
> 1) for 1.10p5 to order an unordered atomic load followed by 
> even an ordinary store, and
> 
> 2) for 1.10p7 to have all atomic reader-writer 
> correspondences, even those involving unordered operations, 
> induce synchronizes-with ordering.
> Acquire-release operations would now differ only in the 
> inter-thread-ordered-before relationships.
> 
> I now think the proposed 1.10p5 change doesn't work, and I 
> have serious doubts about the original.  The fundamental 
> problem is that all of this requires that the compiler 
> preserve dependencies, and that may invalidate optimizations 
> in compilation units that don't mention atomics.
> 
> The proposed condition for dependence-based 
> inter-thread-ordered-before was
> 
> "A is an unordered atomic read and B writes a value which 
> depends on the value read by A, or the execution of B is 
> conditioned on the value read by A."
> 
> I was thinking about trying to resolve this with a 
> restatement (which would require translation to proper 
> standardese, if it had gotten that
> far):
> 
> "A is an unordered atomic read and B, an evaluation that 
> occurs as part of evaluation of an expression E in A's 
> function, writes a value which depends on the value read by 
> A.  "Depends on" here should be taken to mean that if A were 
> to read some other value consistent with its type, then the 
> evaluation of E either
> 
> 	- Does not take place, or
> 	- Does not result in the same value being written to 
> the same object"
> 
> This would mean that if I write
> 
> r1 = y_initialized.load_raw();
> if (r1) {
>   y.x = 1;
> } else {
>   y.x = 1;
> }
> 
> then each assignment to y.x is inter-thread-ordered-after the 
> load_raw, since each of the y.x = 1 assignments may not take 
> place depending on the value of r1.
> 
> However, if I instead write
> 
> r1 = y_initialized.load_raw();
> foo(r1);
> 
> and someplace else
> 
> void foo(bool r1)
> {
>   if (r1) {
>     y.x = 1;
>   } else {
>     y.x = 1;
>   }
> }
> 
> then, under the restated rule, the stores inside foo() are not 
> inter-thread-ordered-after the load_raw, since they do not occur as 
> part of an expression in the function containing the load.
> 
> This does seem to be (barely) implementable, since the 
> load_raw optimization impact is confined to the function, and 
> hence the compilation unit, in which the load appears.  But it has 
> several remaining problems.
> I think the showstopper is that even the first example breaks 
> if I make the load_raw call in a separate function.  Hence I 
> would expect it to only work in practice for badly structured code.
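> 
> Concretely (hypothetical wrapper name, same stand-in notation), the
> breakage looks like:
> 
> // Sketch only: the raw load now occurs in a helper, so under the
> // restated rule the stores in reader() are not evaluations "in A's
> // function", and the intended ordering is lost.
> #include <atomic>
> 
> std::atomic<int> y_initialized(0);
> struct Y { int x; } y;
> 
> int read_flag() {                   // hypothetical wrapper around load_raw
>     return y_initialized.load(std::memory_order_relaxed);
> }
> 
> void reader() {
>     int r1 = read_flag();
>     if (r1) {
>         y.x = 1;                    // no longer inter-thread-ordered-after
>     } else {                        // the raw load
>         y.x = 1;
>     }
> }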
> 
> Better ideas?
> 
> Obvious alternatives include:
> 
> - A more Java-like causality treatment.  That is still 
> considerably more complicated to state (see
> http://java.sun.com/docs/books/jls/third_edition/html/memory.html#17.4.8 ), 
> and I'm not sure it's quite as precise as we would like with 
> respect to the correspondence of actions to source expressions 
> either; that seems to be one of the sticking points 
> here.  There are probably other contenders, like Vijay 
> Saraswat's recent work (http://www.saraswat.org/rao.html), 
> but I'm not convinced that any of these could easily go into 
> a language standard.
> 
> - Dropping unordered atomics.  I'm unconvinced that's viable 
> on architectures like PowerPC and ARM, where acquire loads 
> and release stores require a fence.  It would result in a lot 
> of platform-specific low-level code for those platforms.
> 
> - Allowing "out of thin air" results for unordered atomics, 
> guaranteeing no dependency-based ordering, and putting up 
> with more expensive reference counting, etc.
> 
> > -----Original Message-----
> > From:  Peter Dimov
> > Sent: Tuesday, December 19, 2006 10:11 AM
> > To: C++ threads standardisation
> > Subject: Re: [cpp-threads] Yet another visibility question
> > 
> > Hans Boehm wrote:
> > 
> > > Otherwise consider a variant of the above example:
> > >
> > > r1 = y_initialized.load_raw();
> > > if (r1) {
> > >   y.foo();
> > > } else {
> > >   y.bar();
> > > }
> > >
> > > Assume that y_initialized is set in another thread after
> > > initializing y.  I'm now in a situation in which this code is
> > > correct unless foo() and bar() happen to assign the same value to
> > > the same field of y.  In that case, the stores would no longer be
> > > dependent, and hence could happen before initialization.  But
> > > clearly it shouldn't be my business to know how foo() and bar()
> > > are implemented.
> > 
> > If we assume that foo and bar are opaque, the code is incorrect
> > anyway due to a possible data race between the potential reads from
> > y in foo and bar and the initialization of y in the other thread.
> > 
> 
> But with the proposed changes, this would no longer be a 
> race.  We would have
> 
> initialization of y
>     inter-thread-ordered-before
> release store of y_initialized
>     synchronizes-with
> y_initialized.load_raw()
>     inter-thread-ordered-before
> foo() or bar() calls
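> 
> In code form (writer side added for illustration; trivial bodies for
> init/foo/bar are assumptions):
> 
> // Sketch only, in the same stand-in notation.  Under the proposed
> // changes the release store synchronizes with the raw load, and the
> // raw load is inter-thread-ordered-before the dependent calls.
> #include <atomic>
> 
> struct Y {
>     int x;
>     void init() { x = 0; }
>     void foo()  { ++x; }
>     void bar()  { --x; }
> } y;
> 
> std::atomic<int> y_initialized(0);
> 
> void writer() {
>     y.init();                                          // initialization of y
>     y_initialized.store(1, std::memory_order_release); // release store
> }
> 
> void reader() {
>     int r1 = y_initialized.load(std::memory_order_relaxed);  // "load_raw"
>     if (r1) y.foo(); else y.bar();  // calls conditioned on the raw load
> }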
> 
> Hans
> 
> --
> cpp-threads mailing list
> cpp-threads at decadentplace.org.uk
> http://www.decadentplace.org.uk/cgi-bin/mailman/listinfo/cpp-threads
> 


