[cpp-threads] Web site updated

Tue Feb 13 19:01:12 GMT 2007

On Tue, Feb 13, 2007 at 05:47:06PM +0200, Peter Dimov wrote:
> Paul E. McKenney wrote:
> >On Tue, Feb 13, 2007 at 05:47:14AM +0200, Peter Dimov wrote:
> >>Paul E. McKenney wrote:
> >>
> >>>From the load_raw() side, my hope would be that the compiler would
> >>>be required to "forget" where the value came from, thus being
> >>>unable to recognize that the following:
> >>>
> >>>x = a.load_raw();
> >>>y = a.load_raw();
> >>>
> >>>might be cached -- in other words, there must be a separate load
> >>>from "a" for both x and y.  Or am I missing your point?
> >>
> >>The compiler is allowed to optimize out the second load since there
> >>is no way of enforcing a particular execution, one where something
> >>other-thread-ly happens between the two statements. The execution
> >>where the current thread is not interrupted between the two loads is
> >>legitimate, so the programmer has no grounds to complain if he gets
> >>a program that does exactly that.
> >
> >How is the compiler to optimize out the second load if it has properly
> >forgotten where it loaded the value in x from?  The load_raw()
> >function is -not- sequential-program "business as usual" for the
> >compiler, after all.
> 
> The compiler is required to emit a program that, when run, produces a 
> legitimate execution (within certain "common case" boundaries). It is not 
> required to emit a program that, when run 10^94 times, produces all 
> possible legitimate executions (this may not even be possible on the given 
> hardware).

Agreed, but that is not my point.

> An execution where the two load_raw statements return the same value is 
> legitimate. Therefore, the compiler should be allowed to substitute the 
> second load_raw with the value loaded by the first, effectively restricting 
> the possible executions to this particular subset, and the program should 
> remain valid, even though the optimization is detectable by overly pedantic 
> test suites. All optimizations are.
> 
> Similarly, I'd expect in
> 
> a.store_raw( 5 );
> a.store_raw( 6 );
> 
> the first store to be optimized out.

MMIO accesses, anyone?  Hardware timing analysis (especially if there
is a short loop between the two)?

Let's please just outlaw that sort of optimization for store_raw(), so
that it can retain its full meaning.
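
To make the request concrete, here is a minimal sketch of the behavior I
have in mind.  The raw_var wrapper and demo() are hypothetical names made
up for illustration, and volatile is only a stand-in for whatever mechanism
the raw operations would actually use; nothing here is proposed wording.

```cpp
#include <cassert>

// Hypothetical stand-in for a variable with raw accesses.  The volatile
// qualifier models the "no combining, no caching" behavior argued for
// above: each call is meant to produce its own access to the location.
template <typename T>
class raw_var {
public:
    explicit raw_var(T v) : v_(v) {}
    T load_raw() const { return v_; }    // a separate load on every call
    void store_raw(T v) { v_ = v; }      // a separate store on every call
private:
    volatile T v_;
};

// Single-threaded illustration only (assumed usage, not from the proposal).
inline int demo() {
    raw_var<int> a(0);
    a.store_raw(5);         // a device or observer might act on this value
    a.store_raw(6);         // should not absorb the earlier store of 5
    int x = a.load_raw();
    int y = a.load_raw();   // should be a second load, not a copy of x
    assert(x == 6 && y == 6);
    return x + y;
}
```

In a single thread the result is the same either way, which is exactly why
the compiler is tempted to combine the accesses.  The difference only shows
up to an outside observer such as a device.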

>                                      Ditto in
> 
> a.store_release( 5 );
> a.store_release( 6 );
> 
> saving me one barrier. No correct program should rely on sneaking a load 
> from another thread between the two stores, since this is not guaranteed to 
> happen.

Again, if MMIO or hardware timing is involved, there really can be correct
programs that care.  In the MMIO case, the hardware might really rely on
the pair of stores, and in the hardware timing case, one might be trying
to detect the a==5 state (though again, in this case one would more likely
have a loop between the two stores).

Let's save the access-combining optimizations for normal accesses to data!

							Thanx, Paul