[cpp-threads] Slightly revised memory model proposal (D2300)

Paul E. McKenney paulmck at linux.vnet.ibm.com
Sun Jul 8 21:35:04 BST 2007


On Fri, Jul 06, 2007 at 06:30:11PM -0000, Boehm, Hans wrote:
> This qualifies as a frequently discussed topic.  I had this discussion
> in another group not long ago ... 

OK.  How about the following?

Q.	Once atomics are part of the standard, compiler writers will be
	free to perform whatever optimizations they like on any sequence
	of evaluations that does not include accesses to atomics, right?

A.	Not quite.  Because non-atomic accesses interact with atomic
	accesses (and with locking, which provides similar synchronization),
	there are a number of situations where care is required.
	One such situation arises in functions that use fine-grained
	locking to manipulate elements of a data structure, but that
	cache information about the last element they manipulated.
	This technique can greatly improve efficiency when a given
	element is likely to be manipulated several times in a row.
	Consider foo_func() below:

		void foo_func(foo *p)
		{
		   static __thread foo *cached = 0;
		   static __thread bar precomputed_info;
		   bar my_info;

		   acquire lock for *p;
		   manipulate *p;
		   if (cached == p) {
		     manipulate *cached some more using precomputed_info;
		   } else {
		     cached = p;
		     manipulate *p some more setting precomputed_info;
		   }
		   release lock for *p;
		}

	In this case, profiling would show that the then-clause of the
	if-statement is executed a very large fraction of the time.
	An aggressive compiler writer might therefore be tempted to
	execute this then-clause speculatively.  However, doing so would
	introduce a data race, because some other thread might well be
	manipulating *cached (while holding cached->lock).
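
	To make the hazard concrete, suppose (purely for illustration)
	that the "manipulation" in the then-clause simply adds a
	precomputed delta to a counter field, "count" and "delta" being
	hypothetical field names:

		/* As written: *cached is touched only when cached == p,
		   in which case this thread holds cached->lock. */
		if (cached == p) {
		  cached->count += precomputed_info.delta;
		}

		/* After speculative execution of the then-clause: */
		int tmp = cached->count + precomputed_info.delta;
		if (cached == p) {
		  cached->count = tmp;
		}

	The hoisted load of cached->count now touches *cached before
	the cached == p check, and therefore races with any thread that
	is updating *cached while holding cached->lock.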

	Note that the above case does not actually introduce any
	additional constraints on optimization.  After all, some
	intervening function might well have deleted *cached, so that
	any speculative access to it could result in a page fault even
	in the absence of multithreading.  Therefore, this sort of
	speculative-execution optimization requires extreme care, even
	in single-threaded code.

						Thanx, Paul

> > -----Original Message-----
> > From:  Paul E. McKenney
> > > [Sarita: ]
> > > Paul,
> > > 
> > > The variables in the example below are NOT atomic.
> > > 
> > > > > > > > T1: if (A==1) B=1;
> > > > > > > > T2: if (B==1) A=1;
> > > 
> > > Yes, we do want programmers to use atomics so their program will be 
> > > easy to reason about.
> > > 
> > > But how do we explain when to use atomics so the program will be
> > > easy to reason about?  Minimally, one would expect that atomics
> > > would only be needed in situations where two threads access the
> > > same variable.  Agreed?
> > >
> > > For the example above, the only way in which two threads can access 
> > > the same variable is if the threads perform speculative writes.
> > 
> > In theory, perhaps.  But in practice, is this even decidable 
> > in general?
> > For example:
> > 
> > 	T1: foo(); B=1;
> > 	T2: bar(); B=2;
> > 
> > To determine whether there is a data race, one must solve the 
> > halting problem on foo() and bar().  If either can be 
> > guaranteed not to return, then there is no data race.  Only 
> > if both return is there a data race.
> > I suspect that one could leverage any number of other 
> > undecidability results as well.
>
> Definitely.  But the primary purpose of the standard is to define
> dynamic behavior of programs.  And basically all aspects of dynamic
> behavior are undecidable.  It's also clearly undecidable whether a
> program will encounter a subscript error.  Yet we give that undefined
> behavior.
>
> > Therefore, I recommend strongly against taking this approach.
> > 
> > > So if you say the above program must use atomics, I think it is
> > > equivalent to saying that we have to teach undergrads about
> > > speculative writes when we teach them parallel programming.
> > > Would you agree that's unacceptable?
> > 
> > I agree that this approach would be quite ill-advised for most
> > undergraduates that I have met.
> > 
> > > And in that case, how would we explain the need for atomic
> > > variables in the above situation, but not in many others?
> > 
> > Any attempt to create a set of guidelines that accurately 
> > describes whether or not atomics are required for all 
> > possible code fragments is guaranteed to end in tears.  The 
> > set of guidelines will become arbitrarily complex, given that 
> > the process of creating such guidelines reduces to the 
> > halting problem.
> > 
> > I therefore recommend starting with a list of situations in 
> > which non-atomics can safely be used.  This list can be 
> > incomplete, because unnecessary use of atomics will not be a 
> > problem for most undergraduates.
> > Undergraduates who are unsatisfied with this situation can be 
> > referred to graduate courses or to the relevant literature -- 
> > professors were certainly not shy about taking this approach 
> > with me back when I was an undergraduate, after all.  I would 
> > suggest also referring them to relevant concurrent 
> > open-source projects, but not all professors will be 
> > comfortable with open source.
> > 
> > The list we have covered on this email list thus far seems to include:
> > 
> > 1.	Variables used only under the protection of a given
> > 	exclusive lock.
> > 
> > 2.	Variables modified only under the protection of the write side
> > 	of a given reader-writer lock, and referenced only under the
> > 	protection of that same reader-writer lock (either read or
> > 	write side).
> > 
> > 3.	Variables modified only preceding a release operation on a
> > 	given atomic variable, and accessed only following an acquire
> > 	operation on that same variable (see the sketch just after
> > 	this list).
> > 
> > 4.	Any of the above combined with single-threaded use, where
> > 	"single-threaded use" is defined to be use at a time when
> > 	only a single thread exists, for example, during initialization.
> > 
> > 5.	Any of the above combined with accesses and modifications
> > 	to newly allocated data structures, before a pointer to the
> > 	given structure has been made accessible to other threads.
> > 
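For example, case 3 might look something like the following sketch,
using C++0x-style atomics.  The exact spellings may differ from the
current atomics proposal, std:: qualifiers and headers are omitted,
and the struct and field names are made up purely for illustration:

	struct message { int payload; };  /* plain, non-atomic data */

	message msg;
	atomic<message *> ready(0);       /* no message published yet */

	void publish(void)                /* writer */
	{
	  msg.payload = 42;               /* modified before the ...     */
	  ready.store(&msg, memory_order_release);  /* ... release store */
	}

	int consume(void)                 /* reader */
	{
	  message *m = ready.load(memory_order_acquire);
	  return m ? m->payload : -1;     /* plain data accessed only after
	                                     the acquire load; sees 42 */
	}
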
> > We might refine this list, possibly expanding its scope 
> > somewhat, though I suspect that it needs to be kept to under 
> > 5-7 points for general undergraduate consumption.  But you 
> > would know better than would I.
> I think this is unnecessary.  The rule is simply that if a variable can
> be simultaneously accessed by more than one thread (using the SC program
> interpretation), and at least one access is a write, then either
> 
> - (most likely for beginners) your code has a bug, or
> - it needs to be atomic
> 
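A minimal illustration of that rule, with made-up names and
C++0x-style atomics (std:: qualifiers again omitted):

	int scratch;               /* touched only by the worker thread,
	                              so a plain variable is fine */
	atomic<bool> stop(false);  /* written by one thread and read by
	                              another, so it must be atomic */

	void worker(void)
	{
	  while (!stop.load())     /* defaults to sequential consistency */
	    scratch++;
	}

	void controller(void)
	{
	  stop.store(true);
	}
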
> I think the problem with Sarita's example is that it has been pruned to
> the point that it is hard to extract its significance.  The real problem
> here is that if you claim that examples like this do not contain
> races, you effectively have to tell the user (somehow, and it won't be
> easy) that they must hold locks for data that might be accessed by a
> function, even if the function were to take the wrong control path.
> (Presumably the only motivation would be to allow speculation that
> corresponds to executing code from the wrong control path.)  This just
> doesn't work in real code.
> 
> Here's an example from the earlier discussion:
> 
> Assume I have a main function that is passed a pointer p and wants to
> manipulate *p, after acquiring the appropriate lock.  It needs to call a
> helper function for some of this.  The helper function is often called
> on the same p, and optimizes this case by remembering some information
> about the last p it was called on:
> 
> main_func(foo *p)
> {
>    acquire lock for *p;
>    manipulate *p ;
>    helper(p);
>    release lock for *p;
> }
> 
> helper(foo *p)
> {
>    static __thread foo *cached = 0;
>    static __thread bar precomputed_info;
>    bar my_info;
>  
> 
>    if (cached == p) {
>      manipulate *cached some more using precomputed_info;
>    } else {
>      cached = p;
>      manipulate *p some more setting precomputed_info;
>    }
> }
> 
> This would be incorrect since, if helper took the wrong control path, it
> could access *cached, for which it doesn't have the lock.
> (On the control path on which it actually does access it, it of course
> does have the lock.)  And indeed this corresponds to a speculative
> update to *cached, which an unscrupulous compiler might find profitable.
> 
> Note that I cannot fix this by moving the lock acquisition for *p into
> helper; that would make the two manipulations of *p in main_func and
> helper nonatomic.  And it would greatly increase locking overhead.
> 
> I don't think that a programming model in which this can fail is
> defensible.  But this is basically equivalent to Sarita's example.
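
For concreteness, the non-fix mentioned above -- moving the lock
acquisition for *p into helper -- would look something like this,
in the same pseudocode:

	main_func(foo *p)
	{
	   acquire lock for *p;
	   manipulate *p;
	   release lock for *p;   /* lock dropped here, so another thread */
	   helper(p);             /* can slip in between main_func's and  */
	}                         /* helper's manipulations of *p         */

	helper(foo *p)
	{
	   acquire lock for *p;   /* a second acquire/release per call */
	   ... same body as before ...
	   release lock for *p;
	}

so the two manipulations of *p are no longer a single critical
section, and every call pays for two lock acquisitions instead of one.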
> 
> Hans
> 
