[cpp-threads] Slightly revised memory model proposal (D2300)

Fri Jul 6 19:30:11 BST 2007

This qualifies as a frequently discussed topic.  I had this discussion
in another group not long ago ... 

> -----Original Message-----
> From:  Paul E. McKenney
> > [Sarita: ]
> > Paul,
> > 
> > The variables in the example below are NOT atomic.
> > 
> > > > > > > T1: if (A==1) B=1;
> > > > > > > T2: if (B==1) A=1
> > 
> > Yes, we do want programmers to use atomics so their program will be 
> > easy to reason about.
> > 
> > But how do we explain when to use atomics so the program 
> will be easy 
> > to reason about? Minimally, one would expect that atomics 
> would only 
> > be needed in situations where two threads access the same 
> variable. Agreed?
> >
> > For the example above, the only way in which two threads can access 
> > the same variable is if the threads perform speculative writes.
> 
> In theory, perhaps.  But in practice, is this even decidable 
> in general?
> For example:
> 
> 	T1: foo(); B=1;
> 	T2: bar(); B=2;
> 
> To determine whether there is a data race, one must solve the 
> halting problem on foo() and bar().  If either can be 
> guaranteed not to return, then there is no data race.  Only 
> if both return is there a data race.
> I suspect that one could leverage any number of other 
> undecidability results as well.
Definitely.  But the primary purpose of the standard is to define
dynamic behavior of programs.  And basically all aspects of dynamic
behavior are undecidable.  It's also clearly undecidable whether a
program will encounter a subscript error.  Yet we give that undefined
behavior.

> 
> Therefore, I recommend strongly against taking this approach.
> 
> > So if you say the above program must use atomics, I think it is 
> > equivalent to saying that we have to teach undergrads about 
> > speculative writes when we teach them parallel programming. 
> Would you 
> > agree that's unacceptable?
> 
> I agree that this approach would be quite ill-advise for most 
> undergraduates that I have met.
> 
> > And in that case, how would we explain the need for atomic 
> variables 
> > in the above situation, but not in many others?
> 
> Any attempt to create a set of guidelines that accurately 
> describes whether or not atomics are required for all 
> possible code fragments is guaranteed to end in tears.  The 
> set of guidelines will become arbitrarily complex, given that 
> the process of creating such guidelines reduces to the 
> halting problem.
> 
> I therefore recommend starting with a list of situations in 
> which non-atomics can safely be used.  This list can be 
> incomplete, because unnecessary use of atomics will not be a 
> problem for most undergraduates.
> Undergraduates who are unsatisfied with this situation can be 
> referred to graduate courses or to the relevant literature -- 
> professors were certainly not shy about taking this approach 
> with me back when I was an undergraduate, after all.  I would 
> suggest also referring them to relevant concurrent 
> open-source projects, but not all professors will be 
> comfortable with open source.
> 
> The list we have covered on this email list thus far seems to include:
> 
> 1.	Variables used only under the protection of a given 
> exclusive lock.
> 
> 2.	Variables modified only under the protection of the write side
> 	of a given reader-writer lock, and referenced only under the
> 	protection of that same reader-writer lock (either read or
> 	write side).
> 
> 3.	Variables modified only preceding a release operation on a
> 	given atomic variable, and accessed only following an acquire
> 	operation on that same variable.
> 
> 4.	Any of the above combined with single-threaded use, where
> 	"single-threaded use" is defined to be use at a time when
> 	only a single thread exists, for example, during initialization.
> 
> 5.	Any of the above combined with accesses and modifications
> 	to newly allocated data structures, before a pointer to the
> 	given structure has been made accessible to other threads.
> 
> We might refine this list, possibly expanding its scope 
> somewhat, though I suspect that it needs to be kept to under 
> 5-7 points for general undergraduate consumption.  But you 
> would know better than would I.
I think this is unnecessary.  The rule is simply that if a variable can
be simultaneously accessed by more than one thread (using the SC program
interpretation), and at least one access is a write, then either

- (most likely for beginners) your code has a bug, or
- it needs to be atomic

I think the problem with Sarita's example is that it has been pruned to
the point that it is hard to extract its significance.  The real problem
here is that the if you claim that examples like this do not contain
races, you effectively have to tell the user (somehow, it won't be easy)
that you have to hold locks for data that might be accessed by
functions, even if the function were to take the wrong control path.
(Presumably the only motivation would be to allow speculation that
corresponds to executing code from the wrong control path.)  This just
doesn't work in real code.

Here's an example from the earlier discussion:

Assume I have a main function that is passed a pointer p and wants to
manipulate *p, after acquiring the appropriate lock.  It needs to call a
helper function for some of this.  The helper function is often called
on the same p, and optimizes this case by remembering some information
about the last p it was called on:

main_func(foo *p)
{
   acquire lock for *p;
   manipulate *p ;
   helper(p);
   release lock for *p;
}

helper(foo *p)
{
   __thread foo *cached = 0;
   __thread bar precomputed_info;
   bar my_info;

   if (cached == p) {
     manipulate *cached some more using precomputed_info;
   } else {
     cached = p;
     manipulate *p some more setting precomputed_info;
   }
}

This would be incorrect since, if helper took the wrong control path, it
could access *cached, for which it doesn't have the lock.
(On the control path on which it actually does access it, it of course
does have the lock.)  And indeed this corresponds to a speculative
update to *cached, which an unscrupulous compiler might find profitable.

Note that I cannot fix this by moving the lock acquisition for *p into
helper; that would make the two manipulations of *p in main_func and
helper nonatomic.  And it would greatly increase locking overhead.

I don't think that a programming model in which this can fail is
defensible.  But this is basically equivalent to Sarita's example.

Hans