[cpp-threads] Slightly revised memory model proposal (D2300)
Paul E. McKenney
paulmck at linux.vnet.ibm.com
Mon Jul 9 16:03:41 BST 2007
And the code fragment needs a bit of help to be safe in the face of
freeing and reallocating "foo" structs.
foo_func(foo *p)
{
	__thread foo *cached = 0;
	__thread long cached_gen = 0;
	__thread bar precomputed_info;
	bar my_info;

	acquire lock for *p;
	manipulate *p;
	if (cached == p && cached_gen == p->gen) {
		manipulate *cached some more using precomputed_info;
	} else {
		cached = p;
		cached_gen = p->gen;
		manipulate *p some more setting precomputed_info;
	}
	release lock for *p;
}
Of course, if "foo" structs are never deallocated, then this is not
necessary.
Or did a larger example have some other mechanism to prevent freeing
and later reallocation of the foo structs?
Thanx, Paul
On Sun, Jul 08, 2007 at 01:35:04PM -0700, Paul E. McKenney wrote:
> On Fri, Jul 06, 2007 at 06:30:11PM -0000, Boehm, Hans wrote:
> > This qualifies as a frequently discussed topic. I had this discussion
> > in another group not long ago ...
>
> OK. How about the following?
>
> Q. Once atomics are part of the standard, compiler writers will be
> free to perform whatever optimizations they like on any
> sequence of evaluations not including access to atomics, right?
>
> A. Not quite. Because non-atomic accesses interact with atomic
> accesses, there are a number of situations where care is required.
> One such situation appears in functions which use fine-grained
> locking to manipulate elements of a data structure, but which
> cache information about the last element that they manipulated.
> This technique can greatly improve efficiency in cases where
> a given element is likely to be manipulated multiple times
> in sequence. Consider foo_func() below:
>
> foo_func(foo *p)
> {
> 	__thread foo *cached = 0;
> 	__thread bar precomputed_info;
> 	bar my_info;
>
> 	acquire lock for *p;
> 	manipulate *p;
> 	if (cached == p) {
> 		manipulate *cached some more using precomputed_info;
> 	} else {
> 		cached = p;
> 		manipulate *p some more setting precomputed_info;
> 	}
> 	release lock for *p;
> }
>
> In this case, profiling would show that the then-clause of the
> if-statement would be executed a very large fraction of the time.
> An aggressive compiler writer might therefore be tempted to
> execute this then-clause speculatively. However, doing so would
> introduce data races, because some other thread might well be
> manipulating *cached (and holding cached->lock).
>
> Note that the above case does not introduce any additional
> constraints on optimization. After all, some intervening
> function might well have deleted *cached, so that any access
> to it could potentially result in a page fault, even in the
> absence of multithreading. Therefore, this sort of
> speculative-execution optimization requires extreme care,
> even in single-threaded code.
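To make the hazard concrete, the transformation such a compiler might attempt
looks roughly like the following (pseudocode in the same style as the example;
the "commit"/"discard" notation is illustrative, not any real compiler's
output):

```
/* What the programmer wrote: */
if (cached == p) {
	manipulate *cached some more using precomputed_info;
}

/* What a speculating compiler might effectively emit: */
tmp = *cached;                          /* speculative load: this is the race, */
manipulate tmp using precomputed_info;  /* since another thread may hold       */
if (cached == p)                        /* cached->lock and be writing *cached */
	commit tmp to *cached;          /* at this very moment                 */
else
	discard tmp;
```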
>
> Thanx, Paul
>
> > > -----Original Message-----
> > > From: Paul E. McKenney
> > > > [Sarita: ]
> > > > Paul,
> > > >
> > > > The variables in the example below are NOT atomic.
> > > >
> > > > > > > > > T1: if (A==1) B=1;
> > > > > > > > > T2: if (B==1) A=1;
> > > >
> > > > Yes, we do want programmers to use atomics so their program will be
> > > > easy to reason about.
> > > >
> > > > But how do we explain when to use atomics so the program
> > > > will be easy to reason about? Minimally, one would expect
> > > > that atomics would only be needed in situations where two
> > > > threads access the same variable. Agreed?
> > > >
> > > > For the example above, the only way in which two threads can access
> > > > the same variable is if the threads perform speculative writes.
> > >
> > > In theory, perhaps. But in practice, is this even decidable
> > > in general?
> > > For example:
> > >
> > > T1: foo(); B=1;
> > > T2: bar(); B=2;
> > >
> > > To determine whether there is a data race, one must solve the
> > > halting problem on foo() and bar(). If either can be
> > > guaranteed not to return, then there is no data race. Only
> > > if both return is there a data race.
> > > I suspect that one could leverage any number of other
> > > undecidability results as well.
> >
> > Definitely. But the primary purpose of the standard is to define
> > dynamic behavior of programs. And basically all aspects of dynamic
> > behavior are undecidable. It's also clearly undecidable whether a
> > program will encounter a subscript error. Yet we give that case
> > undefined behavior.
> >
> > > Therefore, I recommend strongly against taking this approach.
> > >
> > > > So if you say the above program must use atomics, I think it is
> > > > equivalent to saying that we have to teach undergrads about
> > > > speculative writes when we teach them parallel programming.
> > > > Would you agree that's unacceptable?
> > >
> > > I agree that this approach would be quite ill-advised for most
> > > undergraduates that I have met.
> > >
> > > > And in that case, how would we explain the need for atomic
> > > > variables in the above situation, but not in many others?
> > >
> > > Any attempt to create a set of guidelines that accurately
> > > describes whether or not atomics are required for all
> > > possible code fragments is guaranteed to end in tears. The
> > > set of guidelines will become arbitrarily complex, given that
> > > the process of creating such guidelines reduces to the
> > > halting problem.
> > >
> > > I therefore recommend starting with a list of situations in
> > > which non-atomics can safely be used. This list can be
> > > incomplete, because unnecessary use of atomics will not be a
> > > problem for most undergraduates.
> > > Undergraduates who are unsatisfied with this situation can be
> > > referred to graduate courses or to the relevant literature --
> > > professors were certainly not shy about taking this approach
> > > with me back when I was an undergraduate, after all. I would
> > > suggest also referring them to relevant concurrent
> > > open-source projects, but not all professors will be
> > > comfortable with open source.
> > >
> > > The list we have covered on this email list thus far seems to include:
> > >
> > > 1. Variables used only under the protection of a given
> > > exclusive lock.
> > >
> > > 2. Variables modified only under the protection of the write side
> > > of a given reader-writer lock, and referenced only under the
> > > protection of that same reader-writer lock (either read or
> > > write side).
> > >
> > > 3. Variables modified only preceding a release operation on a
> > > given atomic variable, and accessed only following an acquire
> > > operation on that same variable.
> > >
> > > 4. Any of the above combined with single-threaded use, where
> > > "single-threaded use" is defined to be use at a time when
> > > only a single thread exists, for example, during initialization.
> > >
> > > 5. Any of the above combined with accesses and modifications
> > > to newly allocated data structures, before a pointer to the
> > > given structure has been made accessible to other threads.
> > >
> > > We might refine this list, possibly expanding its scope
> > > somewhat, though I suspect that it needs to be kept to under
> > > 5-7 points for general undergraduate consumption. But you
> > > would know better than would I.
> > I think this is unnecessary. The rule is simply that if a variable can
> > be simultaneously accessed by more than one thread (using the SC program
> > interpretation), and at least one access is a write, then either
> >
> > - (most likely for beginners) your code has a bug, or
> > - it needs to be atomic
> >
> > I think the problem with Sarita's example is that it has been pruned to
> > the point that it is hard to extract its significance. The real problem
> > here is that if you claim that examples like this do not contain
> > races, you effectively have to tell the user (somehow, and it won't
> > be easy) that locks must be held for any data that might be accessed
> > by a called function, even if that function were to take the wrong
> > control path.
> > (Presumably the only motivation would be to allow speculation that
> > corresponds to executing code from the wrong control path.) This just
> > doesn't work in real code.
> >
> > Here's an example from the earlier discussion:
> >
> > Assume I have a main function that is passed a pointer p and wants to
> > manipulate *p, after acquiring the appropriate lock. It needs to call a
> > helper function for some of this. The helper function is often called
> > on the same p, and optimizes this case by remembering some information
> > about the last p it was called on:
> >
> > main_func(foo *p)
> > {
> > 	acquire lock for *p;
> > 	manipulate *p;
> > 	helper(p);
> > 	release lock for *p;
> > }
> >
> > helper(foo *p)
> > {
> > 	__thread foo *cached = 0;
> > 	__thread bar precomputed_info;
> > 	bar my_info;
> >
> > 	if (cached == p) {
> > 		manipulate *cached some more using precomputed_info;
> > 	} else {
> > 		cached = p;
> > 		manipulate *p some more setting precomputed_info;
> > 	}
> > }
> >
> > This would be incorrect since, if helper took the wrong control path, it
> > could access *cached, for which it doesn't have the lock.
> > (On the control path on which it actually does access it, it of course
> > does have the lock.) And indeed this corresponds to a speculative
> > update to *cached, which an unscrupulous compiler might find profitable.
> >
> > Note that I cannot fix this by moving the lock acquisition for *p into
> > helper; that would make the two manipulations of *p in main_func and
> > helper nonatomic. And it would greatly increase locking overhead.
> >
> > I don't think that a programming model in which this can fail is
> > defensible. But this is basically equivalent to Sarita's example.
> >
> > Hans
>
> --
> cpp-threads mailing list
> cpp-threads at decadentplace.org.uk
> http://www.decadentplace.org.uk/cgi-bin/mailman/listinfo/cpp-threads