[cpp-threads] infinite loops

Fri Jul 28 20:16:03 BST 2006

David Callahan (now at Microsoft) brought up an interesting issue at a
couple of meetings that it would be nice to discuss here, and perhaps
pin down at the late August meeting in Redmond.

Currently there are many compiler analyses that assume that if we have

A;
for (...) { ... } // no branches out of loop
B;

then B is executed if and only if A is.  (I believe that standard PRE
implementations assume this, for example.  The issue was discussed on
the JMM list, and is briefly discussed in the JMM POPL paper, as I
recall.)

This allows

for (;;); x = 42;

to be transformed to

x = 42; for (;;);

which is sequentially correct, but can introduce data races.

The current memory model proposal disallows this.

The Java memory model also disallows it, but I wouldn't bet large sums
of money that current implementations get this right.

David, correctly, I think, points out that prohibiting this is
unpleasant, since:

- It breaks many standard compiler analyses.  Some of them result in
optimizations (e.g. loop fusion) that probably matter much more here
than they currently do for Java.

- It's not clear there are any real programs which might be broken by
the transformation and which would otherwise be standard conforming.

As a result, it's also probably nearly untestable whether or not
implementations get this right.

Note that this issue really has nothing to do with optimizing infinite
loops, which are rare.  It has to do with optimizing loops for which the
compiler cannot prove termination.  I expect these are in the majority
for code that I write, since they tend to include all loops traversing
linked data structures.

The problem with the above transformation, if the loop actually happens
to be infinite, is that it breaks the fundamental guarantee that
data-race-free programs have sequentially consistent semantics.

A way out of this dilemma appears to be to state that 

("infinite loop restriction") Programs that contain infinite loops
without "acquire+" operations have undefined behavior.

Here any acquire operation is also an "acquire+" operation.  The set of
"acquire+" operations probably also needs to include I/O and anything
that might block, wait, or put the processor in a low power state.  (The
compiler would presumably treat those as opaque, and thus potentially
blocking forever, anyway.)  The exact definition is TBD, as is the set
of library calls with acquire semantics.

My inclination currently is to include the "infinite loop restriction".
It adds a very small amount of complexity in return for potentially
substantial optimization benefits.  I think among all the code I've
written, it breaks one piece of debugging code in our garbage collector.
And all such code was never really C++ standard conforming, since it's
multithreaded.  And it is easily fixable.  And it should probably
include a call to put the processor in a lower power state anyway, if we
had such a call.

Note that any sort of potentially infinite wait loop really has to
include an atomic load<acquire> anyway, and is thus not an issue.

Opinions?  Objections?

Hans