[cpp-threads] RE: Ping on memory model and concurrency

Mon Aug 15 06:44:32 BST 2005

The transformation can break a correctly synchronized program, though that
is admittedly unlikely.

Assume that count is protected by a lock.  It is incremented by a function
f containing this loop, among other code.  Assume f is documented to touch
count only if the corresponding list is nonempty (admittedly unlikely),
and rerquires that the client acquire the lock if count is updated and
shared between threads.

If I call f with an empty list and a shared count variable, I do not need
to acquire the lock, because f is correctly documented not to update
count.  And indeed I should not have to acquire the lock, because there is
no data race.  But if count is promoted by the compiler to a register in
the standard way, it will be unconditionally loaded before the loop, and
stored back after the loop.  Thus my call with an empty list may cause
another concurrent update to count to be lost.  The compiler has
just introduced a data race that was not present in the source.

There are more complex examples, including the one in the paper, that are
actually more likely to break things in practice.  This one is
particularly annoying because it argues that even some very simple
transformations are really problematic.

Does that clarify the issue?

Hans

On Sun, 14 Aug 2005, Herb Sutter wrote:

>
> > Making this true has a significant impact on existing compilers.  An
> > astonishingly simple example that I used in my PLDI talk, though not
> the
> > paper, is the following loop, which adds the number of elements of a
> list
> > to the global variable 'count':
> >
> > for (p = list; p != 0; p = p -> next) ++count;
> >
> > As far as I can tell, nearly every optimizing compiler naively
> promotes
> > 'count' to a register.  This is almost certainly wrong, in that
> 'count' is
> > written even if the list is empty, and it can thus introduce a data
> race.
> > Although it is very unlikely to break code in practice, it can break
> code
> > that is fully synchronized using locks.  And most programmers would
> have
> > zero chance of debugging the result.
>
> Is that an unexpected data race, or just a classic data race? I would
> assume that the program should be doing an interlocked increment on
> count anyway, or explicit memory fences for count when the thread
> executes the loop. No?
>
> I'm probably misunderstanding the example. Clarification would be
> appreciated.
>
> > Current specifications are ambiguous
> > with respect to things like this; fixing that is my top priority.  So
> long
> > as compilers are allowed to introduce unexpected data races, we have
> no
> > hope of teaching programmers to write correct multithreaded code.
>
> Completely agree.
>
> Herb
>
>
>