[cpp-threads] Review comments on N2176
Paul E. McKenney
paulmck at linux.vnet.ibm.com
Tue Apr 3 03:46:25 BST 2007
Hello!
First, I want to thank Hans for his careful exposition of his position
on several memory-model topics.
Here are some responses, section by section. I will provide feedback
on dependency ordering in a separate communication.
Thanx, Paul
------------------------------------------------------------------------
Why Undefined Semantics for C++ Data Races?
o Undefined semantics for C++ data races is not the status quo
for many existing compilers, as much code can and does work
on numerous architectures in the face of data races. An example
of such code would be the split counters described in N2153.
"Undefined" seems a bit strong for the status quo.
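For concreteness, a minimal sketch of the split-counter technique, assuming the usual per-thread-slot design; the names and layout here are illustrative, not taken from N2153:

```c
#define NTHREADS 4

/* One slot per thread: each thread increments only its own slot,
 * so updates never conflict with one another.  The read side sums
 * the slots without synchronization; each racy load is an aligned,
 * word-sized access and thus sees either the old or the new value. */
static unsigned long counter[NTHREADS];

static void inc_count(int tid)
{
        counter[tid]++;         /* plain store; only thread tid writes here */
}

static unsigned long read_count(void)
{
        unsigned long sum = 0;

        for (int i = 0; i < NTHREADS; i++)
                sum += counter[i];      /* benignly racy load */
        return sum;
}
```

The sum is approximate while updates are in flight, which is exactly the sort of well-understood behavior that blanket "undefined" semantics would forbid.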
o While I agree that providing explicit atomic variables is a
good and valuable thing, data races are not necessarily bugs.
Backwards-compatibility considerations will require a number
of compilers to support data races in any case.
o Error diagnostics from races would be good, but it is not
necessary to leave data-race semantics undefined to achieve
this. In many C/C++ compilers it is possible to elicit
warnings for code that is well defined, for example, in
cases where the particular code sequence has been found
to have high probability of indicating a bug.
o Annotating races via atomic variables is indeed a good thing
from the viewpoint of reducing false positives, but it is
not necessary to leave data races undefined to obtain this
benefit. Programmers who choose not to use atomic variables
simply don't gain the benefit of reductions in false positives,
while programmers who do choose to use atomics do enjoy the
possible reduction in false positives.
o I agree that some data races must indeed result in undefined
behavior, and I agree that proliferating fences is undesirable.
However, this does not mean that all data races must remain
undefined.
o I agree that general multithreaded access can result in complex
interactions that we do not want to fully describe in a standard.
o The following example is easily resolved by requiring that
variables be spilled into the stack frame.
        unsigned i = x;
        if (i < 2) {
        foo:    ...
                switch (i) {
                case 0:
                        ...;
                        break;
                case 1:
                        ...;
                        break;
                default:
                        ...;
                }
        }
Had the programmer wanted to permit refetches of variable x,
s/he could have instead written:
        if (x < 2) {
        foo:    ...
                switch (x) {
                case 0:
                        ...;
                        break;
                case 1:
                        ...;
                        break;
                default:
                        ...;
                }
        }
The variables i and x are not the same, so it seems erroneous
for the compiler to treat them as if they were in fact the
same.
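The hazard can be made concrete by modeling the refetch as a compiler transformation; the helper names below are hypothetical, for illustration only:

```c
/* classify() receives a copy of x, so a concurrent store to x
 * cannot make the default: arm reachable.  classify_refetch()
 * models the transformation a compiler would perform if allowed to
 * refetch x; the race callback stands in for a concurrent store
 * landing between the range check and the switch. */
static int classify(unsigned i)
{
        if (i < 2) {
                switch (i) {
                case 0:  return 0;
                case 1:  return 1;
                default: return -1;     /* unreachable: i read once */
                }
        }
        return 2;
}

static int classify_refetch(unsigned *xp, void (*race)(unsigned *))
{
        if (*xp < 2) {                  /* first fetch of x */
                race(xp);               /* concurrent store lands here */
                switch (*xp) {          /* refetch of x */
                case 0:  return 0;
                case 1:  return 1;
                default: return -1;     /* now reachable! */
                }
        }
        return 2;
}

static void hostile_store(unsigned *xp)
{
        *xp = 5;                        /* models another thread's store */
}
```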
o Perhaps it would be better to say that although data races can
result in undefined behavior, existing compilers should offer
backwards compatibility. The safe data races I am aware of
include:
1. Processes storing to a given aligned variable of
basic type that fits into the implementation's machine
word size, while other processes concurrently load
from such a variable. It is permissible for the set of
processes loading and the set of processes storing to
intersect, or even to coincide. Each load must return
either the initial value or the value supplied by one
of the stores.
In some cases, this will require registers to be
spilled to a temporary.
2. More complex interactions involving use of non-standard
extensions. In all of these cases, there is a
corresponding extension being proposed, so it makes
sense to leave these cases implementation-defined.
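Case 1 can be illustrated with a racy flag, sketched here using POSIX threads; that such plain accesses are single-copy atomic is exactly what rule 1 above would guarantee:

```c
#include <pthread.h>

/* An aligned variable of basic type that fits in a machine word.
 * Rule 1 says any racy load returns either the initial value or a
 * value some store actually wrote -- never a torn mixture. */
static unsigned flag;                   /* initially 0 */

static void *writer(void *arg)
{
        flag = 42;                      /* plain racy store */
        return NULL;
}

static unsigned reader_snapshot(void)
{
        return flag;                    /* plain racy load */
}
```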
Why atomics have integrated ordering constraints
o It would be more accurate to say that the atomic-operations
implementation in the Linux kernel -permits- use of explicit
memory fences.
In particular, it is important to note that the Linux-kernel
Documentation/atomic_ops.txt file is written not for the
-user- of the Linux kernel's atomic operations, but rather
for developers -implementing- the Linux kernel's atomic
operations on a new hardware platform. For example, when
this document says that the atomic_inc_return() and
atomic_dec_return() primitives "require[d] that explicit
memory barriers are performed before and after the atomic
operation calls", it does not mean that the -user- of
these primitives must insert explicit memory barriers,
but rather that the -implementors- must include such
barriers inside these primitives' implementations. In other
words, in the following code fragment:
        x = 1;
        y = atomic_inc_return(&a);
        z = 1;
the implementor of atomic_inc_return() must include whatever
memory barriers might be required by the hardware platform in
question to make it appear to other CPUs that the assignment
to x occurred first, followed by the atomic increment, followed
by the assignment to z.
To reiterate, the -user- of atomic_inc_return() need -not-
supply explicit memory barrier primitives.
In contrast, the user of atomic_inc() and atomic_dec() might
or might not need to supply explicit memory barriers, depending
on that user's algorithm.
So the Linux kernel's atomic-operation implementation is
not necessarily inconsistent with an atomic_ops package.
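A toy sketch of the implementor's side of the bargain, using GCC builtins as stand-ins for the platform's real instructions (the kernel's actual implementations are per-architecture assembly):

```c
/* Toy atomic_t and atomic_inc_return(): the *implementation*
 * supplies a memory barrier before and after the atomic update,
 * so callers need no explicit fences.  Note that
 * __sync_add_and_fetch() is itself a full barrier on most targets;
 * the explicit fences are kept only to make the before/after
 * structure visible. */
typedef struct { int counter; } atomic_t;

static int atomic_inc_return(atomic_t *v)
{
        int ret;

        __sync_synchronize();                      /* barrier before */
        ret = __sync_add_and_fetch(&v->counter, 1);
        __sync_synchronize();                      /* barrier after  */
        return ret;
}
```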
o That aside, I agree that providing variables that imply
ordering can be useful, at least when load_relaxed() and
store_relaxed() are present. My only argument (as in N2153)
is that explicit fences be provided as well.
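To make the argument concrete, here is a hypothetical pairing of N2153-style relaxed accessors with an explicit fence; the definitions are illustrative stand-ins built on GCC primitives, not a proposed API:

```c
/* Hypothetical stand-ins: load_relaxed()/store_relaxed() impose no
 * ordering of their own; fence() is the explicit barrier that pairs
 * with them.  The classic message-passing idiom then reads: */
#define fence()                 __sync_synchronize()
#define load_relaxed(p)         (*(volatile int *)(p))
#define store_relaxed(p, v)     (*(volatile int *)(p) = (v))

static int data, ready;

static void producer(int val)
{
        store_relaxed(&data, val);
        fence();                        /* order data before ready */
        store_relaxed(&ready, 1);
}

static int consumer(void)
{
        if (load_relaxed(&ready)) {
                fence();                /* order ready before data */
                return load_relaxed(&data);
        }
        return -1;                      /* flag not yet set */
}
```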
o Note that explicit fences can be optimized away in the
single-threaded case, at least outside of device-driver
and architecture-dependent code. The Linux kernel in
fact provides the smp_mb(), smp_rmb(), and smp_wmb()
primitives for this exact purpose. In this case, the
developer has signalled that the barriers are needed only
in multithreaded (or SMP in Linux-kernel parlance) builds
of the kernel.
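A sketch of that mechanism, with CONFIG_SMP playing its usual Kconfig role; __sync_synchronize() stands in for the architecture's real (and often lighter-weight) barrier instructions:

```c
/* In SMP builds the smp_*() barriers emit real ordering
 * instructions; in uniprocessor builds they collapse to mere
 * compiler barriers, since a single CPU always observes its own
 * program order. */
#ifdef CONFIG_SMP
#define smp_mb()        __sync_synchronize()
#define smp_rmb()       __sync_synchronize()
#define smp_wmb()       __sync_synchronize()
#else
#define smp_mb()        __asm__ __volatile__("" ::: "memory")
#define smp_rmb()       __asm__ __volatile__("" ::: "memory")
#define smp_wmb()       __asm__ __volatile__("" ::: "memory")
#endif
```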
o I disagree strongly with the implication that naive
programmers do not expect dependency-based ordering.
More on this in a separate communication.
o So you are arguing for sequential consistency, for which there
are no known real-world use cases, while arguing against
dependency-based ordering, for which there are a very large
number. This seems to me to be an extremely strange juxtaposition
of arguments.
I understand that Java has specified SC. However:
o Some of the people involved in that discussion have
publicly argued that it was ill-advised.
o A difference between the models seems harmless if
there are no real-world programs that care.
Should atomic operations be ordered based on dependencies?
o This section needs a more-detailed response, which I will
provide in a separate communication.
In the meantime, I very much thank the author for carefully
documenting his reasoning. This was extremely helpful!
Why ordering constraints are never limited to loads or stores
o The analysis in the "Recipient writes to object" section
implicitly assumes that writes may be performed speculatively.
This is not the case on POWER, and I was under the
impression that it is also not the case on Itanium. (And I believe that
all architectures that permit speculative writes have stronger
memory models that would prohibit the load-store reordering,
thus avoiding the posited problem.)
If I am mistaken, please let me know so that I can move the
rcu_assign_pointer() and/or the rcu_dereference() primitive from
core kernel code to architecture-dependent code. Doing this
would allow Itanium (and other architectures) to use stronger
memory ordering primitives.
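For reference, the pattern at stake, with generic stand-ins for the two primitives; the real kernel versions use smp_wmb() and per-architecture dependency-ordering guarantees:

```c
/* Publish/subscribe: the sender initializes the structure and only
 * then publishes the pointer, with a write barrier in between; the
 * recipient loads the pointer and relies on data-dependency
 * ordering to observe the initialized fields through it. */
struct foo { int a; };

#define rcu_assign_pointer(p, v) \
        do { __sync_synchronize(); (p) = (v); } while (0)

/* On most CPUs the data dependency alone orders the two loads. */
#define rcu_dereference(p)      (p)

static struct foo *gp;                  /* initially NULL */

static void publish(struct foo *newp)
{
        newp->a = 1;                    /* initialize first */
        rcu_assign_pointer(gp, newp);   /* then publish     */
}

static int subscribe(void)
{
        struct foo *p = rcu_dereference(gp);

        return p ? p->a : -1;           /* dependent load   */
}
```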
o Another important case omitted from this analysis is when
the recipient performs an atomic operation on x.a. In this
case a store-store barrier on the part of the sender suffices.
Treatment of out-of-thin-air values
o You argue that infeasibility of dependency ordering motivates
permitting out-of-thin-air results. Suppose that there were
a reasonable way of permitting dependency ordering. Would
that affect your view on out-of-thin-air results?