[cpp-threads] High-level vs. low-level

Herb Sutter hsutter at microsoft.com
Thu Jul 13 18:40:16 BST 2006


Thanks, Peter. Let me share my worries too, and maybe we can converge. I
don't expect everyone to agree with my point of view, but maybe if I
share some of the anxious handwringing I've been doing, I might at least
get some catharsis out of it. :-)


> In this message, I will try to explain in more detail why I think the
> "low level" syntax is better than the "high level" syntax. By low
> level syntax I mean
> 
>     atomic_store_rel( &x, 1 );
>     r1 = atomic_load_acq( &x );

By this I believe you mean writing the fence explicitly on each
operation, correct?

I think part of our challenge as experts is to remember that (a) we're
not representative of the expertise of people who will be using this,
and (b) we get stuff like this wrong more often than we like to admit.
The people who want to use something like atomic_store_rel explicitly on
each operation are a vanishing minority of programmers (perhaps
under 2% will ever want to write such code), and of those in turn I
believe only a fraction will actually use it correctly. (I have grave
doubts whether I'm in the latter fraction; I suspect I'm not. For
comparison, it's not unusual for even a peer-reviewed paper to get
corrections about where the fences go, though as you point out below
most papers don't even try to say where the fences go.)

In general, I believe that programming models that require programmers
to know why and how to write explicit fences have already proven too
difficult for even expert programmers to use reliably. Experts routinely
encounter difficulty reasoning about even full fences, which are the
simplest form to think about, and I see them routinely get even those
wrong (it's very typical to forget them, and it's nearly as typical to
add them when in doubt, leading to oversynchronization). Just a few weeks
ago I was having a similar discussion internally at Microsoft, and
someone said, "yeah, you should see today's thread on XYZ" which was
exactly about a group of experts having an extended discussion and
disagreeing about where and how to write a full fence. And, yes, these
were bona fide expert developers, not hacks.

Leaving aside the difficulty of fences, a secondary issue with the above
is that from the point of view of language design this multiplies the
opportunities to go wrong -- specifically because it requires the
programmer to remember to say something special (potentially) each time
they read and write the variable, and each time is an opportunity for
the programmer to be human and forget, with no safety net and silent
compilation. At the very least, even for the rocket scientists who'd
want to do the above, wouldn't you prefer to be able to declare x as a
type that doesn't have regular assignment available at all, and so
_requires_ the programmer to write code like the above (including e.g.
atomic_store_normal when no special fencing is required) so that you get
a compile-time error when you forget? Otherwise it's way too easy to
forget, and I don't believe the code is maintainable (experts might get
the initial coding right, but I don't believe it has a good chance to
stay right in the face of maintenance).


> and by high level syntax I mean
> 
>     x = 1;
>     r1 = x;
> 
> where x is suitably declared as atomic.
> 
> First, consider the programming community, in particular, the part
> that needs to read code. When encountering
> 
>     x = 1;
> 
> the programmer needs to look up the declaration of x in order to infer
> the semantics of the assignment.

Aside: Surely the programmer has to know whether or not x is shared, and
if so what synchronization to use (what lock to take, or that it's
atomic)? If he doesn't know how x is supposed to be protected, he's got
bigger problems.


> In some cases (amortized constant) he'll also
> need to look up the C++ memory model specification. Not a big deal,
> admittedly.

I don't think this is true, and I think there's a misconception here. My
difficulty is that I think the vast majority of programmers won't (and
don't) look up the memory model -- they will (and do) just assume
sequential consistency. (We can also have a separate debate over whether
they should be expected to look at the memory model. :-) )

I don't think this is speculation, because it has been amply observed
that people expect and rely on SC in practice. DCL (double-checked
locking) is a canonically cited (if vilified) example, not only because
people expect it to work and because of how hard it is to explain why
it doesn't, but primarily because of how long the papers and threads
are about why it does or doesn't work. Anything in this area that
requires an explanation longer than two
lines is too hard for the professional developer. I suspect it's
probably also too hard for most researchers; even experts who understand
they're not dealing with SC consistently fall into the trap of
forgetting that they're assuming it in certain spots in their code or
paper.


> The bigger problem is that after looking up the necessary information
> to infer the precise semantics of x = 1, our programmer still doesn't
> know what is _the intent_ of the assignment and how it came to be. It
> could be that this particular statement never races but x is atomic
> because some of its other manipulations do involve races. It could be
> that there is a race, but no memory synchronization is needed. It
> could be that the code depends on the release barrier for correctness.
> Finally, it could be that the code was written as-is with x being an
> ordinary variable, then x was changed to atomic<> as a bug fix without
> any kind of careful analysis of its correctness.
>
> All this isn't mere speculation; currently we have the exact same
> situation when x is volatile, and the above problems do occur quite
> regularly.

Do you mean C++ volatile, or Java volatile? I believe Java volatile got
that part exactly right, by requiring volatile reads/writes to be SC. If
you let a store-load be reordered, it's too easy to come up with simple
code that experts who know the memory model have difficulty
understanding.


> Second, consider the lock-free research, in particular its practical
> applicability.
> 
> The current de-facto standard, as far as I've seen, is to describe
> algorithms in the "high level" syntax, assuming a sequential
> consistency memory model. There is usually a small print note along
> the lines of "the reader is expected to insert the appropriate memory
> barriers at the appropriate places to make the described algorithm
> work on real-world hardware". Usually, the appropriate barriers and
> places aren't described anywhere in the paper.

True. Part of the reason is that it's system-specific, but I think
another significant reason is that it's too hard. Authors who do provide
actual code frequently enough discover unnoticed races and publish
corrections.


> One approach to bridge this gap between the research community and
> the C++ programming community is to provide a way to achieve
> sequential consistency by using an ordinary syntax, but this
> obviously leads to suboptimal implementations.

Yes, and this is the critical point. "Suboptimal" compared to what, and
by what measure: raw performance, or correctness and maintainability?
In Hans's presentation about the C++MM, he has this line:

  "Fundamental assumption: Usability is more important than 5% performance."

Does this represent the consensus of this group? I think this could be
made even stronger. Last month I quoted this in a discussion, and a few
days later one of the designers of Itanium told me, "in practice, I
think saying that is a bit weak, because usability is probably more
important than even 20% performance." Fast hardware that isn't reliably
programmable by humans isn't useful. I do worry that faster code that
isn't reliably writable and maintainable by humans is a similar false
economy.


> The alternative is to provide a standard vocabulary for expressing
> the "appropriate memory barriers at the appropriate places" and hope
> that it is adopted by the research community. In my opinion, the
> low-level syntax can serve as such a vocabulary.
> 
> Thanks for reading. :-)

Thanks for listening to my worrying and handwringing. :-) I really
appreciate your taking time to articulate where you're coming from, and
I find a lot to agree with, particularly in bridging architectures for
portability and bridging between research and practice.

I guess the fundamental difficulty I'm struggling with is that
programming at the level of explicit fences seems to me to be doomed and
a proven failure. Most of the objections I have above are around
requiring the programmer to know why and where to write explicit fences,
when in practice even experts have a bad track record in getting it
right. Am I being too pessimistic?

Herb



