[cpp-threads] High-level concurrency issues

Herb Sutter hsutter at microsoft.com
Tue Oct 18 15:00:23 BST 2005


Peter wrote:
> Herb Sutter wrote:
> > Or call a virtual function, or use a template parameter, or let
> > control flow through any other customization point into unknown
> > code.
> >
> > The general issue is that calling any unknown code while holding a
> > lock is a recipe for deadlock. You really are required to know the
> > implementation details of all code that you might call while you
> > are holding a lock.
> 
> No, you are not. What you are required to do, though, is document
> which external calls are made while an internal lock is held. This is
> simply a part of the usual Concept/Requirements table for the
> customization point.

We agree. To me that's equivalent to knowing (by specifying and
requiring) the lock-related implementation details of all code that
might be called, transitively. :-)

> Good point about getHashCode, though; there is no reason to call it
> under a lock, though I see I've made the same mistake in my hash map.
> Thanks. :-)

You're welcome. It's incredibly easy to do. Everybody makes this
mistake, even when they are very experienced and know about the issue --
it takes great care (and frankly IMO an unreasonably high bar of
discipline) to do safe things under locks.
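
To make the shape of the mistake concrete, here's a sketch (the names
are made up, and std::mutex below just stands in for whatever lock your
platform provides):

  #include <mutex>

  template<typename T>
  class Table {
      std::mutex m;
  public:
      void insert(const T& obj) {
          std::lock_guard<std::mutex> hold(m); // take the internal lock
          int bucket = obj.getHashCode() % 31; // customization point:
                                               // unknown user code runs
                                               // while we hold the lock!
          // ... place obj in that bucket ...
          (void)bucket;
      }
  };

If T::getHashCode happens to take a lock of its own, you now hold two
locks with no agreed ordering -- which is exactly the setup in the
example quoted next.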

> > For example, imagine someone writes an object that lazily
> > initializes itself from a file or database, and when the lazy
> > initialization gets triggered it acquires a file or database lock
> > to perform its initialization. Now say you put an object of that
> > type into a synchronized container like a .NET SyncHashTable (a
> > monitor), and the latter in particular will lock(this) and then
> > call object.GetHashCode -- which can trigger a lazy read and
> > acquire a lock. That's fine in itself, but of course now you have a
> > deadlock just waiting to happen...
> 
> What alternative model can avoid this issue? I eliminate lock-free
> from consideration because it's very difficult to write correct
> lock-free code.

That is exactly the problem: We have no _general_ better answer today
for dealing with mutable shared state. Locks are the status quo; they
are the best solution we have, and they are seriously deficient.

Let me mention some alternatives. Jim and I mentioned most of these in
the Queue article, too:

First, I agree lock-free isn't the answer, and never will be for the
general case -- my usual sound bite is "lock-based programming is hard
for experts to get right, and lock-free programming is hard for
geniuses to get right." I can't see any way that lock-free programming
will ever be suitable for 99% of all the world's C++ developers, and
there are a lot of experts near the top of that 99% that I still
wouldn't trust (I certainly wouldn't write lock-free code myself if I
could help it). And that's just how hard it is to write it correctly
once -- I'm ignoring the issue that lock-free code isn't close to
portable, and always needs to be reviewed when porting to a new target,
and often even to new versions of the original hardware architecture.
I'm also ignoring that few structures and algorithms are actually known
to have lock-free or low-lock implementations. I wouldn't teach this to
people because it might embolden them to try it (sorry, Andrei! :-) ).

Second, one way to reduce mutable shared state is to use immutable
state. Most modern string packages are based on immutable instances,
which are thread-safe by construction. (Alas, basic_string need not
apply, at least not without a refit.)
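
The idea in code, as a minimal sketch (ImmutableString is invented, and
shared_ptr is used only as a convenient thread-safe refcount):

  #include <memory>
  #include <string>

  class ImmutableString {
      std::shared_ptr<const std::string> data; // shared, never written
  public:
      explicit ImmutableString(std::string s)
          : data(std::make_shared<const std::string>(std::move(s))) {}

      // "Mutators" build a fresh instance; the original is untouched,
      // so any number of threads can read it with no locking at all.
      ImmutableString append(const std::string& tail) const {
          return ImmutableString(*data + tail);
      }

      const std::string& str() const { return *data; }
  };

Copies are cheap (they share the buffer), and because no instance is
ever written after construction there is nothing to lock.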

Third, transactional memory might bear fruit. It is far from a solved
problem -- the elephant in the room is IO, and nobody has seriously
tried to deal with that. However, even if the IO problem is never
solved, TM may well pan out as useful for dealing only with memory
(plus maybe transactable resources). But even that is far away -- I
don't expect anything general to become commercially available before
the end of this decade at the earliest. And yes, I know partial
commercial implementations exist, and that more general experimental
research implementations exist (Microsoft has at least 5 or 6
internally, most of them publicly acknowledged in MSR papers).
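
For flavor, here is roughly the kind of thing TM proposals aim to let
you write -- entirely hypothetical syntax, accepted by no shipping
compiler:

  // Hypothetical atomic-block syntax -- illustrative only:
  void transfer(Account& from, Account& to, int amount) {
      atomic {                    // one isolated, all-or-nothing step
          from.balance -= amount; // no locks are named, so there is no
          to.balance   += amount; // lock order to get wrong; conflicting
      }                           // transactions retry automatically
  }

And there's the IO elephant in miniature: if that block also wrote a
line to a log file, the write could not be rolled back and retried.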

> An asynchronous hash table, where all requests are put on a queue and
> the client code receives notifications when an operation completes
> and returns a result? Doesn't this increase the complexity of the
> client code by quite a factor?

:-) It does, using today's models. Much less so when you have futures.
For example, in Concur syntax:

  future<int> f = ht.Insert(obj);
    // assume ht is an active object, so this is an async call

  ... do async work until you need the result ...

  DoSomethingWith( f.wait() );
    // waits for f to materialize, at which point all side effects
    // of the async operation will be complete as well

Or with an immediate wait (this is the Concur/futures idiom for calling
an async function synchronously):

  ht.Insert(obj).wait();

Of course, you only need to wait if the caller actually needs the result
or the side effects. If the caller doesn't, it's just plain old:

  ht.Insert(obj);

with no blocking, fully concurrent by default.
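
You can approximate the future half of this idiom with library code; a
rough sketch (std::future/std::async syntax used for concreteness, with
ht and DoSomethingWith as placeholders -- and note that, unlike an
active object, this does not serialize calls into ht, so ht must be
internally synchronized):

  #include <future>

  std::future<int> f =
      std::async(std::launch::async, [&] { return ht.Insert(obj); });

  // ... do other work until you need the result ...

  DoSomethingWith( f.get() ); // like f.wait(): blocks until the value
                              // and the call's side effects are ready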

This is something very interesting that is actually a part of what I've
been working on (as you can see from the Concur examples). The main
issues with doing this in general for types like containers are: a)
efficiency; and b) that, much like exception safety, this approach can
affect the container's design.

BTW, I know I'm alluding to Concur without explaining what it is -- I
only showed it publicly for the first time last month, so it's new to
the world. The basic summary is that Concur aims to:

  define higher-level abstractions (above "threads+locks")

  for today's imperative languages (first C++, but applicable to others)

  that evenly support the range of concurrency granularities
    (e.g., coarse-grained out-of-process, long-lived in-process,
    loop/data parallel, all with a similar programming model)

  to let developers write correct and efficient concurrent apps
    (that they can reason about easily and that are toolable)

  with lots of latent parallelism (and not lots of latent bugs)

  mapped to the user's hardware to reenable the free lunch (exes that
    run well on 1- and 2-core, better on 8-core, better still on
    64-core, etc.).

I can't easily describe more than that in email. I could show more about
it sometime when we can get together. No papers yet.

Herb



