[cpp-threads] High-level concurrency issues

Peter A. Buhr pabuhr at plg.uwaterloo.ca
Wed Oct 26 22:21:28 BST 2005


   Date: Mon, 17 Oct 2005 08:29:31 -0700
   From: "Herb Sutter" <hsutter at microsoft.com>

   Hopefully we can get together sometime with a whiteboard handy -- it's hard
   to have these conversations in email alone. :-) Maybe at C++ Connections?

I agree, but I'll put down a few thoughts for others because I think there are
some misconceptions about concurrency models and what I said in my paper. Yes,
I'm hoping to talk to you and others at C++ Connections, and it should involve
beer and a whiteboard of almost infinite size. ;-)

   Just some quick feedback, here's one basic point I don't agree with: You
   rule out asynchronous calls as being interesting

I quote from my paper "Both synchronous and asynchronous communication are
needed in a concurrent system." so clearly I think asynchronous call is
interesting.  To explain some of my comments in the paper, I need the following
simple concurrency model:

      client side      communication      server side
      -----------      -------------      -----------
       thread(s) <---> buffer(s) <---> thread(s)/buffer(s)

If a concurrency model can't paint this picture, then I'll argue the model is
inadequate.  Once this model can be constructed, you can decorate it with many
additional styles and motifs, as long as a user can adequately get access to
the basic model.

Some definitions:

  sync-call : a client thread calls a server thread to perform an operation and
  possibly return a result. The client is going to use the result or action
  performed by the server immediately, so it waits for the call to complete.
  This scenario occurs a *LOT* in client code.  Because the client waits, no
  buffering of arguments is needed to perform the call and no protocol is
  needed to return a result. The advantage is a simple mechanism to express the
  call/return, which matches normal routine call in C++, and no buffering
  of the caller's arguments.

  async-call : client does a sync-call to store its arguments into a buffer so
  it can continue execution concurrently with the server. The server retrieves
  the arguments from a buffer, performs an operation on the arguments, and
  through some mechanism may return a result to the client.  The advantage is
  the potential overlap in execution between the client and server, which
  assumes there is sufficient latency between the start of the call and
  acquiring the return value to justify any overhead costs.  Note, the
  buffering of arguments and return values can be done in a variety of ways,
  such as implicit buffers created by the language or explicit buffers created
  by the client and/or server.
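To make the async-call definition concrete, here is a minimal sketch of an
explicit argument buffer in modern C++ (which postdates this discussion and is
used purely for illustration; the names Request and RequestBuffer are mine, not
from any library). The client's put() is itself a short sync-call that only
buffers the arguments; the server's take() blocks until work arrives:

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>

struct Request { int arg; };          // arguments buffered for the server

class RequestBuffer {                 // the "buffer" in the model above
    std::mutex m;
    std::condition_variable cv;
    std::queue<Request> q;
public:
    void put(Request r) {             // client side: brief sync-call that
        {                             // only stores the arguments
            std::lock_guard<std::mutex> lk(m);
            q.push(r);
        }
        cv.notify_one();
    }
    Request take() {                  // server side: block until work arrives
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [this]{ return !q.empty(); });
        Request r = q.front();
        q.pop();
        return r;                     // requests come out in FIFO order here,
    }                                 // but a server is free to reorder them
};
```

The locking inside put()/take() is the cost referred to below: every call pays
for the buffer, whether or not the client actually overlaps any work.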

For both calls, some locking has to occur. The server is free to handle other
requests while performing client operations, or to farm the operations out to
worker threads; i.e., the server is not required to handle client requests in
FIFO order, which may require additional buffering in the server.

That's it folks. There is no other special magic about either sync or async
call. You can build a lot of specialized concurrent patterns on top of this
model (design-pattern fodder), but this model is the core/root pattern
explaining the fundamental concepts.

Now I want to look more closely at async-call. Herb and Nick say:

   Date: Tue, 18 Oct 2005 12:03:54 +0100
   From: Nick Maclaren <nmm1 at cus.cam.ac.uk>

   > It's been clear to me that asynchronous (aka nonblocking, message-passing)
   > has to be the default when communicating between threads. Certainly
   > Lawrence at least seems to agree that futures are a key abstraction.

   Absolutely.  It is trivial to build synchronous protocols out of
   asynchronous ones and provably impossible to do the converse.

If you make async-call the default between threads, that means paying for the
cost of buffering arguments even for the sync-call case.  In the spirit of
C/C++, you should only pay for things you use. This cost is magnified because,
believe it or not, the reason most clients call a server is because they want
to process the result immediately. If a programmer knows thread calls are
async, they can try to restructure the client code to start as many calls as
possible near the beginning of a block, do some work, and then wait for the
results; but this reorganization can often significantly obscure the client's
code.  (Compilers do this optimization by moving loads forward to achieve
prefetching, making it impossible to read the generated code).
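The restructuring described above looks roughly like the following sketch,
using C++ std::async/std::future purely as an illustration (these facilities
postdate this discussion; slow_square stands in for an arbitrary server
operation):

```cpp
#include <future>

int slow_square(int x) { return x * x; }   // stands in for a server operation

int restructured_client() {
    // Start as many calls as possible near the beginning of the block ...
    std::future<int> a = std::async(std::launch::async, slow_square, 3);
    std::future<int> b = std::async(std::launch::async, slow_square, 4);
    // ... do some unrelated work here, overlapping with the "server" ...
    // ... and only then wait for the results, where they are first needed.
    return a.get() + b.get();
}
```

Note how the calls have migrated away from the points where their results are
used, which is exactly the obscuring effect described above.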

Now I also say "synchronous communication can implement asynchronous and vice
versa". Nick only believes in async to sync. However, I have explained above
how to implement async-call using sync-call and I have done this many times
over the years, proving that it is possible. So you will have to show me your
proof that async-call cannot be constructed from sync-call.

The conclusion in my paper is that the default communication mechanism between
threads should be sync-call because it occurs often and should not be charged
any unnecessary overhead. My concurrency model provides an efficient mechanism
to construct async-call from sync-call for users that want/need it.  Here I'm
arguing for a library approach for async-call versus a language mechanism.  But
I can see situations where a library approach for async-call is inadequate;
hence, I'm open to making it part of the language. But I need to see some
examples and do some performance testing first.

   >                        Building synchronous communication out of
   >   asynchronous mechanisms requires a protocol for the caller to
   >   subsequently detect completion. This is error prone because the
   >   caller may not obey the protocol (e.g. never retrieve a result).
   > 
   > The above is actually pretty trivial in the implementation (e.g., think
   > of a future<T> containing just a shared_ptr<FutureData<T>>). In the
   > programming model it's easy too: The caller doesn't create a future for
   > the result at all, or creates one but then lets it go away without
   > touching it to get the result. Maybe I'm misunderstanding. Probably
   > whiteboard material.

   If you are misunderstanding, I am, too.  ALL you have to do to create
   a synchronous protocol out of an asynchronous one is to encapsulate
   in the following way:

   Synchronous action:
       Start asynchronous action
       Wait for it to complete

   Yes, it really IS that trivial.

Well, what you have written is correct and trivial but only because you have
ignored the devil in the details.

1. You say "Wait for it to complete", but tell me more. You have to wait on
   something; there is no magic. Once you define the "something", it is no
   longer trivial. Are you going to spin waiting for a bit to be set or use a
   lock? No matter what you choose, you have to manage the "something" and make
   it available to both client and server, and you also have to pay for any
   associated costs for the async call, even though you gain nothing from it.

2. The above simulation introduces the notion of a protocol between client and
   server. What happens if a client violates this protocol? A client can forget
   to wait, or attempt to wait multiple times, etc. How do protocol violations 
   affect the server? You can't just have a client-centric view of concurrency.
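Point 1 can be made concrete. The "something" might be a completion flag
guarded by a lock and condition variable (the alternative being to spin on an
atomic flag). A minimal sketch in modern C++, with illustrative names, showing
the state that must be shared between client and server:

```cpp
#include <condition_variable>
#include <mutex>

// The "something" to wait on: a one-shot completion record.
struct Completion {
    std::mutex m;
    std::condition_variable cv;
    bool done = false;
    int result = 0;

    void complete(int r) {          // server side: publish result, wake client
        {
            std::lock_guard<std::mutex> lk(m);
            result = r;
            done = true;
        }
        cv.notify_one();
    }
    int wait() {                    // client side: block until completion
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [this]{ return done; });
        return result;
    }
};
```

Even this tiny sketch has to decide who allocates the Completion, who frees
it, and what happens if wait() is never called: precisely the protocol and
cost issues raised above.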

Clearly, if a client does not need a return value, the protocol problem
disappears, but the majority of calls return a result. Conceptually, for each
call returning a result, a server immediately returns a ticket that the client
can use to subsequently obtain the result. In the trivial case where a server
is only handling one client or servicing requests in FIFO order, the ticket can
be avoided. However, this organization is a very inefficient way to use a
server.  Hence, a protocol is a fundamental aspect of async-call returning a
result from a non-trivial server. There are no options here; this is a
fundamental requirement.

Now the protocol and its specific implementation can take many forms (more
design-pattern fodder). One of these forms is the "future" (developed in Lisp),
which abstracts and encapsulates the protocol so a user does not have to deal
with the protocol implementation and cannot violate it. Notice, a "future" has
nothing to do with the concurrency of the async-call; a "future" is solely a
software-engineering device. I think a good concurrency model should be able to
implement a variety of async-call approaches and their associated protocols
(e.g., tickets, call-back, futures, etc.).
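As one illustration of such a protocol, here is a sketch of a ticket scheme in
modern C++ (the names TicketOffice, issue, deposit, redeem are hypothetical).
A "future" would wrap this same shared state so the client holds the ticket
implicitly and cannot violate the protocol:

```cpp
#include <map>
#include <mutex>

class TicketOffice {
    std::mutex m;
    std::map<int, int> results;    // ticket -> result, once available
    int next = 0;
public:
    int issue() {                  // returned to the client at call time
        std::lock_guard<std::mutex> lk(m);
        return next++;
    }
    void deposit(int ticket, int result) {   // server: result is ready
        std::lock_guard<std::mutex> lk(m);
        results[ticket] = result;
    }
    bool redeem(int ticket, int &out) {      // client: poll for the result
        std::lock_guard<std::mutex> lk(m);
        auto it = results.find(ticket);
        if (it == results.end()) return false;   // not ready yet
        out = it->second;
        results.erase(it);                   // a ticket is single-use
        return true;
    }
};
```

Notice the protocol obligations a raw ticket imposes: redeem exactly once, and
only with a ticket you were issued; a client that forgets leaks the result.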

   I mention the sync/async issue only because it seems to be pretty
   fundamental to your approach, and has flavored many of the details and
   results. In particular, you reject case 7 which seems to be the category
   that would include an active task that is communicated with asynchronously
   only, i.e., every method call is a message. This design does not need any
   form of locking or other synchronization (the active object's internal
   structures are only used by its own thread, though I'll add "conceptually"
   here because we need to get away from explicit threads), and so does not
   fall into what case 8 appears to describe where uMutex is implicit. Did I
   understand your categories correctly, or is case 8 already nonblocking for
   callers (if so, that's wonderful)?

Sync/async call is sort of orthogonal to the table, where async can only apply
to rows with a thread. So, yes, case 7 does remove any implicit locking as part
of a call, so it does match nicely with async-call. But an async call still
needs to lock its argument buffer, into which multiple clients insert and from
which the server removes items, simultaneously. So locking for communication is explicit in
the async call (even if you make it implicit in the language). I'm happy to put
case 7 back into uC++ if some async-mechanism is introduced to make it useful.
In fact, it is probably more work to prevent case 7 than to allow it. (I've
made the same argument to Hans about coroutines. The way uC++ is designed, it
would be more work to prevent coroutines than to allow them.)

Now I have a confession to make: like Java, the mutex property in uC++ can be
turned off for individual members. Hence, case 7 can be supported "explicitly"
by putting "_Nomutex" as a member qualifier. This trick is used to provide a
form of async-call, where clients calling nomutex members have no implicit
locking and can directly insert their arguments into a single or multiple
buffers (composed using explicit locks or monitors) defined in the server. The
server thread can process these requests and use any protocol you can program
(including futures) to return results. So this is one way to implement
async-call in uC++ based on a sync-call. If a client wants async-call and the
server does not provide it internally, a different approach is necessary and it
is more expensive.

Nevertheless, async-call and its protocol management address only client-side
issues.  The server side must also be addressed. One of the reasons that Jim
and I keep mentioning the language that starts with "A" is because it actively
addresses server-side issues with a programming model based on the notion of
active objects (tasks). (So do other concurrent languages with active objects.)
Without this notion and its associated concepts, it is difficult or impossible
to build powerful server-side concurrency, like the Administrator model. If you
have only done concurrency in Java or C#, then it is difficult to understand or
appreciate the capabilities of active objects for developing complex servers.
As Jim has pointed out, both the notion of a monitor and task are useful, in
general, and especially useful on the server side.

   Well, message queues are simple and well understood. Lots of libraries and
   systems for this exist. (Also, while garbage collection isn't necessary for
   this, I should note that GC is also being proposed for C++; GC can make some
   of the implementation choices easier.)

True, but none of these models fit with any existing notion in C++. The more
notions you put into the language, the more difficult it is to program. I'm
suggesting extensions to existing notions whenever possible to provide new
capabilities that simplify programming.

Finally, Herb says:

   I don't buy into that part, because that makes blocking the default, and
   implicit blocking is one of the big problems with today's concurrent
   programming models that we need to overcome.

and in a number of followup messages Herb pounds home this idea. Unfortunately,
I'm not getting the point of it all. Locking and blocking are fundamental to
concurrency. The notion of atomicity cannot be removed from concurrency, and
mutual exclusion, which provides atomicity, requires locking involving
spinning/blocking.  Clearly, locking/blocking are complex and cause problems; I
am in total agreement with this part of the argument. But unless you can show
me a form of concurrency without atomicity, I don't understand the point. There
is nothing magic about lock-free or transactional memory; all the problems are
still there in some form or another. You can't sweep locking under the carpet
and say all problems are solved. Basically, I don't think denial is going to
get you anywhere.

   But if you mean that 'doing the right locking' is the answer, I would
   disagree with that because locks are known to be inherently
   inappropriate for building modern software -- they aren't composable.
   One of the key things we need to do in a next-generation programming
   model for concurrency is to get better isolation to reduce or eliminate
   mutable shared state (and thereby reduce or eliminate the need for
   locks), just to get back composability.

Again, I'm not getting the point. Lots of things are not composable. I can't
compose a Honda using Toyota parts; I can't even reuse Honda parts on a
Honda. I think reusability is a good parallel issue. For many reasons, people
do not write reusable code even when a language provides mechanisms to do
it. But if you want to spend the extra time and effort, you can write very
reusable code. Similarly, if you take some time and effort, you can write some
composable concurrent software, up to some fundamental limits.  But you can't
magically solve problems like deadlock or wish away mutable shared state. If we
can figure out ways to mitigate these issues, that's great, but they are always
going to be there. It's the nature of the beast.
