[cpp-threads] High-level concurrency issues

Tue Oct 18 14:30:30 BST 2005

Say, Nick, are you the same Nick I met in Mont-Tremblant, who gave the
GC presentation?

> If you are misunderstanding, I am, too.  ALL you have to do to create
> a synchronous protocol out of an asynchronous one is to encapsulate
> in the following way:
> 
> Synchronous action:
>     Start asynchronous action
>     Wait for it to complete
> 
> Yes, it really IS that trivial.

Right. Usually the main question about this is not whether it's
error-prone or difficult (as it seemed Peter suggested), which it isn't.
Rather, the usual question is whether it's efficient enough: How big does
a work item have to be before you lose more from the async overhead than
you gain from the parallelism? (This matters whether you immediately
wait or not, but it matters most in cases like the above where you do
immediately wait.)
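
To make the quoted encapsulation concrete, here is a minimal sketch. It
assumes a library facility along the lines of std::async and std::future
(standardized in C++11, so not something this thread could rely on);
compute and compute_sync are hypothetical names chosen for illustration.

```cpp
#include <future>

// Hypothetical work item; stands in for any asynchronous operation.
int compute(int x) { return x * 2; }

// Synchronous action built from an asynchronous one:
//   1. start the asynchronous action
//   2. wait for it to complete
int compute_sync(int x) {
    // std::launch::async asks for the work to run on another thread.
    std::future<int> result = std::async(std::launch::async, compute, x);
    // get() blocks the caller until the work item has finished.
    return result.get();
}
```

Calling compute_sync(21) yields 42, but the caller pays the full cost of
launching and joining the work item for a single multiplication, which is
exactly the overhead question raised above.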

In particular, everybody in this space works on how to get the overhead
down low enough to make it practical for even fairly short operations
(e.g., does a work item have to be fairly coarse-grained at 100,000
instructions, or 10,000, or reasonably fine-grained at 1,000...). Heck,
I'm currently pushing on my OS guys to do work to make this cheaper for
my Concur language extensions (I'll be giving a talk about these at C++
Connections), and I'm sure so is everybody else.

Just for completeness, I should mention a pitfall in trying too hard to
get the async overhead down: It's often suggested that in cases like the
above, where you know the caller is going to do minimal or no parallel
work and just wait (or any work the caller does is complete before the
work item begins to run on another thread), you could "inline" the work
item by simply running on the originating thread. This has enticing
benefits, notably to minimize overhead (no cache sloshing, in some cases
even no queueing). But the pitfall is that it does change the semantics
in a subtle way -- one very desirable feature (and arguably tied for
first place as the most important feature) of running a work item async
is to get out from under any locks that the calling thread holds. After
all, the more we can do work outside locks to reduce the chances of
deadlock, the better! But scheduling the work item on the caller's
thread for efficiency doesn't preserve those semantics. So this
optimization is actually only appropriate for a work item that has no
off-thread affinity semantics requirements (i.e., where the programmer
isn't expecting those semantics and the work item isn't harmed by
running under any locks the caller might be holding, including that the
work item is guaranteed not to try to acquire any locks of its own).
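
The semantic difference can be sketched as follows, again assuming a
std::async-style facility from later C++ standards. Here
std::launch::deferred stands in for the "inline on the originating
thread" optimization, and locks_held, counted_lock, and run_under_lock
are hypothetical helpers introduced only for this illustration.

```cpp
#include <future>
#include <mutex>

// Hypothetical bookkeeping: number of locks held by the current thread.
thread_local int locks_held = 0;

// Hypothetical guard: locks a mutex and keeps the per-thread count current.
class counted_lock {
public:
    explicit counted_lock(std::mutex& m) : guard_(m) { ++locks_held; }
    ~counted_lock() { --locks_held; }
    counted_lock(const counted_lock&) = delete;
    counted_lock& operator=(const counted_lock&) = delete;
private:
    std::lock_guard<std::mutex> guard_;
};

// Work item: reports how many locks the thread executing it holds.
int locks_seen_by_work_item() { return locks_held; }

// Launches the work item while the caller holds a lock, then waits.
int run_under_lock(std::launch policy) {
    std::mutex m;
    counted_lock hold(m);  // the caller now holds one lock
    std::future<int> f = std::async(policy, locks_seen_by_work_item);
    return f.get();        // deferred work runs here, on the caller's thread
}
```

With std::launch::async the work item runs on another thread and sees no
locks held (the function returns 0). With std::launch::deferred it runs
inside get() on the caller's thread, under the caller's lock (the
function returns 1). If the work item tried to acquire that same
non-recursive mutex, the inlined version would be locking a mutex its
own thread already owns.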

Herb