[cpp-threads] Asynchronous Function Proposal

Wed Jun 3 23:53:26 BST 2009

> From: Herb Sutter [mailto:hsutter at microsoft.com] 
> 
> Everything you say is true and important, but it doesn't 
> apply to async(). That's what I thought we were losing sight 
> of a bit here, so let me try to summarize the intended focus.
> 
> async() isn't meant to be The Answer (or even An Answer) for 
> nested parallelism, performance, or scalability. async() is 
> about just providing a single simple way to easily perform an 
> async call and get an async result, without all the 
> packaged_task boilerplate people have to write otherwise in 
> the current draft. It is mostly about concurrency rather than 
> parallelism, but there seems to me to be no reason to not 
> also permit running a task on a work stealing implementation, 
> and the only hook I've been pushing for to permit that is to 
> have the default be "might or might not run on this thread."
I no longer believe that the difference between "concurrency" and "parallelism" is well-defined in general.  But if I understand correctly, I think we have to be careful here.  "Concurrent" applications are likely to use other forms of synchronization, which tend to be dangerous in fork-join frameworks.  I think you probably want only the "always create a thread" variant for those.
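
For concreteness, here is a rough sketch of the boilerplate in question next to the async() form. It assumes the interface as it was later standardized in C++11 (std::async, std::launch), which may differ from the draft under discussion here; compute() and the three wrapper names are hypothetical placeholders.

```cpp
#include <cassert>
#include <future>
#include <thread>
#include <utility>

// Hypothetical stand-in for some work worth running asynchronously.
int compute() { return 6 * 7; }

// The boilerplate route: wrap the call, extract the future, start the thread.
int via_packaged_task() {
    std::packaged_task<int()> task(compute);
    std::future<int> result = task.get_future();
    std::thread worker(std::move(task));
    int value = result.get();   // blocks until compute() has finished
    worker.join();
    return value;
}

// The async() route, pinned to the "always create a thread" variant.
int via_async_new_thread() {
    std::future<int> result = std::async(std::launch::async, compute);
    assert(result.valid());
    return result.get();
}

// Default policy: the implementation may run the task on another thread or
// defer it to the calling thread ("might or might not run on this thread").
int via_async_default() {
    std::future<int> result = std::async(compute);
    return result.get();
}
```

Code that also uses locks or condition variables would presumably want the second form, since the third gives no guarantee that the task makes progress before get() is called.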
> 
> That's all we're looking at for now. In time (TR2 or C++1x), 
> at the same time we consider things such as thread pools, we 
> know we probably do want to support more advanced uses 
> including nested parallelism and performance and scalability. 
> We already know that to do that we'd need to add lots of 
> other stuff, such as the task_groups PPL exposes and 
> performance knobs such as some TBB exposes. This async() 
> isn't meant to be that, that's all.
> 
I just want to be sure we're not designing a white elephant here.  If we include something that inherently works less well than Cilk or TBB (or Microsoft's library) for nested parallelism, I suspect that those alternatives will be used instead for most applications.  If that's the case, I think we either shouldn't bother at all, or limit ourselves to the simplest possible (always runs in another thread) variant.

It currently appears to me that:

- We have a pretty clear answer that we cannot handle a Cilk-style implementation without a lot more work; stealing callers effectively means that tasks have to migrate between threads, implying that you may lose your thread-locals, etc.  If I understand correctly, committing to not doing that is fine, since code has to be written somewhat differently in the two styles.  For example, a fib function that only spawns one of the subcomputations and runs the other inline is potentially parallel in Cilk, but not in the others.  Thus a standard that left the execution method completely unspecified seems to give too little guidance anyway.

- We could possibly use an implementation like TBB or Doug's Java one?  I think this REQUIRES Lawrence's original formulation in which the decision to run a task in the same thread is postponed until the result is needed, so that it's available for stealing in the interim.  I don't see how to make this work if the subtask is run at the point of call.

- One difference between these two is that the latter probably requires the programmer to more explicitly manage granularity.  Cilk naturally tends to run small tasks near the leaves as function calls, minimizing overhead for small granularity.  It still seems to me that the other alternatives are probably less robust against too small a granularity?
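
To illustrate the fib shape and the granularity point, here is a minimal sketch, again assuming the async() interface as later standardized in C++11. It spawns one subcomputation and runs the other inline, and it uses the always-create-a-thread policy so that it does not depend on any stealing behaviour. The names fib_seq, fib_par and kCutoff are hypothetical, and the cutoff value is untuned.

```cpp
#include <cassert>
#include <future>

// Plain sequential version, used below the cutoff.
unsigned long fib_seq(unsigned n) {
    return n < 2 ? n : fib_seq(n - 1) + fib_seq(n - 2);
}

// Hypothetical granularity threshold: below it, no task is spawned at all.
constexpr unsigned kCutoff = 20;

// Spawns one subcomputation and runs the other inline in the caller.
unsigned long fib_par(unsigned n) {
    if (n < kCutoff) return fib_seq(n);
    // std::launch::async: the spawned half always gets its own thread.
    std::future<unsigned long> left =
        std::async(std::launch::async, fib_par, n - 1);
    assert(left.valid());
    unsigned long right = fib_par(n - 2);   // runs inline, in this thread
    return left.get() + right;
}
```

The explicit kCutoff is the manual granularity management mentioned in the last point: without it, every call with n >= 2 would pay for a thread, whereas a Cilk-style scheduler would tend to run the small leaves as ordinary function calls.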

Hans