[cpp-threads] Asynchronous Execution Issues (caching_async)

Anthony Williams anthony at justsoftwaresolutions.co.uk
Wed Apr 29 08:45:01 BST 2009


At Wed 29 Apr 2009 05:13:33 BST, "Robison, Arch"  
<arch.robison at intel.com> wrote:

> The experience with TBB strongly supports Herb's point: users need
> both. For the first few years of TBB, we offered only work stealing,
> and users kept asking us for true asynchrony too. (Now TBB offers
> the latter via an approximation of std::thread.)
>
> A key lesson from Cilk is that it's not just about granularity. The
> choice of schedule can profoundly affect the asymptotic space
> behavior of a program. For example, Cilk-style scheduling often
> delivers space bounds linear in the number of processors, whereas
> the equivalent code using true asynchrony can have exponential space
> blowup. (Think depth-first versus breadth-first.)
>
> Worse, "both" is probably not enough. Cilk-style scheduling is
> great for a certain class of problems, but there are other problems
> (e.g. stream processing) where FIFO-like scheduling makes more
> sense. Beman Dawes's suggestion of a more generic approach is worth
> investigating.
>
> To make non-preemptive schedulers useful, users have to understand
> their limitations, particularly with respect to deadlock. For
> example, it's overly simplistic to require that one caching_async()
> call never wait on another. At a minimum, I want a function being
> executed by a caching_async() call to be able to execute another
> caching_async() call and wait on the result. Invariably users want
> to do more complex work graphs.
>
> Thus I'm skeptical that caching_async can be both widely useful and
> avoid the hard issues of a thread pool.

It is for this reason that the BSI comment was worded as it was:

"Provide a simple function along the lines of:

template< typename F, typename ... Args >
  requires Callable< F, Args... >
future< Callable::result_type > async( F&& f, Args&&... args );

Semantics are similar to creating a thread object with a packaged_task
invoking f with forward<Args>(args)... but details are left
unspecified to allow different scheduling and thread spawning
implementations. It is unspecified whether a task submitted to async
is run on its own thread or a thread previously used for another async
task. If a call to async succeeds, it shall be safe to wait for it
from any thread. The state of thread_local variables shall be
preserved during async calls. No two incomplete async tasks shall see
the same value of this_thread::get_id(). [Note: this effectively
forces new tasks to be run on a new thread, or a fixed-size pool with
no queue. If the library is unable to spawn a new thread or there are
no free worker threads then the async call should fail.]"

This is deliberately basic. Aside from thread_locals it has the same  
semantics as spawning a new thread every time, but it is allowed to  
fail due to "too many threads", and may reuse old async threads once  
the original task has finished.

In general, I've seen three different models for dividing work:

1) Discrete long-running tasks which communicate, e.g. a GUI thread
plus a background thread. These are often well served by std::thread,
but if they return a result it might be good to have an async function
that can run them and return a future for the result. This really
requires that each function runs on a separate thread.

2) Many independent tasks that can run in parallel. These are ideally  
suited to a work-stealing thread pool as there is no communication  
(except maybe short-lived locks) and no dependent waiting.

3) A graph of inter-dependent tasks that can mostly be run in
parallel, but which have dependencies on or communicate with each
other. A simple work-stealing pool can deadlock on such tasks,
particularly if they are nested on the same thread. In order to run
such tasks on a pool you need to provide dependency information to
the pool; things like task groups can provide such information.
Running each task on a separate thread should "work", but will likely
perform poorly due to oversubscription, per-thread overheads and so
forth.

Proper support for #3 is clearly outside the remit of what we're
doing at the moment. Support for #1 or #2 is doable, but they require
different thread properties --- #1 requires separate threads, whereas
#2 works better with a limited pool of threads. If we provide support
for #1 or #2, I can see people trying to use it for #3.

The BSI suggestion handles #1, and provides working but obviously
poor support for both #2 and #3.

If we provide good support for #2, this does not work for #1 or #3,
as the tasks may not run when required. User code may deadlock, and
the deadlock may depend on the number of available threads in the
pool: e.g. with 4 threads everything is fine, but with 3 or 5 you get
deadlock as the scheduling properties change.

It might be an option to support #1 and #2, which gets us back to  
creating_async vs caching_async. It would be nice if we could do this  
without making it hard to support #3 later in the same framework,  
which leads us to Beman's generic async proposal with the properties  
specified by an extra parameter. This is particularly good if we take  
into account the issue that Arch noted above about different code  
requiring different scheduling properties.

unique_future<retval> f1=async(std::own_thread,gui_func); //#1
unique_future<retval> f2=async(std::pool_thread,parallel_func); //#2
unique_future<retval> f3=async(my_task_group,dependent_parallel_func); //#3
unique_future<retval> f4=async(fifo_pool,dependent_parallel_func); //#3, alternative scheduler

I think the thread_local issue is actually a distraction. It is
always possible to write classes whose destructors behave undesirably
when the class is used as a thread_local. I think it might be useful
to have join() only return when the thread function has completed AND
all the thread_locals have been destroyed, but this is not an issue
for async. If you use async you do not know the lifetime of
thread_locals: they may exist before your task starts, and may
continue to exist after it finishes.

Anthony
-- 
Anthony Williams
Author of C++ Concurrency in Action | http://www.manning.com/williams
just::thread C++0x thread library   | http://www.stdthread.co.uk
Custom Software Development         | http://www.justsoftwaresolutions.co.uk
Just Software Solutions Ltd, Registered in England, Company Number 5478976.
Registered Office: 15 Carrallack Mews, St Just, Cornwall, TR19 7UL, UK




