[cpp-threads] Asynchronous Execution Issues

Sat Apr 25 00:52:47 BST 2009

Lawrence Crowl wrote:
> On 4/24/09, Bjarne Stroustrup <bs at cs.tamu.edu> wrote:
>   
>> Lawrence Crowl wrote:
>>
>>     
>>> At the Summit meeting I took the action item of producing a proposal
>>> for a simple asynchronous execution.  The idea was two facilities,
>>> with an API along the lines of:
>>>
>>>    auto x = std::creating_async( function1 );
>>>    auto y = std::caching_async( function2 );
>>>    ....
>>>    auto a = x.get();
>>>    auto b = y.get();
>>>
>>> The difference between the two is that creating_async would always
>>> create a new thread, while the caching_async is permitted to reuse
>>> threads.  The primary operational difference is that thread-local
>>> storage is reused in the latter case.
>>>
>>>
>>>       
>> This is great! I was about to write something along those
>> lines myself (thinking that I was among those so charged -
>> and seriously behind with my work as usual)
>>     
>
> Well you asked to review my work early, so you were a little on
> the hook.  :-)
>
>   
>> Naming is a big deal: I don't think those two names are good.
>>     
>
> Fully agreed, but I'd rather start with a clearly bad name than a
> subtly bad name.  Please send any suggestions that you might have.
>
>   
>> Also, I don't think the key issue is/was whether a thread might
>> be reused, but whether a task might be executed on the thread
>> that launched it. In that spirit, I propose the names:
>>    async()   // execute somewhere
>>    async_thread()  // execute on a thread different from that of the launcher
>>     
>
> The concurrency subgroup didn't take this view.  The idea was that
> the work should be able to include arbitrary synchronization, and
> if the caller and the work are serialized, they cannot synchronize.
>   

There were few people present and fewer spoke. I think it would be wise 
not to interpret what occurred as more than a consensus that some form 
of async() was considered desirable. My "chat" in corridors, email, etc. 
convinced me that we would not be able to build a consensus for less 
than two async()s: one that is guaranteed to run on a separate thread 
(so that you could imagine the two tasks communicating) and one that 
does not offer that guarantee (for the simplest tasks, potentially 
executed on the launching thread if it has nothing better to do). I'd not 
like to see the former split further into two (new thread vs. 
potentially recycled thread) and I can't see how that could be done 
without exposing the underlying thread/thread-pool machinery (which we 
decided not to include in C++0x).
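
As a rough illustration of the two flavours, here is a sketch written against std::async and the std::launch policies that C++11 later standardized. That API postdates this message, so mapping it onto async_thread() and async() is an assumption, and running_thread() is a hypothetical helper. The deferred policy is used only to make the "runs on the launcher" case deterministic; the default policy leaves the choice to the implementation.

```cpp
#include <cassert>
#include <future>
#include <thread>

// Hypothetical task used only for illustration: reports the thread it ran on.
inline std::thread::id running_thread() { return std::this_thread::get_id(); }

// "Guaranteed separate thread" flavour (async_thread() in the naming above):
// std::launch::async runs the task as if on a new thread, so the task and
// its launcher could in principle communicate while both are running.
inline bool ran_on_other_thread() {
    std::future<std::thread::id> f =
        std::async(std::launch::async, running_thread);
    return f.get() != std::this_thread::get_id();
}

// "No guarantee" flavour, shown at its extreme: std::launch::deferred runs
// the task lazily on whichever thread calls get(), here the launcher itself.
inline bool ran_on_calling_thread() {
    std::future<std::thread::id> f =
        std::async(std::launch::deferred, running_thread);
    return f.get() == std::this_thread::get_id();
}
```

In both cases the caller sees only a future and a value; nothing about threads or pools appears in the interface.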

> I agree that a facility to execute the work on the current thread
> would be helpful, particularly in avoiding oversubscription, but
> it does impose restrictions on the synchronization in the work.
>
>   

I think we must ask: What's the purpose of async()? My answer is (I 
think) clear and not necessarily identical to that of anyone else, so 
let me articulate it:

The purpose of async() - as for future - is to allow people to use 
concurrency without having to deal directly with underlying systems 
facilities, such as threads and locks. Instead, futures together with 
async() allow the user to think of asynchronously executed tasks 
returning values through futures.

Anything that exposes the underlying machinery is most undesirable (in 
the context of async()). Any task that requires understanding of 
underlying concepts (such as locks and priorities) will use async() only 
incidentally. People with such tasks will have to construct "heavier 
machinery" themselves or wait for C++1x. Part of the purpose of async() 
is to be simple. A "full featured powerful" async() would be a 
contradiction, though in the future, when we have agreed on what a 
thread-pool is and probably on several other things, we might want a 
"full featured powerful" launcher. However, that launcher is not async().

Note that I consider async() - together with future - a "layer" on top 
of the layer of (necessary, but necessarily complex and ugly) system 
level facilities. I do *not* see async() as a simple part of those 
facilities (as the - very reasonable - packaged_task facility is/was).

>> (is it async() or asynch() and why?)
>>     
>
> It is async because without the trailing 'ronous', the ch would be
> pronounced as in church.  English spelling is not ideal.
>   

Thanks (it's your language :-)

>   
>>> There are a couple of issues that make the task more difficult than
>>> we thought at the time.  As a result, I have a couple of questions.
>>>
>>> First, consider the case of the caching_async.  Because the threads
>>> may persist, we need a handle on the list of threads so that we
>>> can inform a thread to die so that we destroy any thread-local
>>> variables before the global variables that they reference.  So, we
>>> start needing a manager object, and the whole facility is starting
>>> to look too much like a thread pool.
>>>       
>> I'm not convinced that's necessary. It's up to whoever implements
>> async_thread() to
>>    (1) make sure that a task gets a "clean" thread (with no
>>    information from previous tasks in local variables)
>>    (2) a terminal error on an async_thread()ed thread is reflected
>>    in its future and the thread is properly killed or recycled.
>> I think that exposing any details of how/if/when that is done is
>> against the spirit of async.
>>     
>
> I am not worried about OS-level caching of threads (and many will do
> so anyway) but specifically about the behavior of thread-local variables
> as a result of calling the async function.  If the variables are
> always created and destroyed, that simplifies the problem.
>   

That is my model and I think the only model that works (in the sense of 
not exposing the underlying machinery).

>   
>>> I don't feel as though I have a mandate to propose anything
>>> that looks like a thread pool.  Do you agree?
>>>       
>> Agreed.
>>
>>
>>     
>>> Second, consider the case of the creating_async.  The problem here
>>> is a touch more subtle.  In particular, once the working thread has
>>> set its promise, it can start destroying thread-local variables.
>>>       
>> You are making too many assumptions here. My mental model is
>> that the thread is clean when a task starts executing. It couldn't
>> possibly know about the local variables of a previous task.
>>     
>
> I'm worried about the thread-local variables of the new thread that
> does the work.  For instance,
>
>     thread_local complicated helper;
>
>     auto f = async( []{ return foo( &helper ); } );
>     ....
>     auto x = f.get();
>
> Running the work in another thread necessarily initializes a helper
> in that thread.  Suppose the destructor of complicated uses a global
> variable.  When will that use occur?  In short, we have no idea.
>
> More specifically, we know that the return happens before the get,
> but have no synchronization between the destruction of helper and
> the call of the code.
>   

"just say no".

Don't pass pointers to thread local storage to tasks supposedly simple 
enough for async(). At best, that would be what I referred to as 
"incidental use of async()," that is, here async() is used as a building 
brick for a more complex concurrency system that exposes complexity. 
This is not what async() is for and we should design it with emphasis on 
what it should be good at.

Destroy that "helper" whenever it would have been destroyed had you not 
launched anything; don't try to "garbage collect" references to pointers 
passed to tasks using async().
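
To make the point concrete, here is a minimal sketch, again assuming the std::async interface that C++11 later adopted. The int-valued helper and both function names are hypothetical stand-ins for the quoted example. It shows that a task run on another thread gets its own instance of a thread_local variable, and that passing and returning plain values avoids the question entirely.

```cpp
#include <cassert>
#include <future>

// Hypothetical stand-in for the `helper` in the quoted example.
thread_local int helper = 0;

// Each thread that touches `helper` gets its own instance, so a task run
// with std::launch::async does not see the launcher's modification.
inline int helper_as_seen_by_task() {
    helper = 42;  // changes only the launching thread's instance
    std::future<int> f = std::async(std::launch::async, [] {
        return helper;  // the worker thread's own, freshly initialized copy
    });
    return f.get();
}

// The "just say no" alternative: hand the task a value, get a value back,
// and keep thread-local storage out of the task altogether.
inline int doubled_by_value(int input) {
    std::future<int> f =
        std::async(std::launch::async, [input] { return 2 * input; });
    return f.get();
}
```

Written this way, the task has no pointer into anyone's thread-local storage, so the launcher's helper can be destroyed exactly when it otherwise would have been.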

>   
>>> Unfortunately, we have no idea when that has or might happen.
>>> That thread could be stalled immediately after providing the value.
>>> The solution is to wait for thread termination before returning the
>>> value from the future.  This solution implies keeping the thread as
>>> part of the future so that we can wait.  We have no such mechanism.
>>> So, an asynchronous execution function needs another kind of future.
>>> Is such a future going beyond my mandate?
>>>       
>> Sorry to be so brief, but I am just finishing the semester, etc.,
>> and I think that the solution to these problems is to specify
>> simply what must be done and carefully avoid saying how it is done.
>>     
>
> The futures we have now do not have the necessary synchronization.
> I can add a future, but if adding another future type would kill
> the proposal, I'd rather not do the work to define it.  This mail
> is a straw poll to see if I should do the work.
>
>   

"This mail is a straw poll". I don't even have an idea who receives this 
message.
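
For reference, the quoted idea of waiting for thread termination before returning the value, with the thread kept as part of the future, can be sketched as follows. Every name here (joining_future, joining_async, complicated, helper_destroyed, use_thread_local_helper) is hypothetical and not part of any proposal, and the sketch assumes a non-void result type. It relies only on std::thread::join(), which returns after the worker has finished, including the destruction of its thread-local objects.

```cpp
#include <atomic>
#include <cassert>
#include <exception>
#include <future>
#include <thread>
#include <utility>

// Records that the worker's thread-local object has been destroyed.
std::atomic<bool> helper_destroyed{false};

struct complicated {
    int value = 7;
    ~complicated() { helper_destroyed = true; }  // stands in for "uses a global"
};

// A future that owns its worker thread and joins it before yielding a value.
template <typename T>
class joining_future {
public:
    joining_future(std::future<T> result, std::thread worker)
        : result_(std::move(result)), worker_(std::move(worker)) {}
    joining_future(joining_future&&) = default;
    ~joining_future() { if (worker_.joinable()) worker_.join(); }

    T get() {
        // After join() the worker's thread-local destructors have completed.
        if (worker_.joinable()) worker_.join();
        return result_.get();
    }

private:
    std::future<T> result_;
    std::thread worker_;
};

// Always starts a new thread; the result or exception travels via a promise.
template <typename F>
auto joining_async(F work) -> joining_future<decltype(work())> {
    using T = decltype(work());
    std::promise<T> p;
    std::future<T> result = p.get_future();
    std::thread worker([](std::promise<T> pr, F fn) {
        try { pr.set_value(fn()); }
        catch (...) { pr.set_exception(std::current_exception()); }
    }, std::move(p), std::move(work));
    return joining_future<T>(std::move(result), std::move(worker));
}

// Touches a thread-local object, constructing it in the calling thread.
inline int use_thread_local_helper() {
    thread_local complicated helper;
    return helper.value;
}
```

A call such as joining_async([] { return use_thread_local_helper(); }).get() then returns only after the worker's helper has been destroyed, at the cost of making the thread part of the future's state.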