[cpp-threads] Asynchronous Function Proposal

Tue Jun 2 23:58:04 BST 2009

> I think this is getting a lot closer to something we might want.  And
> I'm increasingly convinced that we need to have this design somewhat
> nailed down before we finalize some features already in the draft.
> This discussion seems essential whether or not we decide to put this in
> C++0x.  I really don't think we want the current future in the standard
> if we can't use it here.

Agreed.

> But I'm still worried about whether we have enough of the details right
> here that this facility will actually be usable.  I don't have the
> necessary experience that I feel I can evaluate a design.  As far as I
> can tell:
> 
> - I believe this doesn't support a Cilk-like (or TBB-like?)
> implementation.  I believe Cilk effectively allows colder call frames
> to be stolen by another thread, for example.
[...]

A primary reason I've been advocating an "async()" facility, besides as a way to make async calls easy, was specifically to make sure that an async() with a default "could execute on this or another thread" policy could be implemented to run on a work stealing implementation, such as Cilk, TBB, or our own TPL/PPL/ConcRT. I haven't seen anything that would prevent that -- all you need to say is exactly "could execute on this or another thread" and IIUC that's all we need.

> - This is very different from Java thread pools, as it has to be, given
> the constraints.

Yes.

> - I don't have a clear intuition of how this could perform even with a
> simple tree-structured algorithm, like the ubiquitous fib example.
> Assuming an implementation that creates a thread 

Speaking to that assumption: I think that's one implementation, but it's not what my company would probably ship. We'd just have code like

  unique_future<T> result = async( [&]{ quicksort( /* left subrange */ ); } )
  ...
  result.get();

be basically the same as what in PPL (available in VS 2010 beta) we would write as

  task_group tg;
  tg.run( [&]{ quicksort( /* left subrange */ ); } );
  ...
  tg.wait();

(note that we still need to add futures to our system to make it more usable :-) ).

> when the number of
> threads is below a certain threshold, I would expect that depending on
> exact timing, we actually get a bunch of threads executing high-level
> tasks (if newly created threads start running quickly), or most of the
> available threads are initially assigned to small tasks low in the tree
> (if the initial task gets to spawn all the threads before anybody else
> gets a shot at it).  My intuition is that the former would result in
> acceptable performance, where the latter would be awful.  But the only
> way I seem to be able to control this is by avoiding the may_be_called
> policy.  Maybe this isn't as bad as I think it might be.  I just don't
> know.  Doug Lea or Arch Robison probably understands this better.  Is a
> better implementation possible?

Does the above help? Are there concerns that I'm missing?

> It seems to me that we're designing something here that's quite
> different from Cilk, TBB, or java.util.concurrent, which are perhaps
> the most established (?) systems in the area.

I don't think so, and I hope not. Put slightly differently: I think and hope we're designing something that could be trivially run on a work-stealing implementation if available, or on pooled or non-pooled threads otherwise.

Herb