[cpp-threads] Asynchronous Function Proposal

Sat Jun 20 00:51:53 BST 2009

Yes, let's first converge on the programming model.

Lawrence wrote:
> async is not a replacement for OpenMP.
[...]
> Be wary of trying to use this facility for massive parallelism,
> it is a stop-gap mechanism for few-core systems until TR2 becomes
> available.

Agreed! More next:

> The programming model is:
>    At the highest levels of the program, add async where appropriate.
>    If enough concurrency has not been achieved, move down a layer.
>    Repeat until you achieve the desired core utilization.

Not agreed -- the above seems to be talking about 'using more cores to get the answer faster,' aka "parallelism" or Pillar 2 in < http://www.ddj.com/cpp/200001985 >. That's the space targeted by tools like OpenMP and work stealing and (common uses of) thread pools etc., and patterns like fork-join.

The programming model for async is different:

   Instead of a synchronous function call that returns a synchronous value,
   make an asynchronous function call that returns an asynchronous value.

This is talking about 'I want to do work (potentially) asynchronously, decoupled from this thread,' aka "concurrency" or Pillar 1 in < http://www.ddj.com/cpp/200001985 >. That's the space targeted by tools like OpenMPI and threads and (some uses of) thread pools etc., and patterns like pipelining. It's mostly not about using more cores at all. It just so happens that when you express asynchronous work you may also be expressing compute-bound work that can keep more cores busy at once, with pipelining again as a specific example; but this is not about scaling work to saturate a manycore machine.

> A difference in the anticipated programming model may be at the root
> of the lack of consensus.

As I put it in my draft paper:

  - That is, just as we can make a synchronous function call that returns a
  - synchronous result:
  -
  -     T t  =  f();
  -
  - we want to be able to make an asynchronous function call that returns
  - an asynchronous result [...] :
  -
  -     future<T> t  =  async([]{  return f();  });

That really is all. Do we agree on that motivation?
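
To make the correspondence concrete, here is a minimal sketch. It assumes the std::async / std::future interface as later standardized in C++11, which may differ from what this proposal ends up specifying. The function f() and the two wrapper names are hypothetical stand-ins for illustration only.

```cpp
#include <cassert>
#include <future>

// Hypothetical stand-in for the f() in the text, with T = int.
int f() { return 42; }

// Synchronous function call that returns a synchronous result.
int call_sync() {
    int t = f();
    return t;
}

// Asynchronous function call that returns an asynchronous result.
// Assumption: the default launch policy is used, so the implementation
// may run f() on another thread or defer it until get() is called.
int call_async() {
    std::future<int> t = std::async([] { return f(); });
    // ... the caller is free to do unrelated work here ...
    return t.get();  // wait for the asynchronous value and retrieve it
}
```

The only change at the call site is that the result type becomes a future and the value is retrieved with get() at the point where it is actually needed.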

Finally, without violating the motivation above, it's consistent to want the option of running the async() task on a pool, or on a work stealing implementation if the user opts in via "either"/"may-be-called," because of:

  a) efficiency

  b) not interfering with load balancing done by thread pools et al.

Which brings us to:

> > There's another key problem that I don't think I mentioned
> > explicitly before, which is oversubscription: A big reason
> > to be able to run the async task on a thread pool is that
> > the pool is already in the business of staying "rightsized"
> > for the machine. In an application that uses today's thread
> > pools, having compute-intensive work apart from the thread pool
> > penalizes performance because it makes it harder for the pool to
> > accurately match ready work to available cores and oversubscribes
> > the machine. So that's another key reason that an efficient
> > implementation needs to be able to run the work on a pool
> > especially when the application is using that pool anyway.
> 
> I agree that async must avoid the over-subscription problem.
> However, thread pools are not necessary to avoid the problem.
> In particular, a count of active threads, compared against
> std::thread::hardware_concurrency(), can provide all the information
> necessary to determine if a new thread should be created.

No, that's not at all what I'm saying. I'm saying that an "in a new thread" async specification doesn't play nice with thread pools, and therefore doesn't play nice with apps that *do already* (or will choose to) use thread pools for their compute-intensive work.
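
For reference, the counting check suggested in the quoted text might look roughly like the sketch below. This is only an illustration under stated assumptions: active_tasks, ActiveGuard, should_spawn and counted_async are hypothetical names, and the std::async launch policies are the ones later standardized in C++11. Note that the counter only knows about tasks launched through this helper; it has no view of threads owned by a pool elsewhere in the application.

```cpp
#include <atomic>
#include <cassert>
#include <future>
#include <thread>
#include <utility>

// Hypothetical counter: tracks only tasks started via counted_async().
std::atomic<unsigned> active_tasks{0};

// Decrements the counter when a task running on its own thread finishes.
struct ActiveGuard {
    ~ActiveGuard() { active_tasks.fetch_sub(1); }
};

// Decide whether one more thread fits under the hardware limit.
bool should_spawn(unsigned active, unsigned hw) {
    if (hw == 0) hw = 1;  // hardware_concurrency() may return 0 if unknown
    return active < hw;
}

template <class F>
auto counted_async(F fn) -> std::future<decltype(fn())> {
    using R = decltype(fn());
    unsigned hw = std::thread::hardware_concurrency();
    unsigned before = active_tasks.fetch_add(1);  // reserve a slot first
    if (should_spawn(before, hw)) {
        // Slot available: run on a new thread and release the slot at exit.
        return std::async(std::launch::async, [fn]() mutable -> R {
            ActiveGuard guard;
            return fn();
        });
    }
    active_tasks.fetch_sub(1);  // no slot: give the reservation back
    // Fall back to running the task lazily in the thread that calls get().
    return std::async(std::launch::deferred, std::move(fn));
}
```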

Let me try to say it again with maybe slightly clearer phrasing:

Applications that run their compute-intensive work on a thread pool really want all their compute-intensive work to run on the pool. The pool is already in the business of staying "rightsized" for the machine, and having compute-intensive work outside the thread pool interferes with the pool's ability to accurately match the number of ready threads to the available hardware. Each compute-intensive async task in a non-pool thread adds extra work that the thread pool doesn't know about and so results in oversubscribing the machine, providing more ready work than there is available hardware parallelism.

If we mandate "in a new thread," then it will probably be unusable in practice for any compute-intensive task in an application that is using a thread pool to spread its compute-intensive work across the available hardware.
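
By contrast, here is a minimal sketch of what "running the async work on the pool" could mean. SimplePool, submit and sum_on_pool are hypothetical names invented for this illustration; this is not the proposal's interface and not any particular vendor's pool. The point is only that the caller still gets a future back, while the number of threads doing compute-intensive work stays fixed at the pool size.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool, for illustration only.
class SimplePool {
public:
    explicit SimplePool(std::size_t n) {
        if (n == 0) n = 1;  // e.g. when hardware_concurrency() returns 0
        for (std::size_t i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }
    ~SimplePool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            done_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_) w.join();  // drains queued jobs, then joins
    }
    SimplePool(const SimplePool&) = delete;
    SimplePool& operator=(const SimplePool&) = delete;

    std::size_t size() const { return workers_.size(); }

    // Async-style call: returns a future, but the work runs on a pool thread.
    template <class F>
    auto submit(F fn) -> std::future<decltype(fn())> {
        using R = decltype(fn());
        auto task = std::make_shared<std::packaged_task<R()>>(std::move(fn));
        std::future<R> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push([task] { (*task)(); });
        }
        cv_.notify_one();
        return result;
    }

private:
    // Worker loop: take jobs until shutdown is requested and the queue is empty.
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return done_ || !q_.empty(); });
                if (q_.empty()) return;
                job = std::move(q_.front());
                q_.pop();
            }
            job();
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> q_;
    bool done_ = false;
    std::vector<std::thread> workers_;  // declared last: started after the rest
};

// Usage: many async-style calls, but never more than pool.size() threads.
int sum_on_pool(SimplePool& pool, int tasks) {
    std::vector<std::future<int>> results;
    for (int i = 1; i <= tasks; ++i)
        results.push_back(pool.submit([i] { return i * i; }));
    int total = 0;
    for (auto& r : results) total += r.get();
    return total;
}
```

With an "in a new thread" mandate, each of those calls would instead add a thread the pool cannot account for.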

Herb