[cpp-threads] Asynchronous Function Proposal

Herb Sutter hsutter at microsoft.com
Wed Jun 3 15:47:45 BST 2009


P.S.: Just to add info for completeness, our PPL product now in beta also does "steal child", and you can program "steal parent" via a separate agents library.


> -----Original Message-----
> From: cpp-threads-bounces at decadentplace.org.uk [mailto:cpp-threads-
> bounces at decadentplace.org.uk] On Behalf Of Robison, Arch
> Sent: Tuesday, June 02, 2009 10:22 PM
> To: Boehm, Hans; C++ threads standardisation; Lawrence Crowl; Bjarne
> Stroustrup
> Cc: Pablo Halpern
> Subject: Re: [cpp-threads] Asynchronous Function Proposal
> 
> Instead of parallel_sum, the usual parallel naive Fibonacci recurrence
> would be better than parallel_sum.  On typical machines, parallel_sum
> runs out of bandwidth.  Native parallel Fibonacci makes a good
> benchmark for recursive tasking because it is not sensitive to machine
> bandwidth and generates unbalanced trees of work.
> 
> - Arch
> 
> -----Original Message-----
> From: Robison, Arch
> Sent: Tuesday, June 02, 2009 8:09 PM
> To: 'Boehm, Hans'; C++ threads standardisation; Lawrence Crowl; Bjarne
> Stroustrup
> Cc: Pablo Halpern; Doug Lea
> Subject: RE: Asynchronous Function Proposal
> 
> You have to pick your poison.  Cilk and TBB make different choices on
> that.  I call them "steal parent" versus "steal child".
> 
> In Cilk, spawning a child task causes the current thread to immediate
> execute the child, and the parent is put in a suspended state where it
> can be stolen by an idle thread.
> 
> In TBB, spawning a child task puts the child in a suspended state where
> it can be stolen by another thread, and the parent continues execution.
> TBB can simulate Cilk "steal parent if the programmer writes in
> explicit continuation-passing style, which is not easy.
> 
> The big wins from "steal parent" in Cilk are:
> * Threads never wait for completion of tasks, because the thread that
> finishes the last child picks up execution of the joining parent.
> * Total space is (number of threads)*(space for sequential execution)
> The big drawback of Cilk is that is requires a non-standard calling
> convention that has cactus stacks and decouples stacks from threads.
> As Hans points out, "steal parent" causes weird things to happen with
> existing implementations of thread-local storage.
> 
> TBB's "steal child" avoids the drawbacks.  Thread local storage works
> and standard calling conventions can be employed.  But this comes at
> the expense of not having the Cilk guarantees on space and incurring
> waiting.  TBB uses heuristics to control space demands, and the waiting
> sometimes artificially constraints parallelism.  These heuristics
> usually work, but like any heuristics, they occasionally bite us.  TBB
> also sometimes launches tasks on threads that are executing a waiting
> task, which helps with efficiency in common cases, but has its own
> risks.  There's no free lunch here.
> 
> I suggest that for anyone prototyping the proposed asynchronous call
> functionality, that they try the parallel_sum example over 10^9 values.
> That yields a recursive tree of about 10^6 nodes, with a height of 20.
> With Cilk, the space demands will be on the order of about P*(20
> nodes).  In TBB without the heuristics, it could go quadratic in the
> tree depth, e.g. about 400 nodes.  The TBB heuristics keep it down to
> P*(20 nodes), but by sometimes limiting parallelism.  With any
> prototype of the proposal, I would want to know how much space does it
> consume, and how well does it scale?
> 
> It's unfortunate that we are stuck with calling conventions that arose
> in a sequential milieu.
> 
> - Arch
> 
> -----Original Message-----
> From: Boehm, Hans [mailto:hans.boehm at hp.com]
> Sent: Tuesday, June 02, 2009 7:08 PM
> To: C++ threads standardisation; Lawrence Crowl; Bjarne Stroustrup
> Cc: Pablo Halpern; Robison, Arch; Doug Lea
> Subject: RE: Asynchronous Function Proposal
> 
> [Adding Pablo, Arch, and Doug:  This is a continuing discussion of
> Lawrence Crowl's proposal for a simple C++ facility to support
> asynchronous function call execution.  Reattaching Lawrence's proposal
> for newcomers.]
> 
> > -----Original Message-----
> > From: Herb Sutter
> >
> > > But I'm still worried about whether we have enough of the details
> > > right here that this facility will actually be usable.  I
> > don't have
> > > the necessary experience that I feel I can evaluate a
> > design.  As far
> > > as I can tell:
> > >
> > > - I believe this doesn't support a Cilk-like (or TBB-like?)
> > > implementation.  I believe Cilk effectively allows colder
> > call frames
> > > to be stolen by another thread, for example.
> > [...]
> >
> > A primary reason I've been advocating an "async()" facility,
> > besides as a way to make async calls easy, was specifically
> > to make sure that an async() with a default "could execute on
> > this or another thread" policy could be implemented to run on
> > a work stealing implementation, such as Cilk, TBB, or our own
> > TPL/PPL/ConcRT. I haven't seen anything that would prevent
> > that -- all you need to say is exactly "could execute on this
> > or another thread" and IIUC that's all we need.
> I'm not an expert on this stuff.  (Copying Pablo in case he can help
> and hasn't seen this.) But my understanding is that when Cilk is about
> to execute a spawned task, it pushes a work item on the task queue that
> represents the parents' continuation.  That pushed item may get stolen
> by another worker thread.  Thus the parent thread may appear to have
> migrated between threads.  My vague intuition is that this is necessary
> if you want to both:
> 
> - Efficiently execute child tasks on the parent thread in the common
> case, and
> - Be reasonably sure that when you steal a task, you're likely
> something reasonably large, from closer to the root of the tree.
> 
> If I understand this correctly, this doesn't mesh particularly well
> with a bunch of other thread facilities in the current draft, e.g.
> thread-local variables.  (There are more details on Cilk in their PLDI
> 98 paper http://supertech.csail.mit.edu/papers/cilk5.pdf .)
> 
> TBBs tasks are more complicated to use, since they don't rely on
> language support.  They may avoid this kind of issue; I'm not sure.
> (Copying Arch in case he can help.)
> 
> My impression is that for both Cilk and TBB, interactions with other
> C++0x thread facilities haven't really been worked out.  (Not to
> surprising, since we're having enough problems with interactions
> between C++0x facilities, not to mention weird interactions within
> older systems we're building on.)
> 
> Hans
> 
> 
> --
> cpp-threads mailing list
> cpp-threads at decadentplace.org.uk
> http://www.decadentplace.org.uk/cgi-bin/mailman/listinfo/cpp-threads




More information about the cpp-threads mailing list