[cpp-threads] Asynchronous Function Proposal
Herb Sutter
hsutter at microsoft.com
Wed Jun 3 15:47:45 BST 2009
P.S.: Just to add info for completeness, our PPL product now in beta also does "steal child", and you can program "steal parent" via a separate agents library.
> -----Original Message-----
> From: cpp-threads-bounces at decadentplace.org.uk [mailto:cpp-threads-
> bounces at decadentplace.org.uk] On Behalf Of Robison, Arch
> Sent: Tuesday, June 02, 2009 10:22 PM
> To: Boehm, Hans; C++ threads standardisation; Lawrence Crowl; Bjarne
> Stroustrup
> Cc: Pablo Halpern
> Subject: Re: [cpp-threads] Asynchronous Function Proposal
>
> Instead of parallel_sum, the usual parallel naive Fibonacci recurrence
> would be better than parallel_sum. On typical machines, parallel_sum
> runs out of bandwidth. Native parallel Fibonacci makes a good
> benchmark for recursive tasking because it is not sensitive to machine
> bandwidth and generates unbalanced trees of work.
>
> - Arch
>
> -----Original Message-----
> From: Robison, Arch
> Sent: Tuesday, June 02, 2009 8:09 PM
> To: 'Boehm, Hans'; C++ threads standardisation; Lawrence Crowl; Bjarne
> Stroustrup
> Cc: Pablo Halpern; Doug Lea
> Subject: RE: Asynchronous Function Proposal
>
> You have to pick your poison. Cilk and TBB make different choices on
> that. I call them "steal parent" versus "steal child".
>
> In Cilk, spawning a child task causes the current thread to immediate
> execute the child, and the parent is put in a suspended state where it
> can be stolen by an idle thread.
>
> In TBB, spawning a child task puts the child in a suspended state where
> it can be stolen by another thread, and the parent continues execution.
> TBB can simulate Cilk "steal parent if the programmer writes in
> explicit continuation-passing style, which is not easy.
>
> The big wins from "steal parent" in Cilk are:
> * Threads never wait for completion of tasks, because the thread that
> finishes the last child picks up execution of the joining parent.
> * Total space is (number of threads)*(space for sequential execution)
> The big drawback of Cilk is that is requires a non-standard calling
> convention that has cactus stacks and decouples stacks from threads.
> As Hans points out, "steal parent" causes weird things to happen with
> existing implementations of thread-local storage.
>
> TBB's "steal child" avoids the drawbacks. Thread local storage works
> and standard calling conventions can be employed. But this comes at
> the expense of not having the Cilk guarantees on space and incurring
> waiting. TBB uses heuristics to control space demands, and the waiting
> sometimes artificially constraints parallelism. These heuristics
> usually work, but like any heuristics, they occasionally bite us. TBB
> also sometimes launches tasks on threads that are executing a waiting
> task, which helps with efficiency in common cases, but has its own
> risks. There's no free lunch here.
>
> I suggest that for anyone prototyping the proposed asynchronous call
> functionality, that they try the parallel_sum example over 10^9 values.
> That yields a recursive tree of about 10^6 nodes, with a height of 20.
> With Cilk, the space demands will be on the order of about P*(20
> nodes). In TBB without the heuristics, it could go quadratic in the
> tree depth, e.g. about 400 nodes. The TBB heuristics keep it down to
> P*(20 nodes), but by sometimes limiting parallelism. With any
> prototype of the proposal, I would want to know how much space does it
> consume, and how well does it scale?
>
> It's unfortunate that we are stuck with calling conventions that arose
> in a sequential milieu.
>
> - Arch
>
> -----Original Message-----
> From: Boehm, Hans [mailto:hans.boehm at hp.com]
> Sent: Tuesday, June 02, 2009 7:08 PM
> To: C++ threads standardisation; Lawrence Crowl; Bjarne Stroustrup
> Cc: Pablo Halpern; Robison, Arch; Doug Lea
> Subject: RE: Asynchronous Function Proposal
>
> [Adding Pablo, Arch, and Doug: This is a continuing discussion of
> Lawrence Crowl's proposal for a simple C++ facility to support
> asynchronous function call execution. Reattaching Lawrence's proposal
> for newcomers.]
>
> > -----Original Message-----
> > From: Herb Sutter
> >
> > > But I'm still worried about whether we have enough of the details
> > > right here that this facility will actually be usable. I
> > don't have
> > > the necessary experience that I feel I can evaluate a
> > design. As far
> > > as I can tell:
> > >
> > > - I believe this doesn't support a Cilk-like (or TBB-like?)
> > > implementation. I believe Cilk effectively allows colder
> > call frames
> > > to be stolen by another thread, for example.
> > [...]
> >
> > A primary reason I've been advocating an "async()" facility,
> > besides as a way to make async calls easy, was specifically
> > to make sure that an async() with a default "could execute on
> > this or another thread" policy could be implemented to run on
> > a work stealing implementation, such as Cilk, TBB, or our own
> > TPL/PPL/ConcRT. I haven't seen anything that would prevent
> > that -- all you need to say is exactly "could execute on this
> > or another thread" and IIUC that's all we need.
> I'm not an expert on this stuff. (Copying Pablo in case he can help
> and hasn't seen this.) But my understanding is that when Cilk is about
> to execute a spawned task, it pushes a work item on the task queue that
> represents the parents' continuation. That pushed item may get stolen
> by another worker thread. Thus the parent thread may appear to have
> migrated between threads. My vague intuition is that this is necessary
> if you want to both:
>
> - Efficiently execute child tasks on the parent thread in the common
> case, and
> - Be reasonably sure that when you steal a task, you're likely
> something reasonably large, from closer to the root of the tree.
>
> If I understand this correctly, this doesn't mesh particularly well
> with a bunch of other thread facilities in the current draft, e.g.
> thread-local variables. (There are more details on Cilk in their PLDI
> 98 paper http://supertech.csail.mit.edu/papers/cilk5.pdf .)
>
> TBBs tasks are more complicated to use, since they don't rely on
> language support. They may avoid this kind of issue; I'm not sure.
> (Copying Arch in case he can help.)
>
> My impression is that for both Cilk and TBB, interactions with other
> C++0x thread facilities haven't really been worked out. (Not to
> surprising, since we're having enough problems with interactions
> between C++0x facilities, not to mention weird interactions within
> older systems we're building on.)
>
> Hans
>
>
> --
> cpp-threads mailing list
> cpp-threads at decadentplace.org.uk
> http://www.decadentplace.org.uk/cgi-bin/mailman/listinfo/cpp-threads
More information about the cpp-threads
mailing list