[cpp-threads] Asynchronous Function Proposal

Doug Lea dl at cs.oswego.edu
Wed Jun 3 13:46:12 BST 2009


Andrei Alexandrescu wrote:
> Robison, Arch wrote:
>> Instead of parallel_sum, the usual parallel naive Fibonacci
>> recurrence would be better than parallel_sum.  On typical machines,
>> parallel_sum runs out of bandwidth.  Native parallel Fibonacci makes
>> a good benchmark for recursive tasking because it is not sensitive to
>> machine bandwidth and generates unbalanced trees of work.

Fibonacci has a problem as a benchmark because
optimizing compilers are allowed to discover the
recurrences that lead to O(n) (not O(2^n)) solution.
I've seen some compilers do the beginnings of this.
Just unfolding one layer linearizes fib(2) calls.
Which makes results of parallel versions hard to interpret.

When trying to do some cross-language/compiler/framework
comparisons once, I threw together a stripped down
divide-and-conquer numerical integration test.
I just put some versions at
   http://gee.cs.oswego.edu/dl/code/integrateTests/
Numerical integration has the same overall virtues
of fibonacci -- no big data arrays, and quick (although
not as quick) per-call computations, but is less amenable
to compiler optimizations that make results uninterpretable
when comparing across languages, compilers, and frameworks.
(The integrate.cpp version includes an optional comparison of
partial accumulation loop vs pure recursion to help determine
if compiler-based unfolding is making a big contribution.
It seems not to using gcc or suncc.)

It would be better yet to use a function without a
known analytic integral, but doing this adds enough
per-task overhead to cloud results, so this one just
integrates a cubic function.

I didn't include a c++0x parallel version. Or TBB.

The Java version needs jsr166y.jar (containing
upcoming ForkJoin stuff) available at
http://gee.cs.oswego.edu/dl/concurrency-interest/index.html
http://gee.cs.oswego.edu/dl/jsr166/dist/jsr166y.jar
BTW, As is almost always the case, the version (run with "d")
that uses demand-sensing to decide whether to fork vs recurse
is generally fastest.

-Doug



More information about the cpp-threads mailing list