[cpp-threads] Floating-point state

Sat Aug 19 09:18:58 BST 2006

"Boehm, Hans" <hans.boehm at hp.com> wrote:
> 
> I'm also not sure how this precludes implicit parallelization.
> Presumably we won't parallelize anything that contains explicit fp
> status operations, or we'll treat it very gingerly.  If we parallelize a
> loop that doesn't explicitly refer to the fp status, is there a problem
> with just propagating the fp context into all threads and merging
> accumulated exceptions at the end?

No - that would work, just as for thread-local storage, but there is
a problem (see below).
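For concreteness, here is a minimal sketch of the propagate-and-merge scheme Hans describes. It assumes C++11 std::thread and <cfenv>, with exception traps disabled. The helper parallel_for_fenv is hypothetical, not an existing or proposed library interface:

```cpp
#include <cassert>
#include <cfenv>
#include <thread>
#include <vector>

// Hypothetical helper, not a standard facility. Runs body(i) for i in
// [0, n) on nthreads threads. Each worker starts from the caller's fp
// environment with its flags cleared; the accumulated flags are OR-ed
// together and raised in the caller once all workers have joined.
// Assumes exception traps are disabled (the default on common platforms).
template <class Body>
void parallel_for_fenv(int n, int nthreads, Body body) {
    std::fenv_t parent;
    std::fegetenv(&parent);  // capture rounding mode, flags, etc.

    std::vector<int> flags(nthreads, 0);  // one slot per worker: no data race
    std::vector<std::thread> workers;
    for (int t = 0; t < nthreads; ++t) {
        workers.emplace_back([&, t] {
            std::fesetenv(&parent);             // propagate the fp context
            std::feclearexcept(FE_ALL_EXCEPT);  // but start with clean flags
            for (int i = t; i < n; i += nthreads) body(i);
            flags[t] = std::fetestexcept(FE_ALL_EXCEPT);
        });
    }
    for (auto& w : workers) w.join();

    int merged = 0;
    for (int f : flags) merged |= f;
    std::feraiseexcept(merged);  // merge accumulated exceptions at the end
}
```

This is only sound because the loop body never tests or clears the flags itself; the merge sees only the union of what each worker accumulated.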

> Note that we're also discussing other constructs, notably thread local
> storage, that probably don't interact well with any form of implicit
> threading.  In my view, that's OK.  Automatic parallelization only works
> well in limited domains anyway.

Well, yes, but ....

The killer with IEEE 754's model is that the flags are necessarily
extremely fine-grained, and MUST be tested and reset nearly as
frequently as floating-point operations if they are to be returned
reliably.  Consider (a) complex multiplication and (b) their use in
emulating decimal fixed-point arithmetic.

In the former, even multiplication needs to do that if it is not to
produce spurious overflows (and division is worse).  The alternative,
checking or scaling the operands before every operation, is very slow.
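A rough sketch of that test-and-reset pattern for multiplication, in C++ with <cfenv>. The name cmul_checked is hypothetical, the scaling fallback is just one possible recovery, and traps are assumed disabled. Note that the flags are saved, cleared, tested and reset again within a single complex multiply:

```cpp
#include <algorithm>
#include <cassert>
#include <cfenv>
#include <cmath>
#include <complex>

// Hypothetical helper, not std::complex's own operator*. Computes x*y with
// the textbook formula, then tests the overflow flag. If it fired on finite
// operands (possibly spuriously), the flags are reset and the product is
// recomputed on operands scaled to magnitude ~1, then scaled back once.
// Assumes exception traps are disabled.
std::complex<double> cmul_checked(std::complex<double> x,
                                  std::complex<double> y) {
    std::fenv_t saved;
    std::feholdexcept(&saved);  // save the caller's environment, clear flags

    double a = x.real(), b = x.imag(), c = y.real(), d = y.imag();
    // volatile stops the compiler moving these below the flag test
    volatile double re_naive = a * c - b * d;
    volatile double im_naive = a * d + b * c;
    double re = re_naive, im = im_naive;

    bool finite = std::isfinite(a) && std::isfinite(b) &&
                  std::isfinite(c) && std::isfinite(d);
    if (finite && std::fetestexcept(FE_OVERFLOW)) {
        std::feclearexcept(FE_ALL_EXCEPT);  // discard the naive attempt's flags
        int kx = std::ilogb(std::max(std::fabs(a), std::fabs(b)));
        int ky = std::ilogb(std::max(std::fabs(c), std::fabs(d)));
        a = std::scalbn(a, -kx);
        b = std::scalbn(b, -kx);
        c = std::scalbn(c, -ky);
        d = std::scalbn(d, -ky);
        // Scaled parts are below 2 in magnitude, so these cannot overflow;
        // only the final rescaling can, essentially only if the true part
        // does. (Underflow caused by the scaling is not handled here.)
        re = std::scalbn(a * c - b * d, kx + ky);
        im = std::scalbn(a * d + b * c, kx + ky);
    }

    std::feupdateenv(&saved);  // restore the caller's flags, then raise ours
    return {re, im};
}
```

For example, squaring 2^600 * (1 + i) this way gives a real part of exactly 0 and an infinite imaginary part, instead of the NaN real part that inf - inf produces in the naive formula.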

In the latter, the inexact flag needs to be translated into overflow
for addition and subtraction, but NOT for multiplication etc.
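A sketch of that asymmetry, assuming amounts are held as whole numbers of the smallest decimal unit (say, cents) in a double. The names fixed_add and fixed_mul_round are hypothetical, not any existing library:

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>
#include <stdexcept>

// Hypothetical sketch: a decimal fixed-point amount held as a whole number
// of its smallest unit (e.g. cents) in a double. Sums and differences of
// such values are exact unless the result needs more than 53 significant
// bits, so an inexact + or - is reported as a fixed-point overflow.
double fixed_add(double x, double y) {
    std::fenv_t saved;
    std::feholdexcept(&saved);  // save caller's environment, clear flags
    volatile double r = x + y;  // volatile: keep the add before the test
    bool lost = std::fetestexcept(FE_INEXACT) != 0;
    std::fesetenv(&saved);      // restore the caller's flags unchanged
    if (lost) throw std::overflow_error("fixed-point sum out of range");
    return r;
}

// A multiplication by a rate is expected to be inexact, so here the flag
// is discarded rather than translated; the range is checked explicitly.
double fixed_mul_round(double x, double rate) {
    std::fenv_t saved;
    std::feholdexcept(&saved);
    volatile double r = std::nearbyint(x * rate);  // round to whole units
    std::fesetenv(&saved);      // drop the benign inexact flag
    if (std::fabs(r) > 9007199254740992.0)  // 2^53
        throw std::overflow_error("fixed-point product out of range");
    return r;
}
```

Either way the flag has to be cleared before, and tested after, every single operation, which is the granularity problem described above.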

Note that these are inherent in the MODEL, so could not be avoided
even if C++ did not copy C99's mistakes.  Many of us have been trying
to get these fixed (and I do mean fixed) in arithmetic models since the
1970s, and it is one of the reasons that IEEE 754 did not take off.

In turn, this constrains reordering optimisations very considerably
and makes the exact timing of initialisation and termination visible
where they weren't before.  And we had enough of a discussion on these
not to want to complicate them :-(
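As an illustration of the optimisation point (a hypothetical example, not taken from any particular compiler): hoisting a loop-invariant division is an ordinary-looking transformation, but once the flags are observable it can raise an exception the source program never did:

```cpp
#include <cassert>
#include <cfenv>
#include <cstddef>

// As written: when n == 0 no division is ever executed.
void scale_as_written(double* v, std::size_t n, double d) {
    for (std::size_t i = 0; i < n; ++i) v[i] /= d;
}

// The kind of rewrite GCC's -freciprocal-math permits: hoist the
// loop-invariant reciprocal out of the loop. Besides changing the rounding
// of each element, it executes 1.0 / d even when n == 0, so a call with
// n == 0 and d == 0.0 now raises FE_DIVBYZERO where the original raised
// nothing. (volatile makes this sketch keep the hoisted division.)
void scale_hoisted(double* v, std::size_t n, double d) {
    volatile double inv = 1.0 / d;
    for (std::size_t i = 0; i < n; ++i) v[i] *= inv;
}
```

Calling both with n == 0 and d == 0.0 leaves the array untouched, yet only the second leaves FE_DIVBYZERO set.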

Lawrence may be right about what Bjarne was looking for - while I
favour implicit parallelisation, I do accept that it is very hard
to fit into POSIX's thread=process model and C/C++'s highly aliased
one.  But that initialisation/termination issue is a Big Problem, and
the optimisation one is not a small one.

Note that EXACTLY the same points arise for POSIX-style signal handling,
but they can be swept under the carpet because such facilities are not
(in practice) used at a very fine grain.  That is the main reason
that POSIX's total lack of specification of how signals interact with
threading usually doesn't cause the chaos that its specification implies
it should.

Unfortunately, that can't be assumed for IEEE 754 flags - either they
have to be effectively unsupported or the problems have to be faced
up to.

Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  nmm1 at cam.ac.uk
Tel.:  +44 1223 334761    Fax:  +44 1223 334679