[cpp-threads] Re: A hopefully clearer document on POSIX threads

Sun Feb 19 18:25:34 GMT 2006

The base document will appear in the mailing, but is temporarily
on http://www.hpcf.cam.ac.uk/export/POSIX.[tex,pdf,ps].

The attachments are the replies (with most headers stripped).

Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  nmm1 at cam.ac.uk
Tel.:  +44 1223 334761    Fax:  +44 1223 334679
-------------- next part --------------
From: "Boehm, Hans" <hans.boehm at hp.com>
To: "Nick Maclaren" <nmm1 at cus.cam.ac.uk>,
	<cxxpanel at yahoogroups.co.uk>
Cc: <Lawrence.Crowl at Sun.com>

Nick -

I would propose that we try to get to the core of the issues here, and
avoid looking at detailed issues where we think Posix made avoidable
mistakes.

I'll try to state my understanding of the core issue here, at the risk
of getting it wrong again.

I think we agree that both MPI and shared memory concurrency are here to
stay.

The main question then seems to be whether threads used for shared
memory concurrency should be hierarchically structured, i.e. whether a
thread should be created in the context of a particular function
invocation, and should be required to terminate before the function
invocation does.  OpenMP mostly uses this model, as you point out,
though it seems to be somewhat constrained by Posix.  Posix threads do
not.  Neither do Java or Boost threads.  I think neither do any of the
proposals that try to associate threads with objects.

Let's refer to these as the hierarchical and flat models.  The
differences seem to be:

1) The flat model requires explicit joins, when you don't want threads
to outlive a function invocation.

2) In the flat model, there is no way to start a thread that outlives a
function invocation (?)  I think this is a showstopper for certain kinds
of libraries, though it's no doubt OK, and even convenient, for most
kinds of HPC programming.

3) Depending on the details, it may be possible to directly access local
variables from enclosing scopes in the hierarchical model.  This is
clearly more convenient when it makes sense.

4) The hierarchical model is much more convenient for exception
handling, when it makes sense, since exceptions can propagate to
enclosing threads.

It seems to me that there are good arguments for both.
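
As a rough illustration of differences 1) and 3), here is a minimal
sketch. It assumes C++11 std::thread as a stand-in for a flat,
POSIX-style thread, and a hypothetical scoped_thread wrapper (not part
of any proposal discussed here) to imitate the hierarchical style, in
which the child cannot outlive the enclosing block.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Flat model: the thread handle is an ordinary object, and the caller
// must join explicitly if the thread is not to outlive this invocation.
int sum_flat(const std::vector<int>& data) {
    int low = 0, high = 0;
    const std::size_t mid = data.size() / 2;
    std::thread worker([&] {
        for (std::size_t i = 0; i < mid; ++i) low += data[i];
    });
    for (std::size_t i = mid; i < data.size(); ++i) high += data[i];
    worker.join();               // explicit join (difference 1)
    assert(!worker.joinable());  // the handle no longer owns a thread
    return low + high;
}

// Hypothetical wrapper imitating the hierarchical model: the child is
// always joined when the enclosing scope ends.
class scoped_thread {
public:
    explicit scoped_thread(std::function<void()> body) : t_(std::move(body)) {}
    ~scoped_thread() { if (t_.joinable()) t_.join(); }
    scoped_thread(const scoped_thread&) = delete;
    scoped_thread& operator=(const scoped_thread&) = delete;
private:
    std::thread t_;
};

int sum_hierarchical(const std::vector<int>& data) {
    int low = 0, high = 0;
    const std::size_t mid = data.size() / 2;
    {
        // The child uses locals of the enclosing scope (difference 3).
        scoped_thread child([&] {
            for (std::size_t i = 0; i < mid; ++i) low += data[i];
        });
        for (std::size_t i = mid; i < data.size(); ++i) high += data[i];
    }  // child is joined here, before the scope can be left
    return low + high;
}
```

Both functions compute the same sum; the only difference is whether the
join is written by hand or implied by scope exit.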

The existing approach seems to be to provide a low-level flat model
(e.g. Posix), and then build the hierarchical version on top (e.g.
OpenMP).  There are strong arguments for implementing one in terms of
the other, and for putting the flat model on the bottom.  (It doesn't
work the other way, I think.)

We can discuss the performance issues in detail, but I'm not sure any of
them are inherent problems.  (The most serious pthread performance
problems I've seen seemed to be caused by the fact that it was unclear
whether the pthread library wanted to provide real-time guarantees, an
orthogonal issue.)

This doesn't answer the question of whether the flat model, the
hierarchical model, or both should be standardized as part of the
language.  But I think you need both somewhere.

Hans
-------------- next part --------------
To: "Boehm, Hans" <hans.boehm at hp.com>
Cc: Nick Maclaren <nmm1 at cus.cam.ac.uk>, cxxpanel at yahoogroups.co.uk
Subject: Re: A hopefully clearer document on POSIX threads 
From: Lawrence.Crowl at Sun.com

"Boehm, Hans" <hans.boehm at hp.com> writes:
 >1) The flat model requires explicit joins, when you don't want threads to
 >outlive a function invocation.
 >
 >2) In the flat model, there is no way to start a thread that outlives a
 >function invocation (?)  I think this is a showstopper for certain kinds of
 >libraries, though it's no doubt OK, and even convenient, for most kinds of
 >HPC programming.

Did you mean "In the hierarchical model"?

 >3) Depending on the details, it may be possible to directly access local
 >variables from enclosing scopes in the hierarchical model.  This is clearly
 >more convenient when it makes sense.

The classic cactus stack.

 >4) The hierarchical model is much more convenient for exception handling,
 >when it makes sense, since exceptions can propagate to enclosing threads.

The equivalent in the flat model is propagating exceptions at the join.
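
A minimal sketch of what that could look like, assuming C++11
std::thread and std::exception_ptr. The helper run_and_join and the
function demo are hypothetical names used only for illustration: the
child captures whatever escapes its body, and the parent rethrows it
after the join.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical helper: run `body` in a new thread, join it, and rethrow
// in the caller any exception that escaped `body`.
template <typename F>
void run_and_join(F body) {
    std::exception_ptr error;  // written by the child, read after join()
    std::thread child([&] {
        try {
            body();
        } catch (...) {
            error = std::current_exception();  // capture, do not terminate
        }
    });
    child.join();
    assert(!child.joinable());  // the child has finished; `error` is stable
    if (error) std::rethrow_exception(error);  // propagate at the join
}

// Usage: the exception thrown in the child is caught in the parent.
std::string demo() {
    try {
        run_and_join([] { throw std::runtime_error("failed in child"); });
    } catch (const std::runtime_error& e) {
        return e.what();
    }
    return "no exception";
}
```

The exception surfaces only when the parent reaches the join, not at
the point where the child failed, which is the main difference from
propagation to an enclosing thread in the hierarchical model.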

 >The existing approach seems to be to provide a low-level flat model (e.g.
 >Posix), and then build the hierarchical version on top (e.g. OpenMP).  There
 >are strong arguments for implementing one in terms of the other, and for
 >putting the flat model on the bottom.  (It doesn't work the other way, I
 >think.)

I agree.

 >This doesn't answer the question of whether the flat model, the hierarchical
 >model, or both should be standardized as part of the language.  But I think
 >you need both somewhere.

I think the language must standardize the flat model, if for no other
reason than it is the more powerful model (though sometimes less
convenient).

In terms of standardization, I believe that the OpenMP standard has
adequately addressed the hierarchical model and that the C++ committee
should not duplicate that work.  Two standards might be worse than one.

  Lawrence Crowl             650-786-6146   Sun Microsystems, Inc.
                   Lawrence.Crowl at Sun.com   16 Network Circle, UMPK16-303
           http://www.Crowl.org/Lawrence/   Menlo Park, California, 94025
-------------- next part --------------
To: cxxpanel at yahoogroups.co.uk, Lawrence.Crowl at Sun.com,
    "Boehm, Hans" <hans.boehm at hp.com>
From: Nick Maclaren <nmm1 at cus.cam.ac.uk>
Subject: [cxxpanel] RE: A hopefully clearer document on POSIX threads

I have merged both messages to keep things together.  Generally,
I think that we are in agreement about the issues - but less so
about the resolution!

> I would propose that we try to get to the core of the issues here, and
> avoid looking at detailed issues where we think Posix made avoidable
> mistakes.

Right.  Let's stick to the models.

> I think we agree that both MPI and shared memory concurrency are here to
> stay.

Yes, indeed.  Knowing which of the other models will make a comeback,
and what new ones will appear, would need the ability to predict the future.
But those two are here, now.

>  >2) In the flat model, there is no way to start a thread that outlives a
>  >function invocation (?)  I think this is a showstopper for certain kinds of
>  >libraries, though it's no doubt OK, and even convenient, for most kinds of
>  >HPC programming.
> 
> Did you mean "In the hierarchical model"?

I think so, too.  Also, it is suitable only for SOME kinds of HPC - but,
as I said, MPI dominates in HPC (at least for now).

>  >The existing approach seems to be to provide a low-level flat model (e.g.
>  >Posix), and then build the hierarchical version on top (e.g. OpenMP).  There
>  >are strong arguments for implementing one in terms of the other, and for
>  >putting the flat model on the bottom.  (It doesn't work the other way, I
>  >think.)
> 
> I agree.

Not at all.  The other way is MUCH easier!  All you have to do is to
spawn N threads on program entry, and then use the flat model :-)
Note that this is what most tuned implementations actually do - when
a function calls pthread_create, all it does is to activate one
of the "thread slots" that were set up initially.

As a diversion, one of the ancient models that I like is the "sea
of threads" one, where a program is broken up into basic blocks
with dependencies, and each one is fired off as a thread.  That
is related to dataflow, of course, and was used on the Tera MTA.
When I mentioned this in a POSIX context, most of the Friends of
POSIX had hysterics :-)

They were right, of course.  You absolutely cannot support 10^6
threads per process if threads are as heavyweight as POSIX ones.
But that is an example of a realistic threading model which can't
be mapped directly to a POSIX-style flat model.  It can be done only
by a level of indirection, where there are 10^6 virtual threads, and
the program schedules them onto the physical POSIX-style threads.
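
A minimal sketch of that indirection, assuming C++11 std::thread for
the physical threads. The names run_virtual_threads and sum_of_ids are
hypothetical, and each "virtual thread" here is just a run-to-completion
task identified by an index, with none of the dependency tracking a
real sea-of-threads system would need.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Run `num_virtual` lightweight tasks on `num_physical` real threads.
// Each worker repeatedly claims the next unclaimed task index.
void run_virtual_threads(std::size_t num_virtual, unsigned num_physical,
                         const std::function<void(std::size_t)>& body) {
    assert(num_physical > 0);
    std::atomic<std::size_t> next{0};  // index of the next unclaimed task
    std::vector<std::thread> workers;
    for (unsigned w = 0; w < num_physical; ++w) {
        workers.emplace_back([&] {
            for (;;) {
                const std::size_t id = next.fetch_add(1);
                if (id >= num_virtual) break;  // nothing left to schedule
                body(id);                      // run one virtual thread
            }
        });
    }
    for (auto& t : workers) t.join();
}

// Usage: 10^6 virtual threads on a handful of physical ones.
long long sum_of_ids(std::size_t num_virtual, unsigned num_physical) {
    std::atomic<long long> total{0};
    run_virtual_threads(num_virtual, num_physical, [&](std::size_t id) {
        total += static_cast<long long>(id);
    });
    return total.load();
}
```

Only num_physical heavyweight threads ever exist, however many virtual
ones are requested; the cost per virtual thread is one atomic increment
plus the call.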

I could ask the question of whether C++ should contemplate that
model, but I could also answer it :-)  NO!!!  Not starting from here.

>  >This doesn't answer the question of whether the flat model, the hierarchical
>  >model, or both should be standardized as part of the language.  But I think
>  >you need both somewhere.
> 
> I think the language must standardize the flat model, if for no other
> reason than it is the more powerful model (though sometimes less
> convenient).

No, not at all - and not just for the reason I mention above.  The
POSIX-style flat model brings in major consistency issues which need
resolving.  Here are a few:

    The seniority and termination problem.  Let's ignore that, as I
described it in my document.  It could be closed, but not compatibly
with POSIX.

    The scoping problem.  Threads are started within a scope, and then
are permitted to escape that scope.  This could be closed by forbidding
threads to use anything with less than global scope.

    The thread/process isolation problem.  Are threads allowed to do
things that override what other threads are trying to do, and how
are conflicts resolved?  Think about one thread changing the global
state or suspending a thread in the middle of stack unwinding.

    The master thread problem.  There are both specification and
implementation advantages in having a master thread, and the latter
may need to invent one if the specification doesn't work that way.

Yes, those are all soluble, but the flat model brings in a LOT of
complicated specification problems that the hierarchical one doesn't.
The hierarchical one brings in a few that the flat one doesn't, of
course.

> In terms of standardization, I believe that the OpenMP standard has
> adequately addressed the hierarchical model and that the C++ committee
> should not duplicate that work.  Two standards might be worse than one.

Hmm.  Three comments here, two minor and one major:

    I disagree that it is adequate, but I agree that it is a feasible
starting point.  The critical omission is the memory model, of course,
but there are other defects.  I also agree that specifying something
similar but different would be both a waste of effort and quite
possibly no better.

    Its approach to C++ exceptions is badly flawed, and a message to
that effect would be worthwhile.

    The major one is that standardising on a POSIX-style model could
well be incompatible with OpenMP, and I think that is a Bad Idea.  My
view is that C++ should at least PERMIT an improved OpenMP as a
semi-supported extension.  I would be really unhappy if the move to
POSIX prevented that, or even if it prevented fixing that exception
flaw.

> We can discuss the performance issues in detail, but I'm not sure any of
> them are inherent problems.  (The most serious pthread performance
> problems I've seen seemed to be caused by the fact that it was unclear
> whether the pthread library wanted to provide real-time guarantees, an
> orthogonal issue.)

I am afraid that I disagree.  The problems don't really show up at all
on tiny systems (up to 4 cores), but build up explosively thereafter.
And, yes, I do mean super-exponentially!

Now, 90% of the problems are inherent in the design of the operating
system, and are outside the interface, but there are some major ones
caused by the interface (especially POSIX-like ones).  In particular,
as soon as you give threads the ability to behave like full processes,
the operating system gets little choice but to schedule them as full
processes.  Which is Bad News for scalability on close-coupled programs
if the programmer/language assumes a different scheduling model from
the operating system.

> This doesn't answer the question of whether the flat model, the
> hierarchical model, or both should be standardized as part of the
> language.  But I think you need both somewhere.

I agree with that.

Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email:  nmm1 at cam.ac.uk
Tel.:  +44 1223 334761    Fax:  +44 1223 334679

-------------- next part --------------
From: Hans Boehm <hans.boehm at hp.com>
Subject: Re: [cxxpanel] RE: A hopefully clearer document on POSIX threads

On Sat, 18 Feb 2006, Valentin Samko wrote:

> BH> We can discuss the performance issues in detail, but I'm not sure any of
> BH> them are inherent problems.  (The most serious pthread performance
> BH> problems I've seen seemed to be caused by the fact that it was unclear
> BH> whether the pthread library wanted to provide real-time guarantees, an
> BH> orthogonal issue.)
>
> What happens in terms of performance on a massively multi cpu/core box when
> a single thread calls a memory barrier? Does this impact all the other
> CPU/cores? Will we need a new type of mutexes, memory barriers, ... which
> can be used to only invoke a barrier on a few selected CPU/cores?
>
Assuming you mean what's also called a "memory fence", i.e. an
instruction that enforces certain kinds of inter-thread memory ordering,
then that's generally implemented locally, I believe.  I don't think
it gets dramatically more expensive in large machines.

If you are talking about barriers that require all threads to reach the
barrier before any can proceed, then I think there are standard techniques
that require time logarithmic in the number of processors.  Those may
get a bit slower, though other techniques may apply with
multiple processors on a chip.
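
One well-known technique of that kind is the dissemination barrier,
sketched below under the assumption of C++11 atomics and std::thread.
It is not necessarily the technique meant above; the class name and the
test function are hypothetical. Each thread takes ceil(log2 N) rounds,
and in round r it signals the thread 2^r places ahead and waits for the
thread 2^r places behind.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

class dissemination_barrier {
public:
    explicit dissemination_barrier(std::size_t n)
        : n_(n), rounds_(ceil_log2(n)), flags_(rounds_ * n), episode_(n, 0) {
        assert(n > 0);
    }

    // Called by thread `id` (0 <= id < n); returns once all n have arrived.
    void arrive_and_wait(std::size_t id) {
        assert(id < n_);
        const unsigned long episode = ++episode_[id];  // private to thread id
        for (std::size_t r = 0; r < rounds_; ++r) {
            const std::size_t partner = (id + (std::size_t{1} << r)) % n_;
            flags_[r * n_ + partner].fetch_add(1);  // signal 2^r ahead
            // Wait for the signal from the thread 2^r behind. Counters only
            // grow, so a partner that is one episode ahead is harmless.
            while (flags_[r * n_ + id].load() < episode)
                std::this_thread::yield();
        }
    }

private:
    static std::size_t ceil_log2(std::size_t n) {
        std::size_t r = 0;
        while ((std::size_t{1} << r) < n) ++r;
        return r;
    }
    std::size_t n_;
    std::size_t rounds_;
    std::vector<std::atomic<unsigned long>> flags_;  // [round][thread], zeroed
    std::vector<unsigned long> episode_;             // barrier uses per thread
};

// Checks that no thread leaves phase p before all n have entered it.
bool barrier_separates_phases(std::size_t n, int phases) {
    dissemination_barrier barrier(n);
    std::atomic<std::size_t> arrived{0};
    std::atomic<bool> ok{true};
    std::vector<std::thread> threads;
    for (std::size_t id = 0; id < n; ++id) {
        threads.emplace_back([&, id] {
            for (int p = 1; p <= phases; ++p) {
                arrived.fetch_add(1);
                barrier.arrive_and_wait(id);
                if (arrived.load() < n * static_cast<std::size_t>(p))
                    ok = false;
            }
        });
    }
    for (auto& t : threads) t.join();
    return ok.load();
}
```

This version spins with yield() and uses sequentially consistent
atomics for simplicity; a tuned implementation would pad the flags to
avoid false sharing and would likely relax the memory ordering.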

Hans