[cpp-threads] C++ committee meeting in Mont Tremblant

Fri Oct 21 16:53:01 BST 2005

Sorry for the length of this posting.

   Date: Fri, 14 Oct 2005 20:57:32 -0700 (PDT)
   From: Hans Boehm <Hans.Boehm at hp.com>
   Reply-To: C++ threads standardisation <cpp-threads at decadentplace.org.uk>
   Sender: cpp-threads-bounces at decadentplace.org.uk

   On Fri, 14 Oct 2005, Peter A. Buhr wrote:
   > Otherwise, we have to come up with another mechanism to explain when the
   > thread runtime starts and what programmers can assume about the execution
   > model. I'm totally opposed to solutions adopting artificial exceptions to
   > the standard allocation/deallocation rules, like threads cannot be
   > declared in the global scope or threads can only be declared on the
   > heap. These kinds of artificial rules are not language design or
   > extension, it is just hacking, and everyone will spot it as such.

   Isn't this covered by the usual (perhaps implicit?) rule that the standard
   library must be usable by the time any user code runs, even in constructors?
   We assume that user-defined constructors for static objects must be able to
   call ::new to allocate memory, which must thus be (magically?) initialized
   first.  Are threads really any different?  Don't you run into problems
   primarily because thread primitives are not in the standard library?

   I realize that there are problems with initialization order in C++, and I've
   been bitten by them.  But I don't immediately see why threads make this any
   worse.

Threads don't make this problem worse, but the problem itself makes life
difficult for thread-library developers, and for library developers in general.
I'll give a brief sketch of the uC++ startup, which illustrates some general
issues.

A thread library usually needs to replace "malloc" and "free" (on which "new"
and "delete" depend), not only for locking reasons but more often to get more
concurrency than a basic locking heap provides (e.g., the Hoard allocator).
However, calls to malloc and free start very early in a program's boot
sequence, so malloc and free may be acquiring and releasing locks long before
the runtime thread-system has started.  Often this is not a problem because the
system is sequential during startup so the locks are always open, but it could
be an issue.
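
One way to keep such early locking harmless is a lock whose initial "open"
state is set by constant initialization, so it is usable before any constructor
has run. The following is only a minimal sketch under that assumption, not the
uC++ implementation; BootLock, heapLock, guardedHeapOperation and EarlyUser are
hypothetical names, and within a single file the ordering would hold anyway, so
the point only matters across translation units.

```cpp
#include <atomic>
#include <cassert>

namespace sketch {
    // Spin lock with a constexpr constructor: an object of static storage
    // duration is constant-initialized, i.e. "open" before dynamic
    // initialization (constructors of globals) begins.
    class BootLock {
        std::atomic<bool> held{false};
      public:
        constexpr BootLock() = default;
        void acquire() noexcept {
            while (held.exchange(true, std::memory_order_acquire)) { /* spin */ }
        }
        void release() noexcept { held.store(false, std::memory_order_release); }
        bool isHeld() const noexcept { return held.load(std::memory_order_relaxed); }
    };

    BootLock heapLock;        // hypothetical lock guarding the heap
    int guardedCalls = 0;     // counts calls, for illustration only

    // Stand-in for malloc/free: takes the lock even during program startup.
    void guardedHeapOperation() {
        heapLock.acquire();
        ++guardedCalls;
        heapLock.release();
    }

    // A global whose constructor uses the heap before main() starts.
    struct EarlyUser { EarlyUser() { guardedHeapOperation(); } };
    EarlyUser early;
}
```

While the program is still sequential the loop in acquire() never spins, which
matches the observation that the locks are always open during startup.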

Now initializing the heap requires calls to sbrk/brk to push the heap boundary.
These routines are thread-safe and usually call pthread locks, which uC++
covers with a simulation. But, the uC++ pthread simulation now has to do some
delicate initializations so it can process these calls without causing
problems. With gcc, there is also the gthreads library, used for things like
locking during exception handling; fortunately, gthreads can be made to call
pthreads routines. However, gthreads uses thread-local storage (which I
absolutely hate with a passion), and that potentially requires storage
allocation, meaning you can run into a recursive startup problem with
initialization of the heap or other parts of the thread-library. So again, you
have to be very careful and put in some additional checks for this kind of
event.
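
The kind of "additional check" meant here can be sketched as a three-state
flag that detects re-entry while the heap is still being set up. This is a
hedged illustration, not the uC++ code: HeapState, bootArena, bootAllocate,
initializeHeap and allocate are hypothetical names, and std::malloc merely
stands in for the library's real heap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

namespace sketch {
    enum class HeapState { Uninitialized, Initializing, Ready };
    HeapState heapState = HeapState::Uninitialized;   // constant-initialized

    // Small static arena used only while the real heap is being set up.
    alignas(std::max_align_t) unsigned char bootArena[4096];
    std::size_t bootUsed = 0;
    int bootArenaCalls = 0;

    void* bootAllocate(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        n = (n + a - 1) / a * a;                       // keep later blocks aligned
        if (n > sizeof bootArena - bootUsed) return nullptr;
        void* p = bootArena + bootUsed;
        bootUsed += n;
        ++bootArenaCalls;
        return p;
    }

    void* allocate(std::size_t n);

    // Stand-in for heap setup that itself needs storage, e.g. thread-local
    // storage requested on behalf of a lock.
    void initializeHeap() {
        void* tls = allocate(64);                      // recursive call
        assert(tls != nullptr);
        (void)tls;
    }

    void* allocate(std::size_t n) {
        if (heapState == HeapState::Initializing)      // re-entered during setup
            return bootAllocate(n);
        if (heapState == HeapState::Uninitialized) {
            heapState = HeapState::Initializing;
            initializeHeap();
            heapState = HeapState::Ready;
        }
        return std::malloc(n);                         // stand-in for the real heap
    }
}
```

Storage handed out from the boot arena is never returned in this sketch; a
matching deallocation routine would have to recognize arena addresses and
ignore them.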

Eventually, the C++ startup will call the constructor for the "boot" object of
the thread-library, which has been inserted when users include an appropriate
header file. But there is one of these "boot" objects per translation unit, so
the boot object has to ignore all but the first call to start the thread
library, which is usually accomplished with a static counter. Now the
thread-runtime can start booting itself, which usually means some very tricky
code to get the initial UNIX thread to look like a thread-library thread and
allow for context-switching with associated thread scheduling. You also have to
prepare for interaction with the underlying kernel threads (sproc, lwp, clone,
pthreads) to obtain access to multiprocessors. This step can involve complex
interaction with the magic "thread" register on many new architectures,
especially the Itanium.  When pthreads are also used as kernel threads (1:1
thread model mandated by the OS), then there are even more complex
interactions that require linker tricks. Suffice it to say that it is
very complex and I've left out some of the more detailed stuff. And I should
mention that shutting the system down is almost as hard as starting it up,
especially if you want to do some reasonable error checking.
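
The per-translation-unit "boot" object with a static counter, mentioned at the
start of this paragraph, can be sketched as follows. This is an illustration of
the general idiom only, not the actual uC++ code; Boot, bootForThisUnit,
rt::start and rt::stop are hypothetical names, and the three pieces that would
normally live in separate files are shown in one block.

```cpp
#include <cassert>

// Hypothetical runtime hooks; a real library would do its heavy lifting here.
namespace rt {
    int starts = 0;
    int stops = 0;
    void start() { ++starts; }
    void stop()  { ++stops; }
}

// --- would live in the library's header ---
class Boot {
    static int count;            // shared by every Boot object in the program
  public:
    // Only the first constructor call boots the runtime.
    Boot() { if (count++ == 0) rt::start(); }
    // Only the last destructor call shuts the runtime down.
    ~Boot() { assert(count > 0); if (--count == 0) rt::stop(); }
};
static Boot bootForThisUnit;     // one copy per translation unit that includes the header

// --- would live in exactly one library source file ---
int Boot::count = 0;             // constant-initialized, so zero before any constructor runs
```

Because the counter is constant-initialized, it is already zero when the first
Boot constructor runs, whichever translation unit that happens to be in. The
same counter gives a matching shutdown when the last boot object is destroyed.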

So it takes a lot of complex and intricate code, which varies from compiler to
compiler and operating system to operating system, for the following uC++
program:

  #include <uC++.h>
  #include <iostream>
  using namespace std;

  _Task T {
      int i;
      void main() { cout << i << endl; } // thread starts here
    public:
      T( int i ) : i(i) {}
  };

  T t(0); // global variable

  struct O {
      static T t; // static variable, which is global
      O() {
          T t(2); // local variable created from global context
          delete new T(3); // dynamic variable created from global context
      }
  };

  T O::t(1);  // static initialization
  O o;  // global variable

  void uMain::main() { // program starts here
      cout << "main" << endl;
  }

to print out:

  @plg2[1]% a.out
  0
  1
  2
  3
  main

Now you can invoke magic by saying the standard library has to be started
before any user code begins, and make the thread-library part of the standard
library. However, this does not address how startup occurs *within* the
standard library; it simply puts the "witchcraft" boundary between the standard
library and the user code. But what if a company wants to develop and sell
standard library replacement code? One of the great features of C/C++ is that
you can replace a lot of the standard libraries to build interesting extensions
(like garbage collection in C/C++ ;-). How do the programmers in this domain
ensure that initialization occurs in some reasonable ordering?

I fully understand that booting an OS, an application, or a thread library may
require some witchcraft. However, a language should try to minimize the need
for witchcraft or push the witchcraft boundary back as far as possible. I have
not studied all the ramifications of this issue, but on the surface other
languages seem to tackle it by creating a module mechanism. Is a module
mechanism needed for C++? I don't know. But I do know that getting uC++ started
is very difficult; it seems significantly more difficult than it needs to be
and the current code is very fragile. I want help from the language to make it
simpler. I'm sure other library developers feel the same way.