[cpp-threads] Update on N2889/N2880/N2901

Mon Jun 22 18:47:59 BST 2009

Thanks Peter!

Peter wrote:
> Your thread_local analysis is a bit off (politely speaking).

Thanks. I can take the blunt version too. :-) (BTW, by this I assume you agree with what I say in the detached threads part?)

> - Thread locals are used in code that does not have control over the threads
> from which it's called. You can't just replace a thread_local with a
> stack-local in the thread because you have no control over the stack of the
> thread. If you could, there'd obviously be no need for thread locals. But
> there is, no matter how much you try to theorize them out of existence.

Those alternatives definitely aren't full replacements. I've added an explicit caveat about that. In fact maybe I should remove those workarounds if they're just a useless distraction.

> - Thread locals need destructors in C and C++ (and not in Java) because
> there's no GC.

Interjection: This may have merit, but I do want to push back on this in principle. C++ destructors are much more powerful than Java finalizers. Destructors correspond directly to the Java dispose idiom, not to finalizers. Finalizers are different, and the code you can safely write inside finalizers is much more restrictive (e.g., in which other objects are still valid to access) than what you can write in a destructor. Finally, the popular form of GC in C++ is reference counting, which brings this interjection back to your paragraph:

>                You can't just allocate per-thread state and keep it in a raw
> pointer, because you'll leak it on every thread completion. You need (at
> minimum) (the equivalent of) thread_local auto_ptr or shared_ptr.

Actually, this links to something I wondered about: pthreads relies on a hook that can get called when a thread ends. If we had something similar in std::thread, couldn't a library hook into that and accomplish the same thing?
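To make the pthreads hook concrete, here is a minimal sketch of the mechanism I mean: pthread_key_create takes a destructor function that the implementation calls at thread exit for each thread holding a non-null value. The names PerThreadState, state(), and run_workers() are made up for illustration, and error checking is omitted.

```cpp
#include <pthread.h>
#include <atomic>

// Hypothetical per-thread state; the type and its contents are illustrative only.
struct PerThreadState { int value = 0; };

static std::atomic<int> g_cleanups{0};   // counts how often the exit hook ran
static pthread_key_t g_key;
static pthread_once_t g_once = PTHREAD_ONCE_INIT;

// pthreads calls this at thread exit for every thread whose slot is non-null.
extern "C" void destroy_state(void* p) {
    delete static_cast<PerThreadState*>(p);
    g_cleanups.fetch_add(1);
}

extern "C" void make_key() { pthread_key_create(&g_key, destroy_state); }

// Lazily creates the calling thread's state on first use.
PerThreadState& state() {
    pthread_once(&g_once, make_key);
    void* p = pthread_getspecific(g_key);
    if (!p) {
        p = new PerThreadState;
        pthread_setspecific(g_key, p);
    }
    return *static_cast<PerThreadState*>(p);
}

extern "C" void* worker(void*) {
    state().value = 42;   // touch the state so this thread owns an instance
    return nullptr;
}

// Runs n short-lived threads one after another and reports how many
// times the exit hook has been invoked in total.
int run_workers(int n) {
    for (int i = 0; i < n; ++i) {
        pthread_t t;
        pthread_create(&t, nullptr, worker, nullptr);
        pthread_join(t, nullptr);
    }
    return g_cleanups.load();
}
```

Nothing leaks here even though the state is held through a raw void*, because the hook does the cleanup. Whether std::thread could expose an equivalent registration point is exactly the open question.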

If not, fallback position: I wonder if it would be enough to say something like "thread_locals can't have nontrivial destructors, except that thread_local std::*_ptr is okay"?

I'm not disagreeing that it would be convenient, but there really are problems with allowing just anything to be thread_local, aren't there?

> - Thread locals are not of limited value when there is thread reuse; on the
> contrary, they are very useful in this case. Thread locals are often used as
> a performance optimization to cache per-thread data and avoid
> synchronization. Without thread reuse, every task will hit the global state,
> incurring synchronization penalties. With thread reuse, tasks can be served
> from the local, per-thread cache.
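For readers following along, the caching pattern Peter describes can be sketched roughly as follows. This assumes a compiler that accepts the thread_local keyword on objects with nontrivial destructors; acquire(), the batch size of 8, and the hit counter are invented for the example.

```cpp
#include <mutex>
#include <thread>
#include <vector>

static std::mutex g_mutex;
static int g_global_hits = 0;   // guarded by g_mutex; counts trips to shared state

// Hands out one item, refilling a per-thread cache from the (simulated)
// global pool only when the cache is empty.
int acquire() {
    thread_local std::vector<int> cache;   // note: nontrivial destructor
    if (cache.empty()) {
        std::lock_guard<std::mutex> lock(g_mutex);   // synchronization cost paid here
        ++g_global_hits;
        for (int i = 0; i < 8; ++i) cache.push_back(i);   // batch refill
    }
    int v = cache.back();
    cache.pop_back();
    return v;
}

// All tasks run on one reused thread: the cache survives between tasks.
int hits_with_reuse(int tasks) {
    g_global_hits = 0;
    std::thread t([tasks] { for (int i = 0; i < tasks; ++i) acquire(); });
    t.join();
    return g_global_hits;
}

// Each task gets a fresh thread: every task starts with an empty cache.
int hits_without_reuse(int tasks) {
    g_global_hits = 0;
    for (int i = 0; i < tasks; ++i) {
        std::thread t([] { acquire(); });
        t.join();
    }
    return g_global_hits;
}
```

With 16 tasks, the reused thread takes the lock twice and the thread-per-task version takes it 16 times, which is the effect described in the quoted paragraph.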

Maybe it's an audience issue. Experts do those things. But on Windows and .NET the guidance to app and library developers is to avoid TLS like the plague in thread pool tasks. The only safe pattern is when a single coordinator has forked work, joins with it, and then initiates the cleanup after all work has terminated. Both our new TPL (.NET) and PPL (native C++) provide primitives for this, primarily for reductions. For example, TPL provides overloads of For/ForEach:

  For(
    0, N,
    () => /* initialize & return a new TLS object; called once per distinct thread */,
    (i, ps) => /* the loop body; the TLS state is accessible via 'ps' */,
    ps => /* destroy the TLS state; called once per distinct thread */
  );
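A rough C++ analogue of that shape, to show the fork/join/cleanup ordering. This is a sketch only and is not the PPL or TPL API: parallel_for_with_state and sum_range are hypothetical names, it spawns plain threads rather than using a pool, and unlike the TPL overload it has the coordinator create the per-worker states up front rather than on each worker thread.

```cpp
#include <thread>
#include <vector>

// Hypothetical helper: init() makes one state per worker, body(i, state)
// runs for each index, finish(state) runs only after every worker has joined.
// Assumes workers > 0.
template <class State, class Init, class Body, class Finish>
void parallel_for_with_state(int begin, int end, unsigned workers,
                             Init init, Body body, Finish finish) {
    std::vector<State> states;
    states.reserve(workers);
    for (unsigned w = 0; w < workers; ++w) states.push_back(init());

    std::vector<std::thread> threads;
    for (unsigned w = 0; w < workers; ++w) {
        threads.emplace_back([&, w] {
            // Strided partition: worker w handles begin+w, begin+w+workers, ...
            for (int i = begin + static_cast<int>(w); i < end;
                 i += static_cast<int>(workers)) {
                body(i, states[w]);
            }
        });
    }
    for (auto& t : threads) t.join();     // coordinator joins with all work...
    for (auto& s : states) finish(s);     // ...and only then runs the cleanup
}

// Example reduction: sums 0..n-1 using per-worker partial sums.
long sum_range(int n, unsigned workers) {
    long total = 0;
    parallel_for_with_state<long>(
        0, n, workers,
        [] { return 0L; },                               // initialize state
        [](int i, long& partial) { partial += i; },      // loop body
        [&total](long& partial) { total += partial; });  // finalize state
    return total;
}
```

The point is that the per-worker state never outlives the coordinator's join, so there is no question about when its cleanup runs.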

> The straightforward and well-known example that plainly illustrates the
> above three points is malloc/free with thread-local free lists.

I obviously haven't written one. :-) So this absolutely requires thread_local objects with nontrivial destructors?

> Regarding lifetime issues with thread locals: POSIX thread locals are
> basically Phoenix singletons. Under a typical use pattern, they would be
> reconstructed on first use and this will happen even after destruction (no
> UB). If this occurs, they will be destroyed again. There is an
> implementation-defined number of destruction cycles, after which the
> implementation gives up. (Arguably, the C-based POSIX API is better at
> static construction/destruction in the MT case than C++ is in the
> single-threaded case.)
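The Phoenix behavior described above can be observed with a small sketch like the following. It assumes a POSIX system; the names are invented, error checking is omitted, and the destructor re-creates the value directly to simulate "use after destruction" by some other cleanup code.

```cpp
#include <pthread.h>

static pthread_key_t g_phoenix_key;
static int g_destroy_calls = 0;   // written only by the worker, read after join

// Destructor hook. POSIX resets the slot to null before calling this.
extern "C" void phoenix_destroy(void* p) {
    delete static_cast<int*>(p);
    ++g_destroy_calls;
    if (g_destroy_calls == 1) {
        // Simulate a late use during thread exit: the value is
        // reconstructed, so the implementation must destroy it again.
        pthread_setspecific(g_phoenix_key, new int(0));
    }
}

extern "C" void* phoenix_worker(void*) {
    pthread_setspecific(g_phoenix_key, new int(1));   // first construction
    return nullptr;
}

// Returns how many times the destructor ran for a single thread.
int count_destruction_cycles() {
    g_destroy_calls = 0;
    pthread_key_create(&g_phoenix_key, phoenix_destroy);
    pthread_t t;
    pthread_create(&t, nullptr, phoenix_worker, nullptr);
    pthread_join(t, nullptr);
    pthread_key_delete(g_phoenix_key);
    return g_destroy_calls;
}
```

The destructor runs twice for the one thread: once for the original value and once for the reconstructed one. A destructor that kept re-creating the value would be retried up to PTHREAD_DESTRUCTOR_ITERATIONS times before the implementation gives up.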