[cpp-threads] Update on N2889/N2880/N2901

Sun Jun 21 23:40:35 BST 2009

From: "Herb Sutter" <hsutter at microsoft.com>
> Update: Lawrence and I spoke on the phone a couple of times on
> Friday, and as of this moment we still have two async proposals,
> but we will work further this afternoon to see if we can yet
> converge our proposals into a unified async proposal. (Thanks
> again for your continued efforts, Lawrence!)
>
> However, our chats on Friday have already illuminated a clearer(?)
> understanding of the N2880 issues that prompted us to agree that
> I write a paper that tries to more clearly lay out the fundamental
> issues involved.
>
> Here is a draft of that additional paper, primarily for Lawrence's
> review but also for anyone else who's working on Sunday. Thanks,

Thanks, Herb.  It became clear to me during our conversation that
Hans and I hadn't quite captured the 50,000-foot view of the issue,
and having someone with a different perspective write that up will
go a long way toward removing implicit assumptions.

On 6/21/09, Peter Dimov <pdimov at mmltd.net> wrote:
> Your thread_local analysis is a bit off (politely speaking).
>
> - Thread locals are used in code that does not have control over
> the threads from which it's called. You can't just replace a
> thread_local with a stack-local in the thread because you have
> no control over the stack of the thread. If you could, there'd
> obviously be no need for thread locals. But there is, no matter
> how much you try to theorize them out of existence.

I agree with this statement.  Many of the uses I have for
thread-local variables come about in libraries where I simply do
not have the stack frame.

Note Doug Lea's comments on the use of thread locals in Java.

> - Thread locals need destructors in C and C++ (and not in Java)
> because there's no GC. You can't just allocate per-thread state
> and keep it in a raw pointer, because you'll leak it on every
> thread completion. You need (at minimum) (the equivalent of)
> thread_local auto_ptr or shared_ptr.

Note that shared_ptr relies on a destructor.  It is, however,
a limited case of a destructor.
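
As a minimal sketch of the point about smart pointers (the names
Cache, live_caches, touch_cache and live_after_worker are
hypothetical, made up for illustration), a thread_local unique_ptr
releases per-thread state at thread exit only because its destructor
runs:

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>

// Hypothetical per-thread state. The counter makes cleanup observable.
std::atomic<int> live_caches{0};

struct Cache {
    Cache()  { ++live_caches; }
    ~Cache() { --live_caches; }
    int hits = 0;
};

// Library entry point, callable from threads the library did not create.
int touch_cache() {
    // Allocated lazily on first use in each thread. The unique_ptr
    // destructor runs at thread exit and frees the allocation; a raw
    // pointer here would leak one Cache per finished thread.
    thread_local std::unique_ptr<Cache> cache;
    if (!cache) cache.reset(new Cache);
    return ++cache->hits;
}

// Uses the cache on a fresh thread, joins it, and reports how many
// Cache objects are still alive afterwards.
int live_after_worker() {
    std::thread t([] { touch_cache(); touch_cache(); });
    t.join();
    return live_caches.load();
}
```

Calling live_after_worker() before the calling thread has touched the
cache should report no live Cache objects, because the worker's
thread_local was destroyed before join() returned.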

> - Thread locals are not of limited value when there is thread
> reuse; on the contrary, they are very useful in this case. Thread
> locals are often used as a performance optimization to cache
> per-thread data and avoid synchronization. Without thread reuse,
> every task will hit the global state, incurring synchronization
> penalties. With thread reuse, tasks can be served from the local,
> per-thread cache.
>
> The straightforward and well-known example that plainly illustrates
> the above three points is malloc/free with thread-local free lists.
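
A rough sketch of that malloc/free pattern follows.  It is not taken
from any real allocator; cache_allocate, cache_release, LocalCache and
the fixed block size are all assumptions chosen to keep it short.  The
fast path touches only the per-thread list, and the thread_local
destructor hands blocks back to the shared list at thread exit.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

constexpr std::size_t kBlockSize = 64;   // hypothetical fixed block size

// Shared state: every access takes the lock.
std::mutex global_mutex;
std::vector<void*> global_free;

// Per-thread free list. Its destructor returns cached blocks to the
// shared list when the thread exits.
struct LocalCache {
    std::vector<void*> blocks;
    ~LocalCache() {
        std::lock_guard<std::mutex> lock(global_mutex);
        global_free.insert(global_free.end(), blocks.begin(), blocks.end());
    }
};

LocalCache& local_cache() {
    thread_local LocalCache cache;
    return cache;
}

void* cache_allocate() {
    auto& blocks = local_cache().blocks;
    if (!blocks.empty()) {               // fast path: no synchronization
        void* p = blocks.back();
        blocks.pop_back();
        return p;
    }
    {
        std::lock_guard<std::mutex> lock(global_mutex);
        if (!global_free.empty()) {      // slow path: shared list
            void* p = global_free.back();
            global_free.pop_back();
            return p;
        }
    }
    // Blocks are never returned to the system in this sketch.
    return ::operator new(kBlockSize);
}

void cache_release(void* p) { local_cache().blocks.push_back(p); }

// Allocates and releases one block on a fresh thread, joins it, and
// reports how many blocks the shared list holds afterwards.
std::size_t global_blocks_after_worker() {
    std::thread t([] { cache_release(cache_allocate()); });
    t.join();
    std::lock_guard<std::mutex> lock(global_mutex);
    return global_free.size();
}
```

With a reused thread, repeated allocate/release pairs stay on the fast
path; with one thread per task, each task starts with an empty local
list and goes through the lock.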

> Regarding lifetime issues with thread locals: POSIX thread locals
> are basically Phoenix singletons. Under a typical use pattern,
> they would be reconstructed on first use and this will happen even
> after destruction (no UB). If this occurs, they will be destroyed
> again. There is an implementation-defined number of destruction
> cycles, after which the implementation gives up. (Arguably, the
> C-based POSIX API is better at static construction/destruction
> in the MT case than C++ is in the single-threaded case.)

Note also Doug Lea's comments on "cleaning up" thread locals in
Java Executors.

But back to the comments.

| 1. The trouble with detached threads and static destruction

The current resolution in the working paper is: "Require that
a program not access a global after it has been destructed."
In practice, this constraint is slightly weaker than proposal 1C
because it permits thread pools in global variables.

| 2. The trouble with thread_local destruction and static destruction
|
| Incidentally, note also that thread_local objects with nontrivial
| destructors are a novelty. As mentioned in N2880, they are "new
| in C++0x, and not widely implemented."

The problem is that C++ is one of the few languages with destructors,
and we are working out the implications of that model of variables.
One of those consequences is that new storage durations will induce
new destructor executions.

| . Ordering across translation units.

In practice, one should use only library facilities or objects
defined previously in the same translation unit.  I do not think
that we have made the situation substantially worse here.

| . Function local statics.

I think this is a strong case for why function-local statics must
be part of the interface of a function, so that programmers know
either to avoid calling the function in destructors or to ensure
that it was also called in the constructors.  (There are probably
more qualifications needed here.)

| . Standard library requires magic.

I am hoping that future module and dynamic library work will address
this problem.

| or this to give it global visibility:

I don't see how this example solves any problem.  The destructor for
x_owner will still execute after any synchronization in thread_main,
and so all the issues resurface.  Furthermore, it introduces
a potential premature destruction if any local variable with a
destructor is defined before x_owner.

| or even something more general (modulo typos):

The inefficiency and clumsiness in this example (as experienced
with pthread_getspecific) was the motivation for thread-local
variables in the first place.  Sometimes it needs to be fast, and
the compiler/linker can help dramatically.
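
To make the comparison concrete, here is a small sketch of the same
per-thread counter written both ways.  The function and variable names
are hypothetical; only the pthread_* calls are the real POSIX API.

```cpp
#include <cassert>
#include <pthread.h>

// Library approach: one key for the whole process, created once, with
// a lookup call on every access.
pthread_key_t counter_key;
pthread_once_t counter_once = PTHREAD_ONCE_INIT;

extern "C" void destroy_counter(void* p) { delete static_cast<int*>(p); }
extern "C" void make_counter_key() {
    pthread_key_create(&counter_key, destroy_counter);
}

int bump_with_key() {
    pthread_once(&counter_once, make_counter_key);
    int* counter = static_cast<int*>(pthread_getspecific(counter_key));
    if (counter == nullptr) {            // first use in this thread
        counter = new int(0);
        pthread_setspecific(counter_key, counter);
    }
    return ++*counter;
}

// Language approach: the compiler and linker arrange the per-thread
// storage, so the access is an ordinary variable reference.
int bump_with_thread_local() {
    thread_local int counter = 0;
    return ++counter;
}
```

Both functions return 1, 2, 3, ... independently in each thread.  The
first needs a key, a once-flag, a heap allocation and a destructor
callback to do what the second states in one declaration.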

| . 2A: thread_local objects must have trivial destructors.

I will disagree with this conclusion.  If we have thread_local
variables at all, they will accrete resources, and the only tool
we have to reliably release those resources is destructors.  As
Doug Lea points out, Java programmers are asking for mechanisms
to release the resources of thread-local variables.

| 3. The trouble with thread_local data and reusable threads

| No matter what we do now or in the future, with or without thread
| pools in the standard, thread_local variables are inherently
| of limited value on any system that reuses threads (e.g.,
| for efficiency), including but not limited to thread pools. As
| illustrated in N2880, this is because cleanup is problematic
| (addressed in §2 above) and programmers don't control the lifetime
| of the threads or which thread a task will execute on.

I do not in fact think this is true.  If the programmer controls the
lifetime of the pool object, then the programmer controls the lifetime
of the threads controlled by that object.

| and telling programmers not to use them there.

That advice is easier to say than to follow in large-systems
development.  Programmers will want to introduce concurrency in
distributed programs so as to enable a single processor to attend to
many connections.  In such an environment, caching in thread-local
variables is desired to limit inter-machine bandwidth.

And to summarize what I think Hans and I failed to say:

    Threads have variables in the form of thread-locals, parameters,
    and automatic variables.  To ensure that the resources held by
    those variables are released, one must join with the thread so
    that those variables are destroyed.  To ensure that destructors
    of those variables are well-defined, one must join with the
    thread before its referenced environment is destroyed.

    Some consequences of this observation are:
        One should never detach non-trivial threads.
            (There is probably a formal definition in here.)
        All thread pools should be explicitly declared.
            (I.e. implicit thread pools are bad.)
        One should manage the thread pool as one would manage
            the resources it will accrete and reference.
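
A minimal sketch of these consequences, assuming a hypothetical
JoiningThreads owner (not a real pool: no queue and no thread reuse,
just explicit ownership).  Because the owner is declared after the
data the threads reference, it is destroyed first, and its destructor
joins every thread before that data goes away.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical explicit owner of threads. Its lifetime bounds theirs.
class JoiningThreads {
public:
    JoiningThreads() = default;
    JoiningThreads(const JoiningThreads&) = delete;
    JoiningThreads& operator=(const JoiningThreads&) = delete;

    void spawn(std::function<void()> task) {
        threads_.emplace_back(std::move(task));
    }

    // After the joins, every thread's automatic and thread-local
    // variables have been destroyed.
    ~JoiningThreads() {
        for (auto& t : threads_)
            if (t.joinable()) t.join();
    }

private:
    std::vector<std::thread> threads_;
};

int sum_squares(int n) {
    // The environment the threads reference.
    std::vector<int> results(static_cast<std::size_t>(n));
    {
        // Declared after results, so joined (destroyed) before it.
        JoiningThreads workers;
        for (int i = 0; i < n; ++i)
            workers.spawn([&results, i] { results[i] = i * i; });
    }   // all threads are joined here
    int total = 0;
    for (int r : results) total += r;
    return total;
}
```

Detaching the threads instead would leave nothing to stop results
from being destroyed while a thread still writes to it.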

Isn't this just the coolest job ever?

-- 
Lawrence Crowl