[cpp-threads] RE: C++ Connections proposal

Peter A. Buhr pabuhr at plg.uwaterloo.ca
Mon Apr 25 17:19:37 BST 2005


   Hans> I actually don't think that the reasons the library approach doesn't
   Hans> work are that straightforward.  The pthread approach comes
   Hans> surprisingly close to working.

I'll disagree with Hans on this point, and argue it as follows. The only
reason pthreads comes close is that many of its routines, like
pthread_mutex_lock, are *required* to be implemented as non-inline routines,
which constrains optimization across all calls to these routines. However,
many of the pthread routines are small and appear in performance-critical
sections of code, and hence are obvious targets for inlining. To solve this
problem, a compiler can be given the source code for the pthreads routines so
it can inline them. But if the compiler does not know about concurrency, it
will incorrectly intermix the pthreads code with adjacent code at the call
site. So I stand firm on my statement: if you want to generate efficient and
correct code in concurrent programs, the compiler MUST be aware that the
program is concurrent; otherwise basic sequential optimizations will
invalidate the program. pthreads solves the problem by throwing away
performance to gain correctness, which, as I have pointed out, is the
necessary trade-off.
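To make the danger concrete, here is a deliberately naive sketch of my own
(it is not pthreads source, and it even ignores the need for an atomic
test-and-set); the point is only what a purely sequential optimizer is
allowed to do once the lock routines are inlined:

    // A naive spin lock built from plain loads and stores. Once inlined,
    // a compiler that assumes a sequential program may cache lock_word or
    // balance in registers, hoist the load of balance above the lock,
    // sink the store below the unlock, or delete the "useless" spin loop
    // entirely -- all legal sequentially, all fatal concurrently.
    int lock_word;   // 0 = free, 1 = held (shared)
    int balance;     // shared data nominally protected by the lock

    inline void naive_lock()   { while ( lock_word != 0 ) {} lock_word = 1; }
    inline void naive_unlock() { lock_word = 0; }

    void deposit( int amount ) {
        naive_lock();
        balance += amount;
        naive_unlock();
    }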

   - I don't have a strong feeling one way or another about active objects.
   Based on what I've seen it seems like a good idea.

Any Ada concurrent programmer, or anyone doing message passing, will tell you
that direct communication among active objects (rendezvous) is very important
in complex concurrent programs (though not strictly necessary). This is in
contrast to indirect communication among active objects through passive
objects like monitors. Both direct and indirect communication (e.g., telephone
versus letter) have an important place in concurrent programming, and hence
each needs to be supported as a first-class mechanism in a language. For
example, Java only really supports indirect communication via monitors, so
certain concurrent problems are awkward to express.
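For concreteness, here is a rough pthreads sketch of my own (not uC++ and not
part of any proposal) of the indirect form; notice that the direct
(rendezvous) form simply cannot be written this way, because there is no
intermediate passive object to write:

    // Indirect communication: client and server never interact directly;
    // both go through a passive, mutex-protected mailbox (the "letter").
    #include <pthread.h>
    #include <queue>

    class Mailbox {
        pthread_mutex_t m;
        pthread_cond_t nonEmpty;
        std::queue<int> msgs;
      public:
        Mailbox() { pthread_mutex_init( &m, 0 ); pthread_cond_init( &nonEmpty, 0 ); }
        void put( int msg ) {                 // called by the sender's thread
            pthread_mutex_lock( &m );
            msgs.push( msg );
            pthread_cond_signal( &nonEmpty );
            pthread_mutex_unlock( &m );
        }
        int get() {                           // called by the receiver's thread
            pthread_mutex_lock( &m );
            while ( msgs.empty() ) pthread_cond_wait( &nonEmpty, &m );
            int msg = msgs.front(); msgs.pop();
            pthread_mutex_unlock( &m );
            return msg;
        }
    };
    // The direct (telephone) form: the client calls an entry of the server
    // task itself and blocks until the server accepts the call -- there is
    // no library object in the middle.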

   The idea of having a blocking destructor as a join replacement struck
   me as elegant.

Ada supports this notion, too, via the "terminate" clause in the "select"
statement (as does uC++ by accepting the destructor). However, accepting the
destructor does not work in all cases; it is still necessary to have a "join",
"stop", or "done" routine if the programmer intends to continue using the
object after its thread has been told to terminate. This situation is
analogous to creating an active object and using it before starting its
thread. Both situations are rare, but both should be supported in some way.
uC++ supports both approaches.
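A rough pthreads sketch of my own (not uC++ or Ada) showing the two idioms
side by side -- the destructor blocks as an implicit join, and an explicit
join() remains available when the object must be used after its thread is
done:

    #include <pthread.h>

    class Worker {
        pthread_t tid;
        bool joined;
        static void *run( void * ) { /* ... thread body ... */ return 0; }
      public:
        Worker() : joined( false ) { pthread_create( &tid, 0, run, this ); }
        void join() {                 // explicit join: object remains usable
            if ( ! joined ) { pthread_join( tid, 0 ); joined = true; }
        }
        ~Worker() { join(); }         // implicit join: destructor blocks
    };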

   - I would need considerably more evidence to be convinced that we should
   include coroutines here.  If you mix coroutines and threads, presumably
   coroutines have to use a locking discipline that's very similar to
   preemptive threads anyway.  Do we get more than a confusing mixture of
   two scheduling disciplines?

Yes, there are many people with coroutine phobia. ;-) And yes, mixing
coroutines with threads requires a locking discipline, but so does mixing
ordinary objects with threads. So what is your point? A coroutine is not a
concurrent construct (I've had this battle with Doug Lea for several years);
a coroutine-monitor is used in this situation.

A coroutine is useful in and of itself. It cannot be trivially simulated by
any other construct in a programming language. The killer app for a coroutine
is device drivers. A recent paper by Swift, Bershad, and Levy (ACM TOCS 23(1),
Feb. 2005) points out that 85% of all OS crashes result from device-driver
failures; anything that improves this situation is a significant advancement.
Part of their improvement mechanism involves the use of coroutines (see their
description of control transfer in the 3rd paragraph on page 86). The key
difference between a coroutine and a task is that during a context switch the
coroutine does not perform any scheduling, because the target coroutine is
directly specified (also, it is unnecessary to save as much state, since a
coroutine is not as complex as a task). Hence, the cost of a coroutine
context switch is usually less than half that of a task. Finally, once you
have a task, all the mechanisms for the coroutine are there. All that is
necessary is to tease out the simpler coroutine and make it available to the
programmer. In other words, coroutines fall out in the wash; you just have to
realize they are sitting there waiting to get out. In uC++, the task base-type
inherits from the coroutine base-type, showing how one kind of type builds
directly on the other. It is amazing how simple and elegant it is once you
see it.
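To make the "no scheduling during a context switch" point concrete, here is a
small sketch of my own using the POSIX ucontext routines (uC++ coroutines are
a language feature, not this library, but the control transfer is the same
idea -- the target is named explicitly):

    #include <stdio.h>
    #include <ucontext.h>

    static ucontext_t mainCtx, coCtx;
    static char coStack[64 * 1024];

    static void driver() {                     // coroutine body
        for ( int i = 0; i < 3; i += 1 ) {
            printf( "coroutine step %d\n", i );
            swapcontext( &coCtx, &mainCtx );   // suspend back to main; no scheduler involved
        }
    }

    int main() {
        getcontext( &coCtx );
        coCtx.uc_stack.ss_sp = coStack;
        coCtx.uc_stack.ss_size = sizeof( coStack );
        coCtx.uc_link = &mainCtx;              // where control goes if driver returns
        makecontext( &coCtx, driver, 0 );
        for ( int i = 0; i < 3; i += 1 ) {
            swapcontext( &mainCtx, &coCtx );   // resume the named coroutine directly
            printf( "back in main\n" );
        }
    }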

   - I am opposed to a mutual exclusion mechanism in which all public member
   functions implicitly acquire an object-specific lock on entry and release it
   on exit. That's simpler than possible.  There needs to be an easy way to
   acquire a user-determined lock inside a block and to export some member
   functions that don't acquire locks (e.g. to access read-only state) even if
   others do.

I agree. In uC++, the mutex qualifier on a "class" indicates the _default_
mutual exclusion for the public members of the class, but each individual
public member can be explicitly overridden with the "nomutex" qualifier:

    mutex class foo { // public member routines default to mutex
        // private and protected member routines default to nomutex
        mutex void m1(); // explicit mutex to be called by a friend task
      public:
        void m2();   // implicit mutex because of class default
        nomutex void m3(); // explicit nomutex overriding class default
    };

or you can do it the Java way:

    nomutex class foo { // equivalent to "class foo"
        // private and protected member routines default to nomutex
        mutex void m1(); // explicit mutex to be called by a friend task
      public:
        mutex void m2(); // explicit mutex overriding class default
        void m3();       // implicit nomutex because of class default
    };

I believe this handles most situations. I would need to be talked into having
a separate mutex statement (like the Modula-3/Java/C# lock statement). So in
general, the presence of one mutex member in a class makes it into a mutex
object, and implies the destructor is mutex, so the object cannot be deleted
while there is a thread executing in the object.
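Roughly, and only as my own plain-pthreads approximation (ignoring owner
re-entry, condition variables, and accepting members), the first declaration
above corresponds to writing the locking by hand:

    #include <pthread.h>

    class foo {
        pthread_mutex_t self;              // the implicit per-object lock
        void m1() { pthread_mutex_lock( &self ); /* ... */ pthread_mutex_unlock( &self ); }
      public:
        foo() { pthread_mutex_init( &self, 0 ); }
        void m2() { pthread_mutex_lock( &self ); /* ... */ pthread_mutex_unlock( &self ); }
        void m3() { /* nomutex: no locking, e.g., read-only access */ }
        ~foo() {
            // a real mutex destructor must also block until no thread is
            // still executing in the object; this sketch omits that
            pthread_mutex_destroy( &self );
        }
    };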

   I need to be able to associate locks with e.g.  array sections.  I think it
   also leads to the kind of "over-synchronized" libraries that we saw in Java
   1.0.  (I'm probably an extremist on this issue.  I don't even like RAII for
   locks that much.  It often matters a lot exactly how long and where you hold
   a lock, either for performance or to avoid deadlocks.  I think that making
   it implicit hides a hard problem that you can't avoid thinking about.  Locks
   are hard, and I don't think there's a way around that.)

I disagree and agree with your desires.

I disagree because I believe we should follow the 90-90 rule for concurrency
in C++. That is, 90% of concurrent programs can be written at a high level and
achieve 90% efficiency. This means concurrency can move from being an esoteric
art to a general programming methodology, like objects. The monitor in Java
demonstrates just how important a high-level construct is in terms of
simplifying the mutual-exclusion and synchronization aspects of concurrent
programming. In general, software developers are totally happy with the 90-90
rule. For example, each object in your array can be a monitor (mutex) object
(or even a task), so each has its own locking. It should be possible to create
arrays and linked lists of coroutines, monitors, coroutine-monitors, and
tasks. Anything that can be done with a class should be able to be done with
the other kinds of types (i.e., orthogonality of design). This includes
inheritance and generics. uC++ accomplishes this with only a few restrictions
on inheritance, because not all combinations of types make sense.
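For example (again only a hand-written approximation, not uC++), per-element
locking falls out naturally when each element is its own mutex object:

    #include <pthread.h>

    struct Account {                       // each element is its own "monitor"
        pthread_mutex_t m;
        long balance;
        Account() : balance( 0 ) { pthread_mutex_init( &m, 0 ); }
        void deposit( long amt ) {
            pthread_mutex_lock( &m );
            balance += amt;
            pthread_mutex_unlock( &m );
        }
    };

    Account accounts[100];                 // 100 independent locks; threads touching
                                           // different elements never contend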

I agree because achieving 90% efficiency for those 90% of concurrent programs
probably requires using, directly or indirectly, the 10% of code that performs
low-level concurrency tricks. Therefore, it is essential we provide support
for those tricks, e.g., double-checked locking, atomic assignment, etc.
Nevertheless, the bias should be to stay at the high level as much as
possible, which means providing expressive and efficient high-level
control-flow constructs for C++.
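For instance, double-checked initialization is only correct with atomic loads
and stores that carry ordering guarantees -- exactly the kind of low-level
support that has to be exposed for that last 10%. A sketch of my own in the
std::atomic notation being discussed for the standard (not available in
today's C++, but it makes the required acquire/release ordering explicit):

    #include <atomic>
    #include <mutex>

    class Widget { /* ... */ };

    std::atomic<Widget *> instance( nullptr );
    std::mutex initLock;

    Widget *getInstance() {
        Widget *p = instance.load( std::memory_order_acquire );  // first check, no lock
        if ( p == nullptr ) {
            std::lock_guard<std::mutex> lk( initLock );
            p = instance.load( std::memory_order_relaxed );      // second check, locked
            if ( p == nullptr ) {
                p = new Widget;
                instance.store( p, std::memory_order_release );  // publish
            }
        }
        return p;
    }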

   - Ugly detail: Unfortunately, I think it's unavoidable that the real
   proposal minimize keyword introductions, and that any new short keywords
   start with a double underscore.  Anything else breaks existing code.  And
   there is very little that will generate as much opposition as quickly as
   breaking existing code.  All the recycling of keywords in C++ unfortunately
   has a good reason.  It's too early to worry about that a lot, but it's worth
   keeping in mind.

I agree. But I think we should design what we believe is the best solution, and
worry about details like keywords later. However, if we try hard from the
start, we can keep the number of new keywords to a minimum, making the
finalization process easier.



