C++ Threading Proposal

Jim Rogers jimmaureenrogers at att.net
Sat Oct 2 01:20:06 BST 2004


The Ada model makes a clear distinction between direct calls from one task
to another and sharing information through an intermediate buffer contained
in neither task.
Direct calls between tasks are implemented using the Ada Rendezvous model
introduced in the 1983 version of Ada. A rendezvous always synchronizes the
calling and the called task. All the locking semantics are handled
automatically.

The Rendezvous model does not provide a normal Ada subprogram call
interface. It provides a distinct synchronization point. An Ada task
declares its synchronization points by defining a set of task entries. Task
entries, like protected entries, maintain a queue of calling tasks. Task
entries may have guard or boundary conditions associated with them. An Ada
task with no entries is closed to direct calls from another task.

task A is
   entry G (Parm1 : in SomeData; Parm2 : out SomeOtherData);
end A;

task B is
   entry F (Parm1 : in SomeData; Parm2 : out SomeOtherData);
end B;

A may call B.F at any time. Similarly, B may call A.G at any time.
When A calls B.F, A is suspended until B accepts the entry call.
Similarly, when B calls A.G, B is suspended until A accepts the entry call.
Locking is handled correctly, but there is a strong possibility of deadlock
in this design.

Ada's ability to make entry calls selectively handles the deadlock issue. If
A's call to B.F is made selectively, with an appropriate timeout, there is
no deadlock.

task body A is
   Local1 : SomeData;
   Local2 : SomeOtherData;
begin
   loop
      select
         accept G (Parm1 : in SomeData; Parm2 : out SomeOtherData) do
            Local1 := Parm1;
            Parm2  := Local2;
         end G;
      or
         delay 0.01;
         select
            B.F (Parm1 => Local1, Parm2 => Local2);
         or
            delay 0.005;
         end select;
      end select;
   end loop;
end A;

Task A continually loops through its actions. The outer select statement
causes task A to attempt to accept a call to its G entry. A will normally
suspend until the G entry is called by another task. The "or" alternative in
the select statement sets a timeout for the suspension. In this case the
timeout is set to 0.01 seconds. Task A cancels its accept call if the
timeout completes before another task calls G. In the event the accept call
is cancelled, the task immediately attempts to call Task B's F entry. Task A
calls the entry selectively. In this case it only waits 0.005 seconds for B
to accept the F entry. When any of these actions completes, A executes
another iteration of its control loop.

Task B will also use a set of selective accepts and entry calls to perform
its work.

This approach eliminates the deadlock in this design.

Notice that the locking and synchronization issues of communication between
A and B are handled automatically. The language syntax also allows each task
to be the master of its own timing, by abandoning a lock or synchronization
attempt when a timeout expires.

Jim Rogers
----- Original Message ----- 
From: "Boehm, Hans" <hans.boehm at hp.com>
To: "Jim Rogers" <jimmaureenrogers at att.net>; "Andrei Alexandrescu"
<andrei at metalanguage.com>; "Ben Hutchings" <ben at decadentplace.org.uk>;
"Kevlin Henney" <kevlin at curbralan.com>; "Bill Pugh" <pugh at cs.umd.edu>;
"Douglas C. Schmidt" <schmidt at dre.vanderbilt.edu>; "Doug Lea"
<dl at cs.oswego.edu>
Sent: Friday, October 01, 2004 12:20 PM
Subject: RE: Re: C++ Threading Proposal


I'm not sure I understand what you're saying.  I don't see how the Ada model
makes anything automatic.

Let's assume we have two objects A and B with member functions f and g.
Let's assume A and B refer to each other, and A->f() calls B->f(), and
B->g() calls A->g().  Let's assume they all read and write potentially
shared data.

In C or C++ with pthreads or win32 threads I use an appropriate locking
discipline, e.g. always having A->f() acquire B's lock before acquiring A's
lock.  In Java I need to go slightly out of my way, but it's not too hard.

But if I blindly make A and B "protected" (in Ada) or "synchronized" (in
Java), simultaneous calls to A->f() and B->g() are likely to result in
deadlock.  This may be automatic, but I don't see how it's a general
solution.

I just don't see how to automate locking in the same way as memory
deallocation.  Transactional memory is easier to use, but we don't know how
to implement it.  I've heard other proposals, but they seem to serialize
everything and are hence not useful for multiprocessors.  Locking is
inherently hard (though easier than lock-free algorithms).  But I think it's
the best general purpose solution we have.

There is a simple sieve of Eratosthenes example on slide 12 of
ftp://ftp.dagstuhl.de/pub/Proceedings/03/03431/03431.BoehmHans-J.Slides.pdf
with performance measurements on the following slides.  It demonstrates a
simple case in which you get no multiprocessor performance benefit at all
with a lock-based solution.  The only way to really take advantage of a
multiprocessor is to use a cleverer lock-free algorithm which observes that
races are benign in this case, or possibly by redesigning the algorithm from
scratch.  (I didn't try the latter alternative, but I doubt it's easy.)
I've run into cases very similar to this in real life, though the
performance impact is perhaps not quite as dramatic.

Hans

> -----Original Message-----
> From: Jim Rogers [mailto:jimmaureenrogers at att.net]
> Sent: Thursday, September 30, 2004 6:41 PM
> To: Boehm, Hans; Andrei Alexandrescu; Ben Hutchings; Kevlin
> Henney; Bill
> Pugh; Douglas C. Schmidt; Doug Lea
> Subject: Re: Re: C++ Threading Proposal
>
>
> I realize that there is a significant body of C and C++ threaded programs.
> If one important goal is compatibility with existing implementations, we
> have several choices:
>
> * Do nothing about threading for C++. This will be the least controversial
> and will by definition be compatible with current practice.
>
> * Provide native C++ implementations of Posix and Win32 thread libraries
> to better support C++. Such implementations would not inhibit the use of
> current C threading libraries.
>
> * Provide a C++ memory model providing support for C++ implementations
> of Posix, Win32, and other thread libraries. This approach relies upon
> extra-standard implementations of C++ thread libraries. This approach
> would not inhibit the use of current C threading libraries.
>
> * Provide new threading primitives in the C++ language. This also implies
> a C++ memory model aimed at threading support. This would introduce the
> greatest change to C++ syntax and reserved words. This approach would
> not inhibit the use of current C threading libraries.
>
> As you can see, compatibility with existing implementations is not at risk
> with any of these possible approaches.
>
> My experience in Ada, with threading built into the language syntax, is
> similar to memory management when using a reliable garbage collector.
> Thread synchronization can be handled manually, just as memory management
> can be handled manually. Manual synchronization can become very complex,
> just as manual memory management can. Thread synchronization can be
> efficiently automated, just as memory management can be efficiently
> automated. The reliability resulting from automation is similar for both
> thread synchronization and memory management.
>
> Ada programmers can call C thread libraries directly, just as C++
> programmers do. I know of no Ada programmers who choose such a solution.
> There is no performance advantage to using the C libraries directly;
> such an approach only makes the code more complicated to write,
> understand, and maintain.
>
> Jim Rogers
> ----- Original Message ----- 
> From: "Boehm, Hans" <hans.boehm at hp.com>
> To: "Jim Rogers" <jimmaureenrogers at att.net>; "Andrei Alexandrescu"
> <andrei at metalanguage.com>; "Ben Hutchings" <ben at decadentplace.org.uk>;
> "Kevlin Henney" <kevlin at curbralan.com>; "Bill Pugh" <pugh at cs.umd.edu>;
> "Douglas C. Schmidt" <schmidt at dre.vanderbilt.edu>; "Doug Lea"
> <dl at cs.oswego.edu>
> Sent: Thursday, September 30, 2004 3:19 PM
> Subject: RE: Re: C++ Threading Proposal
>
>
> > -----Original Message-----
> > From: Jim Rogers [mailto:jimmaureenrogers at att.net]
> >
> [ Various sections deleted.]
> > Ada does not provide any guarantee of execution order. This
> > is particularly
> > appropriate since we have computer architectures that provide
> > out-of-order execution.
> Actually out-of-order execution is not the issue. This example
> actually works correctly on Pentium 4s (if the compiler does
> no reordering), which rely heavily on out-of-order execution.
> Without volatile declarations (which would generate different
> instructions), it doesn't work on current Itanium processors,
> which execute instructions in-order.  Recent PA-RISC implementations
> are out-of-order but sequentially consistent.  It depends much more on
> how write buffers etc. work.
> >
> > In your example above, if x_initialized is volatile, while x
> > is not, then
> > two
> > different threads on separate processors can be expected to
> > see different
> > values for x, since each processor has its own copy of x in
> > local cache.
> >
> > If x is also volatile, another thread will see the "x = 17" once x is
> > updated.
> > The way to ensure that both x_initialized and x will properly be viewed
> > and set by different threads is by the use of critical regions. Atomic
> > compare and swap operations are never specified in hardware for
> > arbitrarily large regions of memory. This is the reason that critical
> > regions have been defined. This is also the situation where mutex locks
> > are very useful.
> There was a lot of discussion of this during the Java effort.
> The problem is that locks or critical regions are expensive, at
> least on a multiprocessor, and therefore people try to use
> alternatives
> for performance-critical applications.  (Search for "double-checked
> locking".)  In a few cases, the alternatives are essential to get
> any real speedups from multiprocessors.  Whether or not it's
> essential, we know that some significant code bases (e.g. the one from
> Redmond) regularly use such techniques, and there needs to be a way
> to do it correctly.
>
> For the example I gave, hardware almost always provides much
> cheaper ways to
> ensure correctness than by using a lock.  There needs to be a way to
> get access to that.
>
> You are right that this is tricky because compare-and-swap operations
> are limited at best.  I also agree that if critical regions give
> you adequate performance, you shouldn't bother with this stuff.
>
> >
> > Java allows threads to send messages to each other through any public
> > method, whether or not that method contains a synchronized block. This
> > is an invitation to race conditions, because Java does not ensure atomic
> > updates of all the data members in a Thread class. I would suggest that
> > the public functions of a C++ Thread class should be implicitly
> > synchronized.
> Typically a lot of shared data is not directly in a class identifiable
> as being thread-related.
>
> I don't believe the language should ever add implicit synchronization,
> except possibly where it's necessary to protect shared data that is
> not programmer-visible.  Synchronization is expensive.  Adding more of
> it is in general unsafe and can certainly lead to deadlocks.  In my
> view, the original Java libraries suffered from way too much implicit
> synchronization.
>
> Having said that, I don't think we should follow the Java approach
> down to the thread-related classes in the standard library.  The C++
> standard library is different from the Java standard library, and we
> aren't going to change that.  In either case, I find I typically
> need a manual while I'm programming, and I can live with that.
> I do want the memory visibility rules to be similar, since those are
> conceptually hard, and they are something that most people will have
> to be taught.  And teaching two sets of rules is a significant extra
> effort.  (Plus there are the occasional compiler back ends that need
> to know both sets of rules, but that's a relatively minor issue.)
>
> There is a fair amount of existing multithreaded C and C++
> code.  I think
> we need to accommodate it as much as possible.  In cases in which
> it's inherently broken (e.g. double-checked locking) I think it's
> important to provide a reasonably painless path to fix it.  Thus
> I do think we need to accommodate something very close to at least
> the pthreads and probably win32 programming styles.
>
> Hans
> > >
> > > > -----Original Message-----
> > > > From: Jim Rogers [mailto:jimmaureenrogers at att.net]
> > > > Sent: Monday, September 27, 2004 9:02 PM
> > > > To: Boehm, Hans; 'Andrei Alexandrescu'; Ben Hutchings;
> > Kevlin Henney;
> > > > Bill Pugh; Douglas C. Schmidt; Doug Lea
> > > > Subject: Re: Re: C++ Threading Proposal
> > > >
> > > >
> > > >
> > > > ----- Original Message ----- 
> > > > From: "Boehm, Hans" <hans.boehm at hp.com>
> > > > To: "'Andrei Alexandrescu'" <andrei at metalanguage.com>;
> > "Ben Hutchings"
> > > > <ben at decadentplace.org.uk>; "Boehm, Hans"
> > <hans.boehm at hp.com>; "Kevlin
> > > > Henney" <kevlin at curbralan.com>; "Bill Pugh"
> > > > <pugh at cs.umd.edu>; "Douglas C.
> > > > Schmidt" <schmidt at dre.vanderbilt.edu>; "Doug Lea"
> > <dl at cs.oswego.edu>
> > > > Cc: "Jim Rogers" <jimmaureenrogers at att.net>
> > > > Sent: Monday, September 27, 2004 1:20 PM
> > > > Subject: RE: Re: C++ Threading Proposal
> > > >
> > > >
> > > > > My understanding of the Ada model (based also on a 1994
> > draft of the
> > > > standard) is that it's actually pretty good for the time.  It
> > > > differs from
> > > > the Java approach in several ways:
> > > > >
> > > > > 1) Writes to a field are allowed to "interfere with" at
> > > > least other fields
> > > > in that object.  I currently think this is a mistake given
> > > > current hardware
> > > > capabilities and the existing difficulties of writing
> > > > multithreaded code.
> > > > It was a reasonable choice in 1994.
> > > >
> > > > Protected procedures and protected entries are allowed to
> > > > modify any data in
> > > > the protected object.
> > > >
> > > > >
> > > > > 2) A concurrent write and other access (a race) is viewed
> > > > as an error,
> > > > just like the Pthread model.    Unlike in the Java
> > setting where this
> > > > creates security issues, this is OK for many C++ programs.
> > > > But it doesn't
> > > > address lock-free algorithms, which was a big part of our
> > motivation.
> > > >
> > > > I understand some of the appeal of lock-free algorithms. I have not
> > > > seen documentation claiming that lock-free algorithms are
> > > > deterministic to the levels needed for hard real-time systems. Given
> > > > this lack of knowledge, I would not focus all thread communication
> > > > around only lock-free algorithms.
> > > >
> > > > >
> > > > > 3) It uses guards on entry to a "protected procedure" (roughly
> > > > "synchronized method" in Java) instead of condition
> > > > variables.  I gather
> > > > that these are typically repeatedly evaluated on exit
> > from protected
> > > > procedures?  I think I mildly prefer explicit condition
> > > > variables (roughly
> > > > as in Pthreads, though I don't fully understand the tradeoffs
> > > > as to whether
> > > > or not a notification should hold the lock) to either
> > this or the Java
> > > > model, since I think it makes performance a bit more
> transparent.
> > > >
> > > > I would like to make a correction here. Protected objects are
> > > > allowed three kinds of methods: procedures, entries, and functions.
> > > > Protected procedures are given exclusive access to the object
> > > > without guard conditions.
> > > > Protected entries are given exclusive access to the object when a
> > > > specified guard condition evaluates to True.
> > > > Protected functions are given shared access to the object through an
> > > > implicit read-lock.
> > > >
> > > > Protected entries are similar to critical sections controlled by a
> > > > condition variable. Every protected entry has an associated entry
> > > > queue. That queue is used to queue up calls to the entry while the
> > > > guard condition evaluates to False. Upon completion of a protected
> > > > entry or protected procedure, all entry guard conditions are
> > > > evaluated. If a condition evaluates to True and the entry queue size
> > > > is greater than 0, the current task executes the entry body as a
> > > > proxy for the next task waiting in the queue. Entry queues can be
> > > > set up as either priority queues or FIFO queues.
> > > >
> > > > >
> > > > > 4) It's a bit clearer than the Pthread model about
> > > > visibility rules, but
> > > > not as clear as I would like.  It's a bit more of a stretch
> > > > to argue that my
> > > > register promotion example is legal in Ada, but it's not
> > as clearly
> > > > prohibited as I would like.
> > > >
> > > > I am not familiar with your register promotion example.
> > > >
> > > > >
> > > > > 5) It doesn't seem to address atomic memory operations or
> > > > "unscoped" e.g.
> > > > conditionally acquired locks.
> > > >
> > > > The Ada tasking model does provide the ability to specify atomic
> > > > operations, as well as volatile operations.
> > > > The Ada Language Reference Manual contains the following language:
> > > >
> > > > C.6 Shared Variable Control
> > > >
> > > >   1.. This clause specifies representation pragmas that
> > > > control the use of
> > > > shared variables.
> > > >
> > > > Syntax
> > > >
> > > >   2.. The form for pragmas Atomic, Volatile,
> > Atomic_Components, and
> > > > Volatile_Components is as follows:
> > > >
> > > >     3.. pragma Atomic(local_name);
> > > >
> > > >     4.. pragma Volatile(local_name);
> > > >
> > > >     5.. pragma Atomic_Components(array_local_name);
> > > >
> > > >     6.. pragma Volatile_Components(array_local_name);
> > > >
> > > >   7.. An atomic type is one to which a pragma Atomic applies.
> > > > An atomic
> > > > object (including a component) is one to which a pragma
> > > > Atomic applies, or a
> > > > component of an array to which a pragma Atomic_Components
> > > > applies, or any
> > > > object of an atomic type.
> > > >
> > > >   8.. A volatile type is one to which a pragma Volatile
> > > > applies. A volatile
> > > > object (including a component) is one to which a pragma
> > > > Volatile applies, or
> > > > a component of an array to which a pragma Volatile_Components
> > > > applies, or
> > > > any object of a volatile type. In addition, every atomic type
> > > > or object is
> > > > also defined to be volatile. Finally, if an object is
> > > > volatile, then so are
> > > > all of its subcomponents (the same does not apply to atomic).
> > > >
> > > > Name Resolution Rules
> > > >
> > > >   9.. The local_name in an Atomic or Volatile pragma shall
> > > > resolve to denote
> > > > either an object_declaration, a non-inherited
> > > > component_declaration, or a
> > > > full_type_declaration. The array_local_name in an
> > Atomic_Components or
> > > > Volatile_Components pragma shall resolve to denote the
> > > > declaration of an
> > > > array type or an array object of an anonymous type.
> > > >
> > > > Legality Rules
> > > >
> > > >   1.. It is illegal to apply either an Atomic or
> > > > Atomic_Components pragma to
> > > > an object or type if the implementation cannot support the
> > > > indivisible reads
> > > > and updates required by the pragma (see below).
> > > >
> > > >   2.. It is illegal to specify the Size attribute of an
> > > > atomic object, the
> > > > Component_Size attribute for an array type with atomic
> > > > components, or the
> > > > layout attributes of an atomic component, in a way that
> > prevents the
> > > > implementation from performing the required indivisible reads
> > > > and updates.
> > > >
> > > >   3.. If an atomic object is passed as a parameter, then the
> > > > type of the
> > > > formal parameter shall either be atomic or allow pass by copy
> > > > (that is, not
> > > > be a nonatomic by-reference type). If an atomic object is
> > > > used as an actual
> > > > for a generic formal object of mode in out, then the type of
> > > > the generic
> > > > formal object shall be atomic. If the prefix of an
> > > > attribute_reference for
> > > > an Access attribute denotes an atomic object (including a
> > > > component), then
> > > > the designated type of the resulting access type shall be
> > > > atomic. If an
> > > > atomic type is used as an actual for a generic formal derived
> > > > type, then the
> > > > ancestor of the formal type shall be atomic or allow
> pass by copy.
> > > > Corresponding rules apply to volatile objects and types.
> > > >
> > > >   4.. If a pragma Volatile, Volatile_Components, Atomic, or
> > > > Atomic_Components applies to a stand-alone constant object,
> > > > then a pragma
> > > > Import shall also apply to it.
> > > > Pragma Volatile is used to enforce properly shared global updates
> > > > on multi-processor systems. Pragma Volatile forces each processor
> > > > to view and update the shared variable directly, instead of through
> > > > a local copy of the shared variable.
> > > >
> > > > >
> > > > > My general inclination is still to try for as much
> > commonality as
> > > > reasonable between Java and C++, since many programmers will
> > > > need to deal
> > > > with both.
> > > >
> > > > I suggest you avoid the Java model implemented through
> > > > wait()/notify() operations. The Java model does not ensure that all
> > > > waiting threads will ever be serviced. The Java model ensures that
> > > > things will work out "on the average", which is not deterministic
> > > > and is totally inappropriate for hard real-time systems.
> > > >
> > > > The Ada model of protected entries (which corresponds well with
> > > > Pthread condition variables) includes a queue so that you can
> > > > deterministically assure all waiting tasks will be processed in a
> > > > predictable order.
> > > >
> > > > >
> > > > > Hans
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Andrei Alexandrescu [mailto:andrei at metalanguage.com]
> > > > > > Sent: Saturday, September 25, 2004 2:21 PM
> > > > > > To: Ben Hutchings; Hans Boehm; Kevlin Henney; Bill Pugh;
> > > > Douglas C.
> > > > > > Schmidt; Doug Lea
> > > > > > Cc: Jim Rogers
> > > > > > Subject: Fwd: Re: C++ Threading Proposal
> > > > > >
> > > > > >
> > > > > > Jim  Rogers  (cc'd)  sent  me some examples of using
> > > > Ada's concurrency
> > > > > > mechanisms, for possible inspiration.
> > > > > > One   is   attached   in   text   format,   and   the
> > > > other   is  at
> > > > > http://home.att.net/~jimmaureenrogers/Shared_Resource_Design_Patterns.html.
> > > > > I am sending this to you FYI, and am also adding the documents
> > > > > to our nascent institutional memory.
> > > > > Andrei
> > > > > This is a forwarded message
> > > > > From: Jim Rogers <jimmaureenrogers at att.net>
> > > > > To: "Andrei Alexandrescu" <andrei at metalanguage.com>
> > > > > Date: Monday, September 20, 2004, 9:13:24 PM
> > > > > Subject: C++ Threading Proposal
> > > > > ===8<==============Original message text===============
> > > > > Hello Andrei,
> > > > > The attached file is a simple text file containing an
> > > example of Ada
> > > > > tasking.
> > > > > I used the attachment because my email tool was mangling my
> > > > > formatting.
> > > > > Jim Rogers
> > > > > ===8<===========End of original message text===========
> > > > > -- 
> > > > > Best regards,
> > > > >  Andrei
> > > > > mailto:andrei at metalanguage.com
> > > >
> > >
> >
> >
>

More information about the cpp-threads mailing list