[cpp-threads] High-level concurrency issues

jimmaureenrogers at att.net jimmaureenrogers at att.net
Tue Oct 18 06:11:43 BST 2005


 -------------- Original message ----------------------
From: "Herb Sutter" <hsutter at microsoft.com>
> 
> > Somewhat playing devil's advocate here, could someone make that
> > statement about "not composable" precise?  I've certainly seen it
> > before, but I think it would help to understand exactly what we're
> > talking about here.
> 
> Below I've pasted the relevant section of my recent article on "Software
> and the Concurrency Revolution" (coauthored with Jim Larus) that
> appeared in last month's ACM Queue. The key paragraph is:
> 
>   A fundamental problem with locks is that they are not composable.
>   You can't take two correct lock-based pieces of code, combine them,
>   and know that the result is still correct. Modern software
>   development relies on the ability to compose libraries into larger
>   programs, and so it is a serious difficulty that we cannot build on
>   lock-based components without examining their implementations.
> 
> Then there's some more discussion.
> 
> > This can get me into trouble if atomicity or other considerations
> > require me to hold a lock during a client call-back, for example.
> 
> Or call a virtual function, or use a template parameter, or let control
> flow through any other customization point into unknown code.
> 
> The general issue is that calling any unknown code while holding a
> lock is a recipe for deadlock. You really are required to know the
> implementation details of all code that you might call while you are
> holding a lock.
> 
> > But in most cases, it seems to work tolerably well, and layers of
> > calls seem to compose reasonably.  And I'm not 100% convinced that
> > the cases in which it breaks don't have single-threaded lock-free
> > analogs in which I break object invariants with an unintentional
> > recursive call.
> 
> I'd be interested to see analogs. The main reason we're just getting
> away with it right now is that few programmers are writing any
> significant code that uses locks (explicitly concurrent or otherwise).
> 
> For example, imagine someone writes an object that lazily initializes
> itself from a file or database, and when the lazy initialization gets
> triggered it acquires a file or database lock to perform its
> initialization. Now say you put an object of that type into a
> synchronized container like a .NET SyncHashTable (a monitor), and the
> latter in particular will lock(this) and then call object.GetHashCode --
> which can trigger a lazy read and acquire a lock. That's fine in itself,
> but of course now you have a deadlock just waiting to happen if any
> other code in the system (that exists today or that could be written
> tomorrow) might try to acquire the same file/database lock (e.g., to
> lazily initialize a similar object) and then while holding that use the
> hash table. Rare? Maybe. But so are all races, and there's nothing in
> the type system or in mainstream tools that will tap the programmer on
> the shoulder and say "hey, buddy, you're asking for deadlock here, and
> oh by the way you shouldn't have tried to use those two objects together
> which is incorrect even though each of them is correct on its own."
> 
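A rough C++ sketch of that scenario, with the .NET container and the
file/database lock replaced by invented stand-ins (a monitor-style
container that calls the element's hash function while holding its own
mutex, and a global mutex standing in for the database lock):

    #include <cstddef>
    #include <mutex>
    #include <string>

    std::mutex db_mutex;   // stands in for the file/database lock

    struct LazyRecord {
        mutable std::string data;
        std::size_t hash() const {
            std::lock_guard<std::mutex> db(db_mutex);  // lazy load from "db"
            if (data.empty()) data = "loaded";
            return std::hash<std::string>{}(data);
        }
    };

    struct SyncTable {     // crude monitor-style container
        std::mutex m;
        void insert(const LazyRecord& r) {
            std::lock_guard<std::mutex> lk(m);  // container lock held...
            r.hash();                           // ...across unknown code
        }
    };

    // Thread A: table.insert(rec)  -> holds table lock, wants db lock.
    // Thread B: holds db_mutex (e.g. another lazy load), then uses the
    // table                        -> holds db lock, wants table lock.
    // Opposite acquisition orders: the deadlock described above.
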
> But even that's not the worst part. The nightmare scenario, which
> unfortunately is all too common in modern software, is when two types
> that use locks internally (here the container type and the lazily
> initialized type) are written by different vendors and are composed for
> the first time on an end user's machine (thanks to plugins, DLLs and
> dynamic loading, etc.). No testing can ever catch that problem before
> either vendor ships their component, other than simply identifying
> points where control flow can leave the vendor's code while a lock is
> held. Lock-based code just is inherently unsafe to compose.
> 
> > This does not mesh at all well with a model in which all locking is
> > done on entry to a member function, and locks the object itself.
> > But I've stated previously that I don't like that model.  And I
> > don't claim I know how to program with it.
> 
> Frankly, monitors are the worst of the lot and almost never appropriate.
> They should definitely not be encouraged, and should be actively
> discouraged because they usually lock too much and/or not enough
> (usually both) and are disasters waiting to happen by making deadlock
> situations much more likely.
> 
> Java learned that, though only after shipping synchronized containers in
> version 1. C++ learned it too -- just look at the trouble C++ vendors
> got into with refcounted implementations of basic_string, which to be
> safe had to lock the object on every possibly mutating operation; the
> industry has now moved en masse away from that, and I take some credit
> for making that happen by repeatedly publicizing the problem in articles
> and books and talking to the vendors. (Remind me to tell you some truly
> hilarious stories about that next month in Las Vegas, whoever is there.)
> 
> Herb
> 
> 
> P.S.: Here's the excerpt from the recent Queue article.
> 
> The Problem of Shared State, and Why Locks Aren't the Answer
> ------------------------------------------------------------
> 
> Another challenging aspect of unstructured parallelism is sharing
> unstructured state. A client application typically manipulates shared
> memory organized as unpredictably interconnected graphs of objects.
> 
> When two tasks try to access the same object, and one could modify its
> state, if we do nothing to coordinate the tasks we have a data race.
> Races are bad, because the concurrent tasks can read and write
> inconsistent or corrupted values.
> 
> There exists a rich variety of synchronization devices that can prevent
> races. The simplest of these is a lock. Each task that wants to access a
> piece of shared data must acquire the lock for that data, perform its
> computation, and then release the lock so other operations on the data
> can proceed. Unfortunately, although locks work, they pose serious
> problems for modern software development.

The most serious problems in my experience arise from people using
locks without thinking through the concurrency issues in a design.
The mindset often seems to be that locks are a silver bullet for
concurrency, so they are applied liberally without any real analysis
and design.

> 
> A fundamental problem with locks is that they are not composable. You
> can't take two correct lock-based pieces of code, combine them, and know
> that the result is still correct. Modern software development relies on
> the ability to compose libraries into larger programs, and so it is a
> serious difficulty that we cannot build on lock-based components without
> examining their implementations.

This is a problem with any low-level capability used with insufficient
analysis and design. Locks are not, in my opinion, particularly nasty
in this area. Their nastiness is about average.

> 
> The composability issue arises primarily from the possibility of
> deadlock. Deadlock, in its simplest form, happens any time two locks
> might be acquired by two tasks in opposite orders; task T1 takes lock
> L1, task T2 takes lock L2, and then T1 tries to take L2 while T2 tries
> to take L1. Both block forever. Because this can happen anytime two
> locks can be taken in opposite orders, calling into code you don't
> control while holding a lock is a recipe for deadlock.
> 
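A minimal C++ illustration of that simplest form (all names invented);
note that when both locks are visible at the point of use, acquiring
them together -- e.g. with std::scoped_lock -- or agreeing on a global
order removes this particular hazard. The hard case is when one of the
locks is hidden inside code you don't control:

    #include <mutex>
    #include <thread>

    std::mutex l1, l2;

    void task1() {
        std::lock_guard<std::mutex> a(l1);  // T1 holds L1...
        std::lock_guard<std::mutex> b(l2);  // ...and waits for L2
    }

    void task2() {
        std::lock_guard<std::mutex> a(l2);  // T2 holds L2...
        std::lock_guard<std::mutex> b(l1);  // ...and waits for L1
    }

    int main() {
        // With an unlucky interleaving each task holds the lock the
        // other needs, and both block forever.
        std::thread t1(task1), t2(task2);
        t1.join();
        t2.join();
    }
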

Again, deadlock is a consequence of design, not merely a consequence
of whether or not locks are used.

One of the hazards associated with the ability of any thread to call
functions in any other thread is the complexity of the synchronization
issues. The complexity exists whether the synchronization is achieved
using locking or non-locking mechanisms. Synchronization, no matter
what the mechanism, is never free of overhead. There is always a cost
if more than one thread is involved.

> But that is exactly what extensible frameworks do, as they call virtual
> functions while holding a lock. Today's best-of-breed commercial
> application frameworks all do this, including the .NET Frameworks and
> the Java standard libraries. We have gotten away with it because
> developers aren't yet writing lots of heavily concurrent programs that
> do frequent locking. Many complex models attempt to deal with the
> deadlock problem, such as backoff-and-retry protocols, but they require
> high programmer discipline and some introduce their own problems (e.g.,
> livelock).
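One common, if only partial, defense is not to let control escape while
the lock is held: copy whatever the unknown code needs under the lock,
release the lock, and only then call out. A C++ sketch with invented
names (this narrows the window but changes the atomicity guarantees,
and is not always possible):

    #include <functional>
    #include <mutex>
    #include <vector>

    class Registry {
        std::mutex m;
        std::vector<std::function<void(int)>> observers;
        int value = 0;
    public:
        void subscribe(std::function<void(int)> f) {
            std::lock_guard<std::mutex> lk(m);
            observers.push_back(std::move(f));
        }
        void set(int v) {
            std::vector<std::function<void(int)>> snapshot;
            {
                std::lock_guard<std::mutex> lk(m);
                value = v;
                snapshot = observers;   // copy under the lock
            }                           // lock released here
            for (auto& f : snapshot)
                f(v);                   // unknown code runs without our lock
        }
    };
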

I have much more experience using the Ada concurrency model than the Java
concurrency model. I also have much more success with the Ada model.
In the Ada model direct synchronization of tasks is achieved using the
rendezvous, while asynchronous communication through a shared memory
region is achieved using protected objects, or in special cases, simple
atomic operations. In either case the interfaces between tasks, and 
between tasks and shared memory regions, are clearly and unambiguously
defined. 

Protected objects are always passive concurrent objects. Tasks are
always active concurrent objects. It is a bounded (run-time) error to
invoke a potentially blocking operation from within a protected action.
The Ada reference manual defines potentially blocking operations as:
   - a select statement
   - an accept statement
   - an entry call statement
   - a delay statement
   - an abort statement
   - task creation or activation
   - an external call on a protected subprogram (or an external
     requeue) with the same target object as that of the protected
     action
   - a call on a subprogram whose body contains a potentially blocking
     operation

If a bounded error is detected, the exception Program_Error is raised.
If not detected, the bounded error might result in deadlock.

Certain language-defined subprograms are potentially blocking. 
In particular, the subprograms of the language-defined input-output
packages that manipulate files (implicitly or explicitly) are 
potentially blocking.

Protected objects are allowed to have three different kinds of
subprograms. Protected functions are correct only if they provide
read-only access to the protected object. Protected procedures
provide unconditional read-write access to the protected object.
Protected entries provide conditional read-write access to the
protected object. Each action of a protected object should 
only manipulate the data member(s) of that protected object.

All potentially blocking operations should be called by tasks,
not protected objects. A task can block asynchronously from
other tasks without creating a deadlock.
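For readers more at home in C++, a very loose approximation of the
three kinds of protected operations (names and the buffer are invented
for illustration; the restrictions Ada enforces -- no potentially
blocking calls inside, no access except through these operations --
have no direct C++ equivalent, and Ada additionally allows concurrent
callers of protected functions, where this sketch simply serializes):

    #include <condition_variable>
    #include <cstddef>
    #include <mutex>
    #include <queue>

    class ProtectedBuffer {
        mutable std::mutex m;
        std::condition_variable not_empty;
        std::queue<int> items;
    public:
        // "protected function": read-only access
        std::size_t count() const {
            std::lock_guard<std::mutex> lk(m);
            return items.size();
        }
        // "protected procedure": unconditional read-write access
        void put(int v) {
            std::lock_guard<std::mutex> lk(m);
            items.push(v);
            not_empty.notify_one();
        }
        // "protected entry": read-write access guarded by a condition
        int get() {
            std::unique_lock<std::mutex> lk(m);
            not_empty.wait(lk, [this] { return !items.empty(); });
            int v = items.front();
            items.pop();
            return v;
        }
    };
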

I have found Ada concurrency to be highly composable.


<snip>

> A more basic problem with locks is that they rely on programmers to
> strictly follow conventions. The relationship between a lock and the
> data that it protects is implicit, and it is preserved only through
> programmer discipline. A programmer must always remember to take the
> right lock at the right point before touching shared data. Conventions
> governing locks in a program are sometimes written down, but they're
> almost never stated precisely enough for a tool to check them.

This is true when all locking is handled at a low level.
When locking is handled implicitly there is very little programmer
discipline required. Ada task entries and protected subprograms all
provide implicit locking facilities. There is never any ambiguity
concerning which kind of lock is being used in an Ada program.
The Language Reference Manual is very clear about the kinds of locks
used. The programmer cannot misapply the locks.
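C++ cannot enforce the association the way Ada does, but the implicit
lock/data relationship the article complains about can at least be made
structural. One commonly seen idiom, sketched here with an invented
guarded<T> wrapper, ties the mutex to the data it protects so that the
only way to reach the data is through the lock:

    #include <mutex>
    #include <utility>

    template <typename T>
    class guarded {
        std::mutex m;
        T value;
    public:
        explicit guarded(T v) : value(std::move(v)) {}

        // The only access path: run f with the lock held.
        template <typename F>
        auto with(F f) {
            std::lock_guard<std::mutex> lk(m);
            return f(value);
        }
    };

    // Usage: the balance can only be touched via with(), so forgetting
    // to take the lock is simply not expressible.
    //   guarded<long> balance{0};
    //   balance.with([](long& b) { b += 100; return b; });
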

> 
> Locks have other, more subtle problems. Locking is a global program
> property, which is difficult to localize to a single procedure, class,
> or framework. All code that accesses a piece of shared state must know
> and obey the locking convention, regardless of who wrote the code or
> where it resides.
> 

In Ada code the locking for a protected object is implicitly associated
with the kind of subprogram being called. The protected subprogram 
handles its own locking. Explicit cooperation is not needed and is,
in fact, not allowed. The compiler itself sets and enforces the locking
policies. Those policies are not implementation defined. They are defined
as part of the language standard.

> Attempts to make synchronization a local property do not work all the
> time. Consider a popular solution like Java's synchronized methods. Each
> of an object's methods can take a lock on the object, so no two threads
> can directly manipulate the object's state simultaneously. As long as an
> object's state is only accessed by its methods and programmers remember
> to add the synchronized declaration, this approach works.

In my opinion, the Java model is a poor combination of local and distributed
locking responsibility. In fact, I believe the Java synchronization model
often inverts responsibilities, making efficient concurrent designs difficult
at best and impossible at worst.

The Ada model provides explicit interfaces to tasks and protected objects.
This means that a task or protected object's state can only be accessed by
its published interfaces. Ada's strong interface control works very well.

> 
> There are at least three major problems with synchronized methods.
> First, they are not appropriate for types whose methods call virtual
> functions on other objects (e.g., Java's Vector and .NET's
> SyncHashTable), because calling into third-party code while holding a
> lock opens the possibility of deadlock. Second, synchronized methods can
> perform too much locking, by acquiring and releasing locks on all object
> instances, even those never shared across threads (typically the
> majority). Third, synchronized methods can also perform too little
> locking, by not preserving atomicity when a program calls multiple
> methods on an object or multiple methods on different objects. As a
> simple example of the latter, consider a banking transfer:
> 
>   account1.Credit(amount); account2.Debit(amount);

In general, concurrent designs should strive for minimal
synchronization. Synchronization always dilutes the efficiency
of concurrent algorithms. The obvious corollary to this is that
interfaces between concurrent units must be minimized.
Minimization of interfaces cannot be achieved without strong
interface control.

> 
> Per-object locking protects each call, but does not prevent another
> thread from seeing the inconsistent state of the two accounts between
> the calls. Operations of this type, whose atomicity does not correspond
> to a method call boundary, require additional, explicit synchronization.
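To make that concrete in C++ (the Account class and its members are
invented for illustration): per-object locking protects Credit and
Debit individually, but the transfer as a whole needs its own, wider
lock -- here taken on both accounts at once with std::scoped_lock to
avoid lock-ordering deadlocks:

    #include <mutex>

    class Account {
    public:
        std::mutex m;
        long balance = 0;
        void Credit(long amount) {
            std::lock_guard<std::mutex> lk(m);
            balance += amount;
        }
        void Debit(long amount) {
            std::lock_guard<std::mutex> lk(m);
            balance -= amount;
        }
    };

    // Each call is individually synchronized, but another thread can
    // still observe the state between the two calls:
    //     account1.Credit(amount); account2.Debit(amount);

    // The transfer itself must be made atomic explicitly (a real
    // version would also handle transfers to the same account):
    void Transfer(Account& from, Account& to, long amount) {
        std::scoped_lock lk(from.m, to.m);  // both locks, no ordering deadlock
        from.balance -= amount;
        to.balance += amount;
    }
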

This is one reason why banks have historically performed batch
calculations to balance accounts during periods when account transactions
are forbidden. 

If one needs to view account transactions in real-time, as in the
example above, one needs to view a set of transactions much like
a fluid flow control problem in a pipeline. Water may be pumped
from one reservoir to another. There is a perceptible delay between
the time water leaves one reservoir and enters the other. The data
concerning the volume of each reservoir remains correct at all times.
The pipeline itself also contains a calculable volume of water.

> 
> For completeness, we note two major alternatives to locks. The first is
> lock-free programming.  By relying on a deep knowledge of a processor's
> memory model, it is possible to create data structures that can be
> shared without explicit locking. Lock-free programming is difficult and
> fragile; inventing a new lock-free data structure implementation is
> still often a publishable result.
> 
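As a taste of what that paragraph means, about the simplest useful
lock-free structure is a push-only Treiber-style stack built on
compare-and-swap (sketched below in C++ with std::atomic). Even at
this size the reasoning is delicate, and adding a correct pop --
memory reclamation, the ABA problem -- is where the real difficulty,
and the publishable results, begin:

    #include <atomic>
    #include <utility>

    template <typename T>
    class LockFreeStack {
        struct Node { T value; Node* next; };
        std::atomic<Node*> head{nullptr};
    public:
        void push(T v) {
            Node* n = new Node{std::move(v), head.load()};
            // Retry until head is atomically swung from n->next to n.
            // On failure, n->next is reloaded with the current head.
            while (!head.compare_exchange_weak(n->next, n)) {
            }
        }
    };
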
> The second alternative is transactional memory, which brings the central
> idea of transactions from databases into programming languages.
> Programmers write their programs as a series of explicitly atomic
> blocks, which appear to execute indivisibly, so concurrently executing
> operations see the shared state strictly before or after an atomic
> action executes. Although many people view transactional memory as a
> promising direction, it is still a subject of active research.

Interesting concept. I will keep my eyes open for the results of those
research efforts.


--
Jim Rogers
Colorado Springs, Colorado
U.S.A.




