Threads suck...

- Difficult to understand control flow
- Difficult to test and debug
  - Most synchronisation errors are timing-dependent and only result
    in breakage under heavy load
- Language and library semantics vaguely defined
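
To illustrate the timing-dependent failures mentioned above, here is a
minimal sketch in Java (chosen only for illustration; the class and
method names are hypothetical).  Several threads increment two
counters, one with no synchronisation and one under a mutex.  The
unsynchronised total may come up short, and by how much varies from
run to run and from machine to machine.

```java
class LostUpdateDemo {
    private int unsafeCount = 0;               // incremented with no lock
    private int safeCount = 0;                 // incremented only under `lock`
    private final Object lock = new Object();

    // Starts `threads` threads that each increment both counters
    // `perThread` times; returns {unsafeCount, safeCount}.
    static int[] run(int threads, int perThread) {
        LostUpdateDemo demo = new LostUpdateDemo();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    demo.unsafeCount++;        // racy read-modify-write
                    synchronized (demo.lock) {
                        demo.safeCount++;      // protected by the mutex
                    }
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();   // join() also makes the workers' writes visible
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
        }
        return new int[] { demo.unsafeCount, demo.safeCount };
    }

    public static void main(String[] args) {
        int[] counts = run(4, 100_000);
        System.out.println("unsynchronised: " + counts[0]);  // may be < 400000
        System.out.println("synchronised:   " + counts[1]);
    }
}
```

A run that happens to print the full total for the unsynchronised
counter does not show the code is correct, which is exactly why such
bugs are hard to test for.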

...but they're a necessary evil

- Multiple threads are required to make use of multiple processor cores
  - Multicore and hyperthreading processors are becoming standard in
    desktops and even laptops
  - Servers already tend to have many processors
  - Reference: Sutter 2005
- Inter-thread communication is much cheaper than inter-process
  communication
  - Processes can also share memory, but that brings many of the same
    challenges as multithreading plus the problem of differing
    virtual addresses (which breaks C++ polymorphism)

How to structure applications into threads

- Easy routes
  - Server: one kernel thread per client
    - Performs badly under heavy use
      - Uses lots of VM space
      - High scheduling and synchronisation overhead
  - Interactive application: one kernel thread per task
    - Often one task uses the majority of the processor time, so this
      doesn't improve performance much if at all
- Better ways
  - Server: thread pool, one kernel thread per processor
    - Harder to write - application needs its own scheduler
    - May not interact well with other applications
    - All I/O should be non-blocking (extra threads can be kept in
      reserve and released when a running thread is about to block)
  - Interactive application: UI thread plus one thread per processor
    for processor-intensive work
    - Can be hard to parallelise
  - High-performance computing: use MPI, OpenMP, or another framework
    for parallelism
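
The "one kernel thread per processor" sizing can be sketched as
follows, assuming Java's java.util.concurrent executors.  The class
name and the sum-of-squares workload are hypothetical stand-ins for
real processor-intensive work, and the sketch leaves out the
non-blocking I/O and reserve threads that a real server would need.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PerProcessorPool {
    // Sums the squares of 1..n, splitting the range into one chunk per
    // processor and running the chunks on a fixed-size pool.
    static long sumOfSquares(int n) {
        int workers = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            int chunk = (n + workers - 1) / workers;   // ceiling division
            for (int start = 1; start <= n; start += chunk) {
                final int lo = start;
                final int hi = Math.min(n, start + chunk - 1);
                Callable<Long> task = () -> {
                    long sum = 0;
                    for (long i = lo; i <= hi; i++) {
                        sum += i * i;
                    }
                    return sum;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();   // blocks until that chunk is done
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            pool.shutdown();           // let the pool threads exit
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(100));
    }
}
```

Here the executor plays the part of the application's own scheduler:
tasks queue up and are handed to whichever pool thread is free.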

Thread synchronisation

- Do it too little and your program is broken
- Do it too much and your threads can spend most of their time waiting
  for each other
- Mutexes (a.k.a. locks) and condition variables
  - Good building blocks
- Lockless algorithms and data structures
  - For experts only
    - And even they make mistakes (e.g. double-checked locking)
  - Examples:
    - Double-checked locking with volatile in Java
    - Read-Copy-Update

What was that about semantics?

We tend to assume that the operations of our program - reading and
writing variables, synchronising, and performing I/O - happen in the
same order as written in the program.  You probably all know, however,
that compilers can reorder operations to make your program run faster.
They can also eliminate or duplicate memory accesses in some cases by
caching variables in registers.  And some processors do not support
writing to memory locations of certain sizes in a single operation, so
writing to a variable may also involve reading and writing the memory
storing other variables.  (This is true for most processors in the
case of bitfields.)  Aside from this, processors may reorder reads and
writes depending on which part of the memory system has the current
version of that line of memory (RAM, own cache, another processor's
cache, etc.), within some restrictions.

All this is OK in a single-threaded program - though reordering by the
compiler must be borne in mind when writing signal handlers.  However,
it can introduce disastrous race conditions into a multithreaded
program (or one that shares memory with other processes).
Synchronisation primitives such as mutexes deal with the processor
reordering, but may not inhibit compiler reordering.

In order for a language to support multithreading properly, it must
have a memory model that defines to what extent reordering is possible
and how the programmer can limit it.

Reference: Boehm 2004
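
As one illustration of a language giving the programmer a way to limit
such reordering and caching, here is a sketch in Java, where declaring
a field volatile obliges every read to observe the latest write.  The
class and method names are hypothetical.  Without the volatile
keyword, the runtime compiler would be allowed to read the flag once,
keep it in a register, and loop forever.

```java
class StopFlag {
    // volatile: the loop below must re-read the field on every pass.
    private volatile boolean stopRequested = false;

    void requestStop() {
        stopRequested = true;
    }

    // Starts a spinning worker, asks it to stop, and reports whether it
    // actually stopped within `timeoutMillis`.
    static boolean runAndStop(long timeoutMillis) {
        StopFlag flag = new StopFlag();
        Thread worker = new Thread(() -> {
            while (!flag.stopRequested) {
                // busy-wait; real code would do useful work here
            }
        });
        worker.setDaemon(true);        // never keep the JVM alive
        worker.start();
        try {
            Thread.sleep(50);          // let the worker get into its loop
            flag.requestStop();
            worker.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped: " + runAndStop(5000));
    }
}
```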

Few languages have such models yet, with the notable exception of
Java.  The C and C++ standards describe a single-threaded abstract
machine, and the C binding for POSIX threads speaks vaguely in terms
of "memory locations" rather than language semantics.  However, there
is active work on such a memory model for C++, some of which may be
applicable to C.  C#'s memory model is only vaguely specified.
For higher-level languages there may or may not be a problem to
resolve.

References: Manson et al. 2004, 2005; Boehm 2006
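
The double-checked locking idiom listed earlier shows what a defined
memory model buys.  The sketch below, with a hypothetical LazyConfig
class, is the form that is correct under the revised Java memory model
(JSR-133, Java 5 and later).  It relies on the field being volatile;
without that, another thread could see a non-null reference to an
object whose fields have not yet been written.

```java
class LazyConfig {
    // volatile orders the constructor's writes before the publication
    // of the reference to other threads.
    private static volatile LazyConfig instance;

    private final int value;

    private LazyConfig() {
        value = 42;                    // stands in for expensive set-up
    }

    static LazyConfig get() {
        LazyConfig local = instance;   // first check, no lock taken
        if (local == null) {
            synchronized (LazyConfig.class) {
                local = instance;      // second check, under the lock
                if (local == null) {
                    local = new LazyConfig();
                    instance = local;
                }
            }
        }
        return local;
    }

    int getValue() {
        return value;
    }

    public static void main(String[] args) {
        System.out.println(get() == get());   // same instance both times
    }
}
```

The local variable is only there to avoid a second volatile read on
the common path where the instance already exists.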

How does this relate to Debian?

If you're a maintainer for a program that uses multithreading - and
more and more programs do - you are likely to see increasing numbers
of bug reports relating to synchronisation errors, and possibly to
language semantics, as SMP becomes the norm.  Debian's support for
multiple architectures with different processor memory models also
means that some such bugs will only appear on some architectures.

Further reading

Boehm, Threads Cannot Be Implemented As a Library, HP Technical Report 2004
http://www.hpl.hp.com/techreports/2004/HPL-2004-209.html

Sutter, The Free Lunch is Over, Dr. Dobb's Journal 2005
http://www.gotw.ca/publications/concurrency-ddj.htm

Manson et al., Java Memory Model and Thread Specification (JSR-133), Java Community Process 2004
http://jcp.org/aboutJava/communityprocess/final/jsr133/index.html

Manson et al., The Java Memory Model, Principles of Programming Languages 2005
http://rsim.cs.uiuc.edu/Pubs/popl05.pdf

Boehm, Threads and memory model for C++, personal web site 2006
http://www.hpl.hp.com/personal/Hans_Boehm/c++mm/