[cpp-threads] RE: C++ Connections proposal

Mon Apr 25 21:44:01 BST 2005

> From: Peter A. Buhr
> Sent: Monday, April 25, 2005 9:20 AM
> 
>    Hans> I actually don't think that the reasons the library approach
>    Hans> doesn't work are that straightforward.  The pthread approach
>    Hans> comes surprisingly close to working.
> 
> I'll disagree with Hans on this point, and argue my point as 
> follows. The only reason pthreads comes close is because many 
> of its routines, like pthread_mutex_lock, are *required* to 
> be implemented as non-inline routines, which constrains 
> optimization of all calls to these routines. However, many of 
> the pthread routines are small and appear in performance 
> critical sections of code, and hence, are obvious targets for 
> inlining. To solve this problem, a compiler can be given the 
> source code for pthreads routines so it can inline. But if 
> the compiler does not know about concurrency, it will 
> incorrectly intermix pthreads code with adjacent code at the 
> call site. So I stand firm on my statement: if you want to 
> generate efficient and correct code in concurrent programs 
> the compiler MUST be aware the program is concurrent, 
> otherwise basic sequential optimizations will invalidate the 
> program.  pthreads solves the problem by throwing away 
> performance to gain correctness, which I have pointed out is 
> the necessary trade off.

I think that would be an issue if pthread_mutex_lock() used
something like Lamport's algorithm for mutual exclusion, so
that the implementation looked like regular C++ code.  In practice,
I haven't seen any case in which this is much of an issue.
The actual implementations of pthread_mutex_lock use either
compiler intrinsics for something like CAS, or inline assembly
code that generates these instructions.  In both cases, the
compiler either understands the memory ordering restrictions
associated with the intrinsics, or (in the case of gcc) understands
that the inline assembly code inhibits reordering.  My guess is that
for most pthreads implementations you wouldn't break anything
further by inlining the pthread primitives.  You are just moving
things down a level; the "opaque calls" are now the CAS
instructions or the like.
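
To make "moving things down a level" concrete, here is a minimal
sketch of a lock built directly on a CAS.  It assumes C++11
std::atomic, which postdates this discussion, and the names SpinLock
and count_with_spinlock are made up for illustration; no real
pthreads implementation is claimed to look like this.  The point is
only that lock() can be inlined freely, because the ordering
constraint lives on the atomic operation rather than on the call
boundary.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical spin lock, for illustration only.
class SpinLock {
public:
    void lock() {
        bool expected = false;
        // The CAS is the "opaque" operation.  Acquire ordering keeps
        // later accesses from being moved above it, inlined or not.
        while (!locked_.compare_exchange_weak(expected, true,
                                              std::memory_order_acquire,
                                              std::memory_order_relaxed)) {
            expected = false;  // a failed CAS overwrote it with the observed value
        }
    }

    void unlock() {
        // Release ordering keeps earlier accesses from sinking below the store.
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};

// Several threads increment a plain (non-atomic) counter under the lock.
long count_with_spinlock(int threads, int iterations) {
    assert(threads > 0 && iterations >= 0);
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&lock, &counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

Calling count_with_spinlock(4, 10000) should yield exactly
threads * iterations, since every increment happens under the lock.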

I'm not sure how many systems actually have the option of
inlining pthread primitives.  My impression is that a major
reason for not doing so is to allow dynamic replacement of
(parts of) the underlying threads library (subject to mutex size
constraints, etc.).  On Linux, I believe this is used, for
example, to get you a very fast pthread_mutex_lock implementation
if you don't link against libpthread, and hence don't actually
have threads available.  On systems that need more than 100
cycles for a CAS, the inlining issue is probably secondary.

...

Re: explicit lock statements/constructors/ ...

Really tricky concurrent code aside,
I find that I very often, even in simple code, need to lock
something other than the object itself, e.g. static data
associated with a class.  If I only have per-method locking,
I can break that out into a separate method, e.g. a static one.
But I will often need atomicity across multiple actions,
each of which naturally is a separate call, so I think this
is at odds with good program structuring.
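
A small sketch of the static-data case, using C++11 std::mutex (not
available when this was written) and a hypothetical Registry class.
The lock protects class-wide data rather than any one object, and it
has to span two actions that are naturally separate calls.  With
per-method locking only, allocate_id() and record_name() would each
be atomic, but the pair would not be.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical class whose shared state is static, not per-object.
class Registry {
public:
    // Explicit lock scope covering two separate actions, so that
    // names_[id] always corresponds to the id handed out.
    static int register_name(const std::string& name) {
        std::lock_guard<std::mutex> guard(table_mutex_);
        int id = allocate_id();
        record_name(name);
        assert(names_.size() == static_cast<std::size_t>(next_id_));
        return id;
    }

    static std::string name_of(int id) {
        std::lock_guard<std::mutex> guard(table_mutex_);
        return names_.at(static_cast<std::size_t>(id));
    }

private:
    // Each helper is a natural unit on its own; neither takes the lock.
    static int allocate_id() { return next_id_++; }
    static void record_name(const std::string& name) { names_.push_back(name); }

    static std::mutex table_mutex_;            // guards the static data below
    static std::vector<std::string> names_;
    static int next_id_;
};

std::mutex Registry::table_mutex_;
std::vector<std::string> Registry::names_;
int Registry::next_id_ = 0;
```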

My experience is also that when in doubt, you want to defer
locking decisions to your caller, because locks may be needed
at a coarser granularity than you know about.  This again means
that you often want to lock an object from a method in another
object.
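
As an illustration of deferring the locking decision, here is a
sketch with two hypothetical classes, Account and Bank, again
assuming C++11 std::mutex.  Account does no locking of its own; Bank
is the caller that knows both accounts must change together, so it
holds one lock across calls into the other objects.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical class with no internal locking; callers decide.
class Account {
public:
    explicit Account(long balance) : balance_(balance) {}
    void deposit(long amount) { balance_ += amount; }
    void withdraw(long amount) { balance_ -= amount; }
    long balance() const { return balance_; }

private:
    long balance_;
};

// Hypothetical caller that knows the real granularity.
class Bank {
public:
    Bank(long a, long b) : a_(a), b_(b) {}

    // One lock covers both updates, so no other thread going through
    // Bank can observe the money "in flight".
    void transfer_a_to_b(long amount) {
        assert(amount >= 0);
        std::lock_guard<std::mutex> guard(mutex_);
        a_.withdraw(amount);
        b_.deposit(amount);
    }

    long balance_a() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return a_.balance();
    }

    long total() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return a_.balance() + b_.balance();
    }

private:
    mutable std::mutex mutex_;  // locks the Account objects from outside
    Account a_;
    Account b_;
};
```

Had Account locked each method internally, total() could still see a
withdrawal without the matching deposit.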

Hans