[cpp-threads] RE: Does pthread_mutex_lock() have release semantics?
Morris, John V
john.v.morris at hp.com
Mon May 9 18:12:53 BST 2005
Hans,
Thanks for including me in your discussion. I'm going to toss in my
opinion. This is one of those topics where I could argue for either
side. I suppose the counter-argument to my own message would be that the
mutex routines have to *always* work, and making them depend on an
esoteric memory model is going to introduce bugs that will be extremely
difficult to track down and fix. Sigh.
- John
I think the standards have pretty much ignored memory ordering. Trying
to read anything into the existing standards is a mistake.
If we want to deal with memory ordering (and compiler reordering as
well), we need to define a consistent approach and work to have it
adopted as a standard. Currently, there are so many pitfalls and
undefined cases, it is amazing any code works at all.
Consider:
o "volatile" still has no standard interpretation.
o In practice, procedure calls act as compiler memory fences, not
  hardware ones.
o Inlining and global optimizations eliminate procedure calls, and
  may or may not eliminate compiler memory fences.
If we had to choose the "proper" semantics for mutex_lock and
mutex_unlock, I'd recommend we use acquire and release semantics. This
is an engineering decision - theory could go either way. The engineering
premise would be to give best performance to 99.9% of the cases used in
practice. Your case is an exception, and it would require an explicit
fence. If you need to squeeze out the last couple of cycles, use a
sequence of custom assembly language. If your case were more than a
rare exception, then we would need additional mutex procedures.
I don't know if I am a lone voice, but I wouldn't want to burden the
mainstream cases with unnecessary memory constraints.
Lacking any standards about ordering, I would proceed with caution. Each
platform is likely to be different.
- John
> -----Original Message-----
> From: Boehm, Hans
> Sent: Friday, May 06, 2005 1:26 PM
> To: 'cpp-threads at decadentplace.org.uk'
> Subject: Does pthread_mutex_lock() have release semantics?
>
> [This follows an off-line discussion with Bill Pugh, Jeremy
> Manson, and Sarita Adve.]
>
> Here's a somewhat embarrassing issue that appears to have
> been understood by a few people, including Sarita, but not by
> me, nor probably by some of the people implementing thread support.
> It significantly affects the performance of pthreads on some
> platforms, and possibly how we describe the memory model.
>
> Consider the following code, which uses a truly weird
> synchronization mechanism, in that the second thread waits
> until the lock l is LOCKED, before it proceeds, effectively
> reversing the sense of the lock.
>
> Thread 1:
>
> <Do some initialization stuff>
> pthread_mutex_lock(&l)
> <continue running, never releasing lock>
>
> Thread 2:
>
> while (pthread_mutex_trylock(&l) == 0) pthread_mutex_unlock(&l);
> <continue running, with knowledge that initialization by thread 1
> was complete.>
>
> For this to work, the pthread_mutex_lock() call must have
> RELEASE semantics, instead of its usual acquire semantics. (And
> pthread_mutex_trylock() must have acquire semantics when it FAILS.)
>
> I think a reasonable reading of the pthreads standard would
> pretty clearly conclude that this should work. (Of course
> nobody would recommend the programming style.)
>
> Our present definition of data races implies that it would work.
>
> Reasons it shouldn't work:
> 1) It seems to require lock operations to have acquire +
> release semantics, which may mean barriers before and after
> the change to the lock. I would guess many current
> implementations don't, at least on hardware on which CAS
> doesn't include a full barrier (e.g. Itanium, PowerPC,
> Alpha). This may have a significant performance impact, and
> no real benefit.
>
> 2) It prevents the compiler from merging adjacent locked regions.
>
> 3) I have no idea whether the pthread committee intended this
> to work or whether it was an accident. If anybody knows,
> please tell me.
>
> If we want to make it possible to prohibit this, I think we
> need to move to a different, more complicated, definition of
> a data race, which explicitly talks about a happens-before
> relation, as in the Java case. We probably need to do that
> anyway if we want to precisely define the different ordering
> constraints in the atomic operations library. If we think
> synchronization primitives should behave as though nothing
> were reordered around them, then we might try to preserve the
> current simpler definition for the core of the language, and
> confine the added complexity to the atomic operations library.
>
> Opinions?
>
> Hans
>