[cpp-threads] Does pthread_mutex_lock() have release semantics?

Boehm, Hans hans.boehm at hp.com
Fri May 6 21:25:58 BST 2005


[This follows an off-line discussion with Bill Pugh, Jeremy Manson,
and Sarita Adve.]

Here's a somewhat embarrassing issue that appears to have been
understood by a few people, including Sarita, but not by me,
nor probably by some of the people implementing thread support.
It significantly affects the performance of pthreads on some
platforms, and possibly how we describe the memory model.

Consider the following code, which uses a truly weird synchronization
mechanism: the second thread waits until the lock l is LOCKED before
it proceeds, effectively reversing the sense of the lock.

Thread 1:

<Do some initialization stuff>
pthread_mutex_lock(&l);
<continue running, never releasing lock>

Thread 2:

while (pthread_mutex_trylock(&l) == 0)
    pthread_mutex_unlock(&l);
<continue running, with knowledge that initialization by
thread 1 was complete.>

For this to work, the pthread_mutex_lock() call must have RELEASE
semantics, instead of its usual acquire semantics.  (And
pthread_mutex_trylock() must have acquire semantics when
it FAILS.)
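
To make the barrier placement concrete, here is a minimal spinlock
sketch, written in (anachronistic) C11 atomics notation for brevity.
It is not how any real pthreads implementation is written; the names
toy_mutex, toy_lock, and toy_trylock are made up:

#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_int held; } toy_mutex;

void toy_lock(toy_mutex *m) {
    int expected = 0;
    /* Ordinary lock acquisition needs only memory_order_acquire on
       success.  Supporting the pattern above additionally requires
       release semantics, hence memory_order_acq_rel, so that thread
       1's initialization is ordered before the lock is seen as held. */
    while (!atomic_compare_exchange_weak_explicit(
                &m->held, &expected, 1,
                memory_order_acq_rel, memory_order_relaxed))
        expected = 0;   /* a failed CAS updates expected; reset it */
}

bool toy_trylock(toy_mutex *m) {
    int expected = 0;
    /* For the pattern above, even a FAILED trylock must have acquire
       semantics, hence memory_order_acquire as the failure order. */
    return atomic_compare_exchange_strong_explicit(
                &m->held, &expected, 1,
                memory_order_acquire, memory_order_acquire);
}

On hardware where CAS does not imply a full barrier, the acq_rel
success ordering costs an extra barrier relative to plain acquire,
which is exactly the cost at issue in reason (1) below.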

I think a reasonable reading of the pthreads standard would pretty
clearly conclude that this should work.  (Of course nobody would
recommend this programming style.)

Our present definition of data races implies that it would work.
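
For concreteness, here is a complete, compilable rendering of the
example; the variable data, the value 42, and the printout are my
additions, standing in for the initialization stuff:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t l = PTHREAD_MUTEX_INITIALIZER;
static int data;   /* stands in for "some initialization stuff" */

static void *thread1(void *arg) {
    (void)arg;
    data = 42;                  /* initialization */
    pthread_mutex_lock(&l);     /* signal: lock, and never release */
    for (;;) pause();           /* continue running */
    return NULL;
}

static void *thread2(void *arg) {
    (void)arg;
    /* Spin until the mutex is observed LOCKED. */
    while (pthread_mutex_trylock(&l) == 0)
        pthread_mutex_unlock(&l);
    /* Only if pthread_mutex_lock has release semantics (and a failed
       trylock acquire semantics) is data guaranteed to be 42 here. */
    printf("data = %d\n", data);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, thread1, NULL);
    pthread_create(&t2, NULL, thread2, NULL);
    pthread_join(t2, NULL);
    return 0;   /* exiting main also terminates thread 1 */
}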

Reasons it shouldn't work:
1) It seems to require lock operations to have acquire +
release semantics, which may mean barriers both before and after the
change to the lock.  I would guess many current implementations
don't provide both, at least on hardware on which CAS doesn't
include a full barrier (e.g. Itanium, PowerPC, Alpha).  This may
have a significant performance impact, and no real benefit.

2) It prevents the compiler from merging adjacent locked regions
(see the sketch after this list).

3) I have no idea whether the pthread committee intended this to
work or whether it was an accident.  If anybody knows, please
tell me.
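
To make (2) concrete, here is the transformation at stake; x and y
are hypothetical shared variables, each protected by l:

/* Before: two adjacent critical sections. */
pthread_mutex_lock(&l);  x = 1;  pthread_mutex_unlock(&l);
pthread_mutex_lock(&l);  y = 1;  pthread_mutex_unlock(&l);

/* After merging, which saves an unlock/lock pair: */
pthread_mutex_lock(&l);  x = 1;  y = 1;  pthread_mutex_unlock(&l);

The merged version never exhibits the intermediate unlocked state,
so a thread polling the lock with trylock, as in the example above,
could in principle detect the difference.  Requiring the weird
pattern to work therefore forbids the merge.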

If we want to make it possible to prohibit this, I think we need
to move to a different, more complicated, definition of a data
race, which explicitly talks about a happens-before relation,
as in the Java case.  We probably need to do that anyway if we want
to precisely define the different ordering constraints in the
atomic operations library.  If we think synchronization primitives
should behave as though nothing were reordered around them, then we
might try to preserve the current simpler definition for the
core of the language, and confine the added complexity to the
atomic operations library.
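
For contrast, the explicit way to express this hand-off with an
atomic operations library would presumably look something like the
following (C11-style syntax for concreteness; the flag init_done is
hypothetical):

#include <stdatomic.h>

atomic_int init_done;   /* hypothetical flag, initially 0 */

/* Thread 1 */
/* <do some initialization stuff> */
atomic_store_explicit(&init_done, 1, memory_order_release);

/* Thread 2 */
while (!atomic_load_explicit(&init_done, memory_order_acquire))
    ;   /* spin */
/* initialization by thread 1 is now visible */

Here the release and acquire constraints are stated exactly where
they are needed, instead of being imposed on every lock operation.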

Opinions?

Hans



