[cpp-threads] Re: Does pthread_mutex_lock() have release semantics?

Dave Butenhof david.butenhof at hp.com
Fri May 6 22:41:59 BST 2005


Boehm, Hans wrote:

>[This follows an off-line discussion with Bill Pugh, Jeremy Manson,
>and Sarita Adve.]
>
>Here's a somewhat embarrassing issue that appears to have been
>understood by a few people, including Sarita, but not by me,
>nor probably by some of the people implementing thread support.
>It significantly affects the performance of pthreads on some
>platforms, and possibly how we describe the memory model.
>
>Consider the following code, which uses a truly weird synchronization
>mechanism, in that the second thread waits until the lock l is
>LOCKED, before it proceeds, effectively reversing the sense of
>the lock.
>
>Thread 1:
>
><Do some initialization stuff>
>pthread_mutex_lock(&l)
><continue running, never releasing lock>
>
>Thread 2:
>
>while (pthread_mutex_trylock(&l)==0) unlock(&l);
><continue running, with knowledge that initialization by
>thread 1 was complete.>
>  
>
Did you mean pthread_mutex_unlock()? That's certainly implied, as you 
used "l", which is presumably a pthread_mutex_t; but I'm suspicious, 
since you use full POSIX function names elsewhere but a generic 
unlock() of unspecified semantics...

If this is intended to be pthread_mutex_unlock(), then the application 
is erroneous; it violates the basic POSIX principle that mutexes are 
OWNED by the locking thread, and only it can unlock them. The semantics 
of this program are therefore completely undefined. As one member of the 
working group likes to say, it fails the rogue-o-matic test. That is, a 
strictly conforming POSIX implementation, confronted with this program, 
might legally take up an unrelated game and refuse further input. More 
pragmatically, such a program has absolutely no guaranteed memory behavior.

While the EPERM error on pthread_mutex_unlock() need not be detected or 
reported by conforming implementations, the requirements imposed on the 
application are not relaxed merely because the implementation chooses to 
save a few cycles by ignoring it.

Of course you could replace the mutex with a binary semaphore, and the 
program would then be completely legal.
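The semaphore version of the handshake is worth spelling out, since sem_post() and sem_wait() are among the functions POSIX lists as synchronizing memory. A minimal sketch (variable and function names here are hypothetical, not from the original post):

```c
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

/* Shared state; names are illustrative only. */
sem_t init_done;            /* created with sem_init(&init_done, 0, 0) */
int shared_data;

/* Thread 1: do the initialization, then post the semaphore.
 * sem_post() synchronizes memory, so the write is published. */
void *initializer(void *arg)
{
    shared_data = 42;       /* "do some initialization stuff" */
    sem_post(&init_done);
    return NULL;
}

/* Thread 2: block until initialization is complete.
 * sem_wait() synchronizes memory, so shared_data is visible. */
void *consumer(void *arg)
{
    sem_wait(&init_done);
    /* shared_data is now guaranteed to be 42. */
    return NULL;
}
```

Unlike the mutex trick, neither thread ever releases a lock it doesn't own, so no ownership rule is violated.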

>For this to work, the pthread_mutex_lock() call must have RELEASE
>semantics, instead of its usual acquire semantics.  (And
>pthread_mutex_trylock() must have acquire semantics when
>it FAILS.)
>  
>
Also, for what it's worth, POSIX specifically denies any guarantee of 
memory synchronization when a function FAILS.

So on either branch, this code loses on a simple conformance test.

>I think a reasonable reading of the pthreads standard would pretty
>clearly conclude that this should work.  (Of course nobody would
>recommend the programming style.)
>  
>
Aside from the ownership violation, yes, it would have been legal. POSIX 
memory synchronization is phrased very simply -- while nobody who 
understood the issues was entirely happy with that, it was sufficient, 
and simple enough to explain to most anyone. We have a much wider 
audience of far more sophisticated programmers now; most everyone 
understands at least the basics of concurrency and many have a 
reasonable grasp of the machine-level basis.

So Section 4.10 of the UNIX spec says simply that "The following 
functions synchronize memory with respect to other threads:", not "a 
pthread_mutex_unlock in thread A makes previous writes from thread A 
visible to a thread B subsequently completing a pthread_mutex_lock 
operation", etc. As you infer, the standard clearly and unambiguously 
requires acquire/release for every one of the operations on which you 
can portably rely for memory synchronization. Unfortunate... but, again, 
simple, and sufficient. (At least for coarse grain concurrency.)
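For contrast, the pattern that Section 4.10 does portably bless is symmetric lock/unlock in both threads: every listed function synchronizes memory, so the handoff needs no reasoning about acquire versus release. A sketch of the conforming version of the initialization handshake (names are illustrative, not from the post):

```c
#include <pthread.h>
#include <stddef.h>

/* Shared state; names are illustrative only. */
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
int initialized = 0;
int shared_data;

/* Thread 1: initialize under the lock, then set the flag.
 * pthread_mutex_unlock() synchronizes memory. */
void *initializer(void *arg)
{
    pthread_mutex_lock(&m);
    shared_data = 42;
    initialized = 1;
    pthread_mutex_unlock(&m);
    return NULL;
}

/* Thread 2: poll the flag under the lock. Each successful
 * lock/unlock pair synchronizes memory, so once 'initialized'
 * is seen set, the write to shared_data is visible too. */
void *consumer(void *arg)
{
    int done = 0;
    while (!done) {
        pthread_mutex_lock(&m);
        done = initialized;
        pthread_mutex_unlock(&m);
    }
    /* shared_data == 42 is now guaranteed. */
    return NULL;
}
```

Both threads only ever unlock a mutex they own, so the ownership rule holds and the visibility guarantee follows directly from the standard's simple wording.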

Acquire and release semantics were embedded in the concurrency 
predicates originally developed by Leslie Lamport and Garret Swart for 
an early draft. It looked far too mathematical to most of the balloting 
group; few people could wrap their heads around the notation enough to 
feel comfortable that they understood the implications... and we came to 
realize it would never pass. (A year or so ago I searched my copies of 
the early drafts, but found only a somewhat watered down version that 
was just too hacky to post. It's possible that the original proposal 
never even made a circulated draft, although Lamport did publish a 
similar concurrency notation for Taos threads in one of his DEC SRC 
papers; someday maybe I'll find time to search for it.)

I'd like to think that the POSIX audience is now ready for "the next 
level", but it's going to take time and patience to follow through. I 
just haven't had either. Alexander Terekhov showed signs at one point of 
beginning a serious campaign on this issue, but I haven't heard anything 
in quite a while.

>Our present definition of data races implies that it would work.
>
>Reasons it shouldn't work:
>1) It seems to require lock operations to have acquire +
>release semantics, which may mean barriers before and after the
>change to the lock.  I would guess many current implementations don't,
>at least on hardware on which CAS doesn't include a full barrier
>(e.g. Itanium, PowerPC, Alpha).  This may have a significant
>performance impact, and no real benefit.
>  
>
Well, the benefit, in the standard, was simplicity. And implementations 
need to weigh the benefits of improving performance for applications vs 
the risk of breaking some (at least nominally) conforming applications. 
Once upon a time I was really bothered by things like that; but I've 
mellowed. We don't live in a perfect world, and there are far more 
important things to get riled about. ;-)

An awful lot of mutex operations are of the order "lock, add one, 
unlock", or "lock, set bit, unlock". Those absolutely can and should be 
done with a simple "CAS" sequence (e.g., Alpha LDx_L/STx_C, etc.), with 
no acquire, release, barrier, or anything else. Unfortunately, Alpha 
required a full barrier in nearly all cases where the counter or bit is 
used as a signal to another thread (in practice write barriers are of 
minimal use). At least Itanium got ONE thing right: interlocked ops 
with rel/acq modifiers.
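The "lock, add one, unlock" shape and the single interlocked operation an implementation could reduce it to can be sketched side by side. The C11 atomics spelling below postdates this 2005 message and is added purely for illustration of "no acquire, release, barrier, or anything else":

```c
#include <pthread.h>
#include <stdatomic.h>

/* The classic "lock, add one, unlock" critical section. */
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
long counter = 0;

void bump_locked(void)
{
    pthread_mutex_lock(&m);
    counter++;
    pthread_mutex_unlock(&m);
}

/* What the whole sequence could collapse to: one interlocked add
 * with atomicity but no ordering at all (memory_order_relaxed),
 * roughly the LDx_L/STx_C loop with no barriers. */
atomic_long acounter = 0;

void bump_atomic(void)
{
    atomic_fetch_add_explicit(&acounter, 1, memory_order_relaxed);
}
```

Both versions keep the counter consistent under contention; the relaxed version simply promises nothing about the visibility of any *other* memory, which is exactly the point being made above.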

>2) It prevents the compiler from merging adjacent locked regions.
>
>3) I have no idea whether the pthread committee intended this to
>work or whether it was an accident.  If anybody knows, please
>tell me.
>  
>
As I said, technically this is a rogue-o-matic. But there are less 
radical constructs that really "oughtn't to work" in a perfect acq/rel 
world, but are required by POSIX to work. "We" (the "threadies" in the 
original working group) wanted a better world. The original CMA 
specification from which POSIX threads were loosely adapted specified 
acquire/release semantics in terms of both object identity and 
operation... that is, memory visibility flowed explicitly from the 
thread releasing mutex A to the subsequent thread acquiring mutex A, and 
nowhere else. (Even though we knew of no memory architecture that could 
support that degree of separation, it would provide the most flexibility 
for optimization while giving enough guarantee for a correct and careful 
program.)

But alas, even as prose, this was too complicated for the standards 
environment at that time.

>If we want to make it possible to prohibit this, I think we need
>to move to a different, more complicated, definition of a data
>race, which explicitly talks about a happens-before relation,
>as in the Java case.  We probably need to do that anyway if we want
>to precisely define the different ordering constraints in the
>atomic operations library.  If we think synchronization primitives
>should behave as though nothing were reordered around them, then we
>might try to preserve the current simpler definition for the
>core of the language, and confine the added complexity to the
>atomic operations library.
>
>Opinions?
>  
>
Yes, "happens-before" type relationships are critical for a well-defined 
and useful synchronization protocol. Like I said, Alexander had made a 
start at a proposal for this. I think his initial proposal had a lot of 
problems, but the basic direction was a good start.

C++ can start from scratch and "get it right", but you need to keep your 
eyes open to the real-world constraints on implementations and 
applications. A C++ application hosted on a conforming POSIX 
implementation 
may be over-synchronized, but should work, and it would provide a lot of 
incentive to fix POSIX.

Ideally, the compiler should understand the concept of shared data and 
predicates, and allow free reordering of operations on which no shared 
data depends -- and maybe even in cases where it does but no exposed 
predicate can be violated. (Though I have doubts any modular compiler 
system could realistically exploit that knowledge, that's no reason to 
disallow it.)