[cpp-threads] RE: Does pthread_mutex_lock() have release semantics?
Morris, John V
john.v.morris at hp.com
Mon May 9 18:12:53 BST 2005
Hans,
Thanks for including me in your discussion. I'm going to toss in my
opinion. This is one of those topics where I could argue for either
side. I suppose the counter-argument to my own message would be that the
mutex routines have to *always* work, and making them depend on an
esoteric memory model is going to introduce bugs that will be extremely
difficult to track down and fix. Sigh.
- John
I think the standards have pretty much ignored memory ordering. Trying
to read anything into the existing standards is a mistake.
If we want to deal with memory ordering (and compiler reordering as
well), we need to define a consistent approach and work to have it
adopted as a standard. Currently, there are so many pitfalls and
undefined cases, it is amazing any code works at all.
Consider:
o "volatile" still has no standard interpretation.
o In practice, procedure calls act as compiler memory fences, not
  hardware ones.
o Inlining and global optimizations eliminate procedure calls, and
  may or may not eliminate compiler memory fences.
If we had to choose the "proper" semantics for mutex_lock and
mutex_unlock, I'd recommend we use acquire and release semantics. This
is an engineering decision - theory could go either way. The engineering
premise would be to give best performance to 99.9% of the cases used in
practice. Your case is an exception, and it would require an explicit
fence. If you need to squeeze out the last couple of cycles, use a
sequence of custom assembly language. If your case were more than a
rare exception, then we would need additional mutex procedures.
I don't know if I am a lone voice, but I wouldn't want to burden the
mainstream cases with unnecessary memory constraints.
Lacking any standards about ordering, I would proceed with caution. Each
platform is likely to be different.
- John
> -----Original Message-----
> From: Boehm, Hans
> Sent: Friday, May 06, 2005 1:26 PM
> To: 'cpp-threads at decadentplace.org.uk'
> Subject: Does pthread_mutex_lock() have release semantics?
>
> [This follows an off-line discussion with Bill Pugh, Jeremy
> Manson, and Sarita Adve.]
>
> Here's a somewhat embarrassing issue that appears to have
> been understood by a few people, including Sarita, but not by
> me, nor probably by some of the people implementing thread support.
> It significantly affects the performance of pthreads on some
> platforms, and possibly how we describe the memory model.
>
> Consider the following code, which uses a truly weird
> synchronization mechanism, in that the second thread waits
> until the lock l is LOCKED, before it proceeds, effectively
> reversing the sense of the lock.
>
> Thread 1:
>
> <Do some initialization stuff>
> pthread_mutex_lock(&l)
> <continue running, never releasing lock>
>
> Thread 2:
>
> while (pthread_mutex_trylock(&l) == 0) pthread_mutex_unlock(&l);
> <continue running, with knowledge that initialization by thread 1
> was complete.>
>
> For this to work, the pthread_mutex_lock() call must have
> RELEASE semantics, instead of its usual acquire semantics. (And
> pthread_mutex_trylock() must have acquire semantics when it FAILS.)
>
> I think a reasonable reading of the pthreads standard would
> pretty clearly conclude that this should work. (Of course
> nobody would recommend the programming style.)
>
> Our present definition of data races implies that it would work.
>
> Reasons it shouldn't work:
> 1) It seems to require lock operations to have acquire +
> release semantics, which may mean barriers before and after
> the change to the lock. I would guess many current
> implementations don't, at least on hardware on which CAS
> doesn't include a full barrier (e.g. Itanium, PowerPC,
> Alpha). This may have a significant performance impact, and
> no real benefit.
>
> 2) It prevents the compiler from merging adjacent locked regions.
>
> 3) I have no idea whether the pthread committee intended this
> to work or whether it was an accident. If anybody knows,
> please tell me.
>
> If we want to make it possible to prohibit this, I think we
> need to move to a different, more complicated, definition of
> a data race, which explicitly talks about a happens-before
> relation, as in the Java case. We probably need to do that
> anyway if we want to precisely define the different ordering
> constraints in the atomic operations library. If we think
> synchronization primitives should behave as though nothing
> were reordered around them, then we might try to preserve the
> current simpler definition for the core of the language, and
> confine the added complexity to the atomic operations library.
>
> Opinions?
>
> Hans
>