[cpp-threads] RE: Does pthread_mutex_lock() have release semantics?

Boehm, Hans hans.boehm at hp.com
Fri May 6 23:50:29 BST 2005


[Dave - if you would like to be added to the C++ threads/memory model
list, I'm sure that wouldn't be a problem, and I think it would
help us.]

> -----Original Message-----
> From: Butenhof, David (iCAP/PPU) 
> 
> Boehm, Hans wrote:
> 
> >Consider the following code, which uses a truly weird 
> synchronization 
> >mechanism, in that the second thread waits until the lock l 
> is LOCKED, 
> >before it proceeds, effectively reversing the sense of the lock.
> >
> >Thread 1:
> >
> ><Do some initialization stuff>
> >pthread_mutex_lock(&l)
> ><continue running, never releasing lock>
> >
> >Thread 2:
> >
> >while (pthread_mutex_trylock(&l) == 0) pthread_mutex_unlock(&l);
> ><continue running, with knowledge that initialization by 
> thread 1 was 
> >complete.>
> >  
> >
> Did you mean pthread_mutex_unlock()? ...
No. I really meant pthread_mutex_lock(&l), where l is a brand
new pthread_mutex_t. 
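
For concreteness, here is the construct as a complete program (a
minimal sketch; the flag variable, its value, and the printf are my
additions, not part of the original example):

    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t l = PTHREAD_MUTEX_INITIALIZER;
    static int init_data;          /* written before l is locked */

    static void *thread1(void *arg)
    {
        init_data = 42;            /* <do some initialization stuff> */
        pthread_mutex_lock(&l);    /* announce "done" by LOCKING l */
        /* <continue running, never releasing lock>; the sketch just
           returns, still holding l */
        return arg;
    }

    static void *thread2(void *arg)
    {
        /* Spin until the trylock FAILS, i.e. until thread 1 holds l,
           reversing the usual sense of the lock. */
        while (pthread_mutex_trylock(&l) == 0)
            pthread_mutex_unlock(&l);
        /* The question: may we rely on seeing init_data == 42 here? */
        printf("%d\n", init_data);
        return arg;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, thread1, NULL);
        pthread_create(&t2, NULL, thread2, NULL);
        pthread_join(t2, NULL);
        return 0;                  /* l is never unlocked or destroyed */
    }

(Compile with cc -pthread.  Purely illustrative, of course.)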

> >For this to work, the pthread_mutex_lock() call must have RELEASE 
> >semantics, instead of its usual acquire semantics.  (And
> >pthread_mutex_trylock() must have acquire semantics when
> >it FAILS.)
> >  
> >
> Also, for what it's worth, POSIX specifically denies any guarantee of 
> memory synchronization when a function FAILS.
> 
Alexander Terekhov also pointed that out.  I agree.

But I'm not convinced it's a fundamental issue.  Add an otherwise
irrelevant call that "synchronizes memory" after the trylock in
thread 2.  The code still breaks if the initialization in thread 1
can be reordered to after the pthread_mutex_lock().  Hence
pthread_mutex_lock() still needs release semantics in addition to
its usual acquire semantics.
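
Concretely, a sketch of the patched thread 2 (the dummy mutex is my
addition, chosen only because pthread_mutex_lock() and
pthread_mutex_unlock() are on the POSIX list of functions that
synchronize memory):

    static pthread_mutex_t dummy = PTHREAD_MUTEX_INITIALIZER;

    /* Thread 2, patched: */
    while (pthread_mutex_trylock(&l) == 0)
        pthread_mutex_unlock(&l);
    pthread_mutex_lock(&dummy);    /* otherwise-irrelevant calls that */
    pthread_mutex_unlock(&dummy);  /* do "synchronize memory" */
    /* <continue, relying on thread 1's initialization> */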
> 
> >I think a reasonable reading of the pthreads standard would pretty 
> >clearly conclude that this should work.  (Of course nobody would 
> >recommend the programming style.)
> >  
> >
> Aside from the ownership violation, yes, it would have been 
> legal. POSIX 
> memory synchronization is phrased very simply -- while nobody who 
> understood the issues was entirely happy with that, it was 
> sufficient, 
> and simple enough to explain to most anyone. We have a much wider 
> audience of far more sophisticated programmers now; most everyone 
> understands at least the basics of concurrency and many have a 
> reasonable grasp of the machine-level basis.
> 
> So section 4.10 of the UNIX spec says simply that "The following 
> functions synchronize memory with respect to other threads:", not "a 
> pthread_mutex_unlock in thread A makes previous writes from thread A 
> visible to a thread B subsequently completing a pthread_mutex_lock 
> operation", etc. As you infer, the standard clearly and unambiguously 
> requires acquire/release for every one of the operations on which you 
> can portably rely for memory synchronization. Unfortunate... 
> but, again, 
> simple, and sufficient. (At least for coarse grain concurrency.)
My mistaken impression was that there is no observable difference
between this and the implementation in which lock has acquire semantics
and unlock has release semantics.  I still believe that's true in the
absence of trylock.  I think the (patched version of the) above
example is a counterexample with trylock. 
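
To see why trylock makes the difference observable, compare two
spinlock sketches, written (anachronistically) in C11 atomics
notation for brevity; the names are mine:

    #include <stdatomic.h>
    #include <stdbool.h>

    typedef struct { atomic_bool held; } lock_t;

    /* Acquire-only lock: nothing stops earlier stores (thread 1's
       initialization) from being reordered past the exchange. */
    void lock_acquire_only(lock_t *m)
    {
        while (atomic_exchange_explicit(&m->held, true,
                                        memory_order_acquire))
            ;                       /* spin */
    }

    /* Acquire+release lock: earlier stores become visible to any
       thread that observes held == true, which is exactly what a
       failing trylock does. */
    void lock_acquire_release(lock_t *m)
    {
        while (atomic_exchange_explicit(&m->held, true,
                                        memory_order_acq_rel))
            ;                       /* spin */
    }

Without trylock, a blocked lock simply waits for the unlock, so the
two versions are indistinguishable; with trylock, a failing attempt
can observe held == true before the initialization has become
visible under the first version.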
> 
> Acquire and release semantics were embedded in the concurrency 
> predicates originally developed by Leslie Lamport and Garret 
> Swart for 
> an early draft. It looked far too mathematical to most of the 
> balloting 
> group; few people could wrap their heads around the notation 
> enough to 
> feel comfortable that they understood the implications... and 
> we came to 
> realize it would never pass. (A year or so ago I searched my 
> copies of 
> the early drafts, but found only a somewhat watered down version that 
> was just too hacky to post. It's possible that the original proposal 
> never even made a circulated draft; although Lamport did publish a 
> similar concurrency notation for Taos threads in one of his DEC SRC 
> papers; and someday maybe I'll find time to search for it.)
> 
> I'd like to think that the POSIX audience is now ready for "the next 
> level", but it's going to take time and patience to follow through. I 
> just haven't had either. Alexander Terekhov showed signs at 
> one point of 
> beginning a serious campaign on this issue, but I haven't 
> heard anything 
> in quite a while.
He has been actively involved here.  I'm most interested in refining
the C++ specification sufficiently that it becomes possible to 
specify a threads API (be it pthreads or some higher-level layer in
the C++ library) without the current issues about language semantics
and allowed compiler optimizations.  Others are more interested in
hopefully complementary efforts to specify atomic operations in C++,
and in specifying the higher level threads API.
> 
> >Our present definition of data races implies that it would work.
> >
> >Reasons it shouldn't work:
> >1) It seems to require lock operations to have acquire + release 
> >semantics, which may mean barriers before and after the 
> change to the 
> >lock.  I would guess many current implementations don't, at least on 
> >hardware on which CAS doesn't include a full barrier (e.g. Itanium, 
> >PowerPC, Alpha).  This may have a significant performance 
> impact, and 
> >no real benefit.
> >  
> >
> Well, the benefit, in the standard, was simplicity. And 
> implementations 
> need to weigh the benefits of improving performance for 
> applications vs 
> the risk of breaking some (at least nominally) conforming 
> applications. 
> Once upon a time I was really bothered by things like that; but I've 
> mellowed. We don't live in a perfect world, and there are far more 
> important things to get riled about. ;-)
I think both considerations are appreciated.

> ...
> >2) It prevents the compiler from merging adjacent locked regions.
> >
> >3) I have no idea whether the pthread committee intended 
> this to work 
> >or whether it was an accident.  If anybody knows, please tell me.
> >  
> >
> As I said, technically this is a rogue-o-matic. But there are less 
> radical constructs that really "oughtn't to work" in a 
> perfect acq/rel 
> world, but are required by POSIX to work. "We" (the 
> "threadies" in the 
> original working group) wanted a better world. The original CMA 
> specification from which POSIX threads were loosely adapted specified 
> acquire/release semantics in terms of both object identity and 
> operation... that is, memory visibility flowed explicitly from the 
> thread releasing mutex A to the subsequent thread acquiring 
> mutex A, and 
> nowhere else. (Even though we knew of no memory architecture 
> that could 
> support that degree of separation, it would provide the most 
> flexibility 
> for optimization while giving enough guarantee for a correct 
> and careful 
> program.)
> 
> But alas, even as prose, this was too complicated for the standards 
> environment at that time.
> 
> >If we want to make it possible to prohibit this, I think we need to 
> >move to a different, more complicated, definition of a data 
> race, which 
> >explicitly talks about a happens-before relation, as in the 
> Java case.  
> >We probably need to do that anyway if we want to precisely 
> define the 
> >different ordering constraints in the atomic operations 
> library.  If we 
> >think synchronization primitives should behave as though 
> nothing were 
> >reordered around them, then we might try to preserve the current 
> >simpler definition for the core of the language, and confine 
> the added 
> >complexity to the atomic operations library.
> >
> >Opinions?
> >  
> >
> Yes, "happens-before" type relationships are critical for a 
> well-defined 
> and useful synchronization protocol. Like I said, Alexander 
> had made a 
> start at a proposal for this. I think his initial proposal 
> had a lot of 
> problems, but the basic direction was a good start.
Interestingly, in the absence of trylock and atomic operations
I think that isn't clear.  In that restricted environment,
a refined version of the POSIX formulation is probably equivalent.
But trylock is there, and that's probably where we'll end up.
I'd still like to convince myself that the added complexity is
necessary, though.
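
As an aside on point 2 from my earlier list, the region-merging at
issue is roughly the following transformation (my rendering):

    pthread_mutex_lock(&m);        /* original: two adjacent regions */
    a = 1;
    pthread_mutex_unlock(&m);
    pthread_mutex_lock(&m);
    b = 2;
    pthread_mutex_unlock(&m);

    pthread_mutex_lock(&m);        /* merged */
    a = 1;
    b = 2;
    pthread_mutex_unlock(&m);

Under plain acquire/release semantics the merged form is generally
considered safe; but if every lock and unlock must fully "synchronize
memory", eliding the interior unlock/lock pair removes fences whose
effects another thread (e.g. one using trylock) could otherwise
observe.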

> 
> C++ can start from scratch and "get it right", but you need 
> to keep your
> eyes open to the real-world constraints on implementations and 
> applications. A C++ application hosted on a conforming POSIX 
> implementation 
> may be over-synchronized, but should work, and it would 
> provide a lot of 
> incentive to fix POSIX.
Agreed.  I would certainly like to see us avoid breaking pthreads
applications.  I'm not sure whether breaking code like the above
example is a problem, though.
> 
> Ideally, the compiler should understand the concept of shared 
> data and 
> predicates, and allow free reordering of operations on which 
> no shared 
> data depends -- and maybe even in cases where it does but no exposed 
> predicate can be violated. (Though I have doubts any modular compiler 
> system could realistically exploit that knowledge, that's no 
> reason to 
> disallow it.)
> 
That's certainly a goal.  As a practical matter there are
some common optimizations that we need to disallow, some of
which currently produce surprises.  But I think the effect is
small.
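
One example of the kind of surprise I have in mind (a standard
illustration, not taken from this thread; the names are mine):
speculative register promotion of a shared variable.  Source code
like

    int count;                     /* shared; protected by a lock the
                                      loop does not take */

    void f(int n, const int *cond)
    {
        int i;
        for (i = 0; i < n; i++)
            if (cond[i])
                count++;
    }

may legally (under single-threaded semantics) be compiled as if it
were

    void f(int n, const int *cond)
    {
        int i, tmp = count;        /* unconditional load */
        for (i = 0; i < n; i++)
            if (cond[i])
                tmp++;
        count = tmp;               /* unconditional store: introduces
                                      a write, and hence a potential
                                      race, even when no cond[i] is
                                      true */
    }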



