[cpp-threads] D2335 (sequential consistency proof) revision

Sat Aug 25 00:17:48 BST 2007

----- Original Message ----- 
From: "Boehm, Hans" <hans.boehm at hp.com>
To: "Chris Thomasson" <cristom at comcast.net>; "C++ threads standardisation" 
<cpp-threads at decadentplace.org.uk>
Cc: "Howard Hinnant" <howard.hinnant at gmail.com>
Sent: Friday, August 24, 2007 3:37 PM
Subject: RE: [cpp-threads] D2335 (sequential consistency proof) revision

> From: Chris Thomasson
>
> [...]
>
> Taking a lock does not necessarily require a #StoreLoad | #StoreStore
> barrier. There are asymmetric locking algorithms in which
> certain thread(s)
> can take the lock without using interlocked rmw or store/load
> memory barrier
> instructions.
>

> You're presumably talking about cases in which a thread knows that it
> held the lock last?

I am referring to a type of "biased" locking scheme that can designate a 
thread(s) which in turn can take locks without using interlocked rmw or 
membars. Here is a basic example:

http://blogs.sun.com/dave/resource/Asymmetric-Dekker-Synchronization.txt
(search webpage text for 'The Key Innovation'...)

http://www.trl.ibm.com/people/kawatiya/Kawachiya05phd.pdf
(chapter 6)

Foreign threads must wait until the biased thread serializes itself by 
executing something that is analogous to a release membar. The membar does 
not necessarily have to be in the locking functions themselves. You can even 
do a asymmetric read-write lock that has no interlocked rmw or membars on 
the read-lock side in the fast-path case. It kind of resembles epoch 
synchronization in which per-thread/cpu synchronizations are tracked at a 
wider scale than usual:

http://appcore.home.comcast.net/vzoom/round-2.pdf
(page 2/section 1.2/second paragraph...)

I say wider scale because most synchronization algorithms expect explicit 
serializations to occur on a per-operation basis. For instance, a reference 
counting algorithm could rely on each reference increment and decrement 
operation being guarded with the appropriate membars. Or, a locking 
algorithm might rely on each acquisition and subsequent release will be 
protected by the correct barriers... These types of strict rules can be 
loosed a great deal by tracking the membars on a wider-scale. Basically, the 
membars can be amortized. They do not necessarily _have_ to be invoked on a 
per-operation basis.

I am not sure that this type of "weird" synchronization algorithms would 
warrant a modification to the standardized mutex api... Humm. Need to think 
some more on this. Anyway, I am sorry for the very brief information, but, 
did it help clear anything up at all?

:^)