[cpp-threads] Failed compare-and-swap

Tue Jul 31 20:07:38 BST 2007

On Tue, Jul 31, 2007 at 06:31:20PM -0000, Boehm, Hans wrote:
> [Catching up with this thread ...] 
> 
> > -----Original Message-----
> > From: cpp-threads-bounces at decadentplace.org.uk 
> > [mailto:cpp-threads-bounces at decadentplace.org.uk] On Behalf 
> > Of Herb Sutter
> > Sent: Tuesday, July 31, 2007 10:45 AM
> > To: Chris Thomasson; C++ threads standardisation
> > Subject: Re: [cpp-threads] Failed compare-and-swap
> > 
> > Summarizing:
> > 
> > The point well illustrated by these examples is that if CAS 
> > returns the old value (rather than a bool) it can be used to 
> > perform an atomic load. If we allow that, then it should have 
> > the same default acquire semantics as a load (unless 
> > overridden in the weak/relaxed atomics of course).
> > 
> > Lawrence and everyone, what do you think? Now that I see 
> > this, it seems obvious in retrospect, and this question must 
> > have come up before with CAS semantics. What do other systems do?
>
> I completely agree with this.  If a CAS can be used to read the old
> value (which it can, arguably if it only has a boolean result), and it
> has at least acquire semantics, then it should still have acquire
> semantics even if it fails and the write doesn't take place. 
> The read still succeeds.

As long as the specified memory order includes acquire semantics.

Taking them in turn for the CAS failure case:

o	seq_cst (or "sequential", as suggested by Core): clearly
	needs to be ordered.

o	acquire: needs acquire semantics.

o	acq_rel: needs acquire semantics.

o	relaxed: relaxed semantics, no memory fences needed.

o	release: seems like no memory fences needed in failure case, at
	least at first glance.	If so, this might be what people who
	want cheap CAS failure should use.

> Whether or not it has release semantics is not observable by our
> definitions, since there is no write for another thread to see.  Thus I
> don't think there is any reason to say anything special about that
> either.
> 
> We clearly need the acquire property for at least seq_cst, since we
> don't otherwise get sequential consistency for race-free programs.  We
> could conceivably specify something else for acquire and acq_rel.  But
> that would clearly complicate things, and I tend to agree with Paul that
> it's probably not necessary.  If this seriously affects the performance
> of your code, you have bigger fish to fry.

I believe your position is consistent with my list above.  However, given
the fact that trylock acts differently, it might be worth a sentence
that explicitly states that the relevant memory-ordering properties
apply even in the failure case.

> Trylock is a different situation, since a failed trylock by definition
> reads a value written by an operation (lock acquisition)without release
> semantics.  Thus it currently never establishes a synchronizes with
> relationship, and hence has no impact on visibility.  To meaningfully
> change that, lock acquisition would need release semantics, as it
> technically does in Posix.  (As Alexander points out, a simple trylock
> can't tell the difference in Posix.  On the other hand, a trylock
> followed by an unrelated successful synchronization operation can.)
> We've brought this up repeatedly, and I believe there is little support
> for release semantics for lock acquisition, not even from the Posix
> side.  (I believe a lot of Posix implementations don't follow this
> interpretation of the rules as it stands.)

Yep, changing the memory semantics of trylock failure would not be
a short-term project...

						Thanx, Paul

> Hans
> 
> > 
> > 
> > Details follow:
> > 
> > 
> > Chris wrote:
> > > From: "Herb Sutter" <hsutter at microsoft.com>
> > > > Besides using Interlocked* as a way to spell a full barrier, what 
> > > > cases do you know of where people look at the return value? I 
> > > > haven't looked at this issue yet, and I would be interested in 
> > > > understand such cases better.
> > >
> > > Here are two examples:
> > >
> > > 
> > http://groups.google.com/group/comp.programming.threads/msg/b6a4eec6cb
> > > b
> > > a625b (contrived example)
> > 
> > For convenience, here's the example (let's call it Example 
> > 1), which basically uses a CAS to perform an atomic read:
> > 
> >   // Example 1
> > 
> >   static CMyData my_data;
> >   static volatile LONG flag = 0;
> > 
> >   Thread 1:
> >     my_data.setup();
> >     InterlockedCompareExchange( &flag, 1, 0 ); // 0 -> 1
> > 
> >   Thread 2:
> >     while ( ! InterlockedCompareExchange( &flag, 0, 0 ) ) {
> >       Sleep( 0 ); _asm pause;
> >     }
> >     // flag is set
> >     my_data.use();
> > 
> > Yes, this relies on a failed CAS being a barrier, though this 
> > particular example needs it only to be an acquire barrier. 
> > The general technique here is to use a CAS purely as an atomic read.
> > 
> > So the question here is whether we want to support the 
> > technique of using a CAS to perform an atomic read. Do we? We 
> > can say yes by returning the old value, or no by returning a 
> > succeeded/failed bool.
> > 
> > 
> > > 
> > http://groups.google.com/group/comp.programming.threads/msg/57a691b215
> > > 9
> > > fa698
> > 
> > OK, this is a followup where Joe Seigh points out what I said 
> > above that you could just use a plain atomic read:
> > 
> >   // Example 2
> > 
> >   ... as in Example 1, except:
> > 
> >   Thread 2:
> >     while (flag != 1)
> >       sleep(0);
> >     InterlockedCompareExchange(&flag, 1, 1);
> >     // flag is set
> >     my_data.use();
> > 
> > I think Joe adds the ICE call only because it seems this 
> > thread predates VS 2005 where we added more ordering 
> > semantics to volatile. In VS 2005, I you should be able to just do:
> > 
> >   // Example 3
> > 
> >   ... as in Example 1, except:
> > 
> >   Thread 2:
> >     while (flag != 1)
> >       sleep(0);
> >     // flag is set
> >     my_data.use();
> > 
> > and have that work reliably.
> > 
> > 
> > > 
> > http://groups.google.com/group/comp.programming.threads/msg/31803c3398
> > > 6
> > > 58e06
> > > (less contrived example; examine the 'initFreeQueue' function)
> > >
> > > I think the reasoning is that the InterlockedCompareExchangePointer 
> > > does return real data.
> > 
> > Thanks, I didn't have time to read it carefully. So it seems 
> > to me that the point is that if you can use CAS to do a read, 
> > it should be ordered like a read (by default).
> > 
> > Herb
> > 
> > 
> > 
> > --
> > cpp-threads mailing list
> > cpp-threads at decadentplace.org.uk
> > http://www.decadentplace.org.uk/cgi-bin/mailman/listinfo/cpp-threads
> > 
> 
> -- 
> cpp-threads mailing list
> cpp-threads at decadentplace.org.uk
> http://www.decadentplace.org.uk/cgi-bin/mailman/listinfo/cpp-threads