[cpp-threads] Failed compare-and-swap
Paul E. McKenney
paulmck at linux.vnet.ibm.com
Tue Jul 31 20:07:38 BST 2007
On Tue, Jul 31, 2007 at 06:31:20PM -0000, Boehm, Hans wrote:
> [Catching up with this thread ...]
>
> > -----Original Message-----
> > From: cpp-threads-bounces at decadentplace.org.uk
> > [mailto:cpp-threads-bounces at decadentplace.org.uk] On Behalf
> > Of Herb Sutter
> > Sent: Tuesday, July 31, 2007 10:45 AM
> > To: Chris Thomasson; C++ threads standardisation
> > Subject: Re: [cpp-threads] Failed compare-and-swap
> >
> > Summarizing:
> >
> > The point well illustrated by these examples is that if CAS
> > returns the old value (rather than a bool) it can be used to
> > perform an atomic load. If we allow that, then it should have
> > the same default acquire semantics as a load (unless
> > overridden in the weak/relaxed atomics of course).
> >
> > Lawrence and everyone, what do you think? Now that I see
> > this, it seems obvious in retrospect, and this question must
> > have come up before with CAS semantics. What do other systems do?
>
> I completely agree with this. If a CAS can be used to read the old
> value (which it can, arguably if it only has a boolean result), and it
> has at least acquire semantics, then it should still have acquire
> semantics even if it fails and the write doesn't take place.
> The read still succeeds.
As long as the specified memory order includes acquire semantics.
Taking them in turn for the CAS failure case:
o seq_cst (or "sequential", as suggested by Core): clearly
needs to be ordered.
o acquire: needs acquire semantics.
o acq_rel: needs acquire semantics.
o relaxed: relaxed semantics, no memory fences needed.
o release: seems like no memory fences needed in failure case, at
least at first glance. If so, this might be what people who
want cheap CAS failure should use.
> Whether or not it has release semantics is not observable by our
> definitions, since there is no write for another thread to see. Thus I
> don't think there is any reason to say anything special about that
> either.
>
> We clearly need the acquire property for at least seq_cst, since we
> don't otherwise get sequential consistency for race-free programs. We
> could conceivably specify something else for acquire and acq_rel. But
> that would clearly complicate things, and I tend to agree with Paul that
> it's probably not necessary. If this seriously affects the performance
> of your code, you have bigger fish to fry.
I believe your position is consistent with my list above. However, given
the fact that trylock acts differently, it might be worth a sentence
that explicitly states that the relevant memory-ordering properties
apply even in the failure case.
> Trylock is a different situation, since a failed trylock by definition
> reads a value written by an operation (lock acquisition)without release
> semantics. Thus it currently never establishes a synchronizes with
> relationship, and hence has no impact on visibility. To meaningfully
> change that, lock acquisition would need release semantics, as it
> technically does in Posix. (As Alexander points out, a simple trylock
> can't tell the difference in Posix. On the other hand, a trylock
> followed by an unrelated successful synchronization operation can.)
> We've brought this up repeatedly, and I believe there is little support
> for release semantics for lock acquisition, not even from the Posix
> side. (I believe a lot of Posix implementations don't follow this
> interpretation of the rules as it stands.)
Yep, changing the memory semantics of trylock failure would not be
a short-term project...
Thanx, Paul
> Hans
>
> >
> >
> > Details follow:
> >
> >
> > Chris wrote:
> > > From: "Herb Sutter" <hsutter at microsoft.com>
> > > > Besides using Interlocked* as a way to spell a full barrier, what
> > > > cases do you know of where people look at the return value? I
> > > > haven't looked at this issue yet, and I would be interested in
> > > > understand such cases better.
> > >
> > > Here are two examples:
> > >
> > >
> > http://groups.google.com/group/comp.programming.threads/msg/b6a4eec6cb
> > > b
> > > a625b (contrived example)
> >
> > For convenience, here's the example (let's call it Example
> > 1), which basically uses a CAS to perform an atomic read:
> >
> > // Example 1
> >
> > static CMyData my_data;
> > static volatile LONG flag = 0;
> >
> > Thread 1:
> > my_data.setup();
> > InterlockedCompareExchange( &flag, 1, 0 ); // 0 -> 1
> >
> > Thread 2:
> > while ( ! InterlockedCompareExchange( &flag, 0, 0 ) ) {
> > Sleep( 0 ); _asm pause;
> > }
> > // flag is set
> > my_data.use();
> >
> > Yes, this relies on a failed CAS being a barrier, though this
> > particular example needs it only to be an acquire barrier.
> > The general technique here is to use a CAS purely as an atomic read.
> >
> > So the question here is whether we want to support the
> > technique of using a CAS to perform an atomic read. Do we? We
> > can say yes by returning the old value, or no by returning a
> > succeeded/failed bool.
> >
> >
> > >
> > http://groups.google.com/group/comp.programming.threads/msg/57a691b215
> > > 9
> > > fa698
> >
> > OK, this is a followup where Joe Seigh points out what I said
> > above that you could just use a plain atomic read:
> >
> > // Example 2
> >
> > ... as in Example 1, except:
> >
> > Thread 2:
> > while (flag != 1)
> > sleep(0);
> > InterlockedCompareExchange(&flag, 1, 1);
> > // flag is set
> > my_data.use();
> >
> > I think Joe adds the ICE call only because it seems this
> > thread predates VS 2005 where we added more ordering
> > semantics to volatile. In VS 2005, I you should be able to just do:
> >
> > // Example 3
> >
> > ... as in Example 1, except:
> >
> > Thread 2:
> > while (flag != 1)
> > sleep(0);
> > // flag is set
> > my_data.use();
> >
> > and have that work reliably.
> >
> >
> > >
> > http://groups.google.com/group/comp.programming.threads/msg/31803c3398
> > > 6
> > > 58e06
> > > (less contrived example; examine the 'initFreeQueue' function)
> > >
> > > I think the reasoning is that the InterlockedCompareExchangePointer
> > > does return real data.
> >
> > Thanks, I didn't have time to read it carefully. So it seems
> > to me that the point is that if you can use CAS to do a read,
> > it should be ordered like a read (by default).
> >
> > Herb
> >
> >
> >
> > --
> > cpp-threads mailing list
> > cpp-threads at decadentplace.org.uk
> > http://www.decadentplace.org.uk/cgi-bin/mailman/listinfo/cpp-threads
> >
>
> --
> cpp-threads mailing list
> cpp-threads at decadentplace.org.uk
> http://www.decadentplace.org.uk/cgi-bin/mailman/listinfo/cpp-threads
More information about the cpp-threads
mailing list