[cpp-threads] Yet another visibility question

Paul E. McKenney paulmck at linux.vnet.ibm.com
Sat Jan 13 00:30:30 GMT 2007


On Fri, Jan 12, 2007 at 11:26:51PM +0100, Alexander Terekhov wrote:
> On 1/12/07, Alexander Terekhov <alexander.terekhov at gmail.com> wrote:
> >On 1/12/07, Paul E. McKenney <paulmck at linux.vnet.ibm.com> wrote:
> >> On Fri, Jan 12, 2007 at 09:03:50PM +0100, Alexander Terekhov wrote:
> >> > On 1/12/07, Paul E. McKenney <paulmck at linux.vnet.ibm.com> wrote:
> >> > [...]
> >> > >> Think of implementing SC with per-variable locks on a large x86/IA32
> 
> Think of implementing SC for atomic<> (atomic_blah_blah in C speak)
> with per-variable locks on a large x86/IA32

OK...  So you are leveraging the cache-coherence properties of writes to
single variables (in this case, each per-variable lock) to force global
ordering -- in particular, the fact that you acquire the per-variable
lock around each load from the protected variable as well as around each
store to that variable.  I am still not fully convinced that this works in
all cases on all architectures, but I -do- know that this approach will
incur a performance penalty of multiple orders of magnitude in situations
where the variables are loaded from much more frequently than they are
stored to.  The reason is that you have essentially disabled the
read-sharing capability of the caches.  Acquiring the lock is a write, so
every load must pull the lock's cache line over in exclusive mode,
invalidating the copies held by other CPUs.  In a read-intensive workload,
where multiple CPUs are reading a given variable, each CPU will therefore
take a communication cache miss on each and every load from that variable,
even if it has not been updated since the last time it was loaded.
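For concreteness, here is a minimal sketch of the scheme under discussion, assuming C11 <stdatomic.h> and a simple test-and-set spinlock per variable.  The names locked_int, locked_int_load() and locked_int_store() are hypothetical and not taken from any proposal.  Note that locked_int_load() must write the lock word before it can read the value, which is exactly what keeps the cache line from staying read-shared among CPUs.

```c
#include <stdatomic.h>

/* Hypothetical "atomic" int: every access, load or store, is
 * serialized by this variable's own spinlock. */
typedef struct {
    atomic_flag lock;  /* per-variable lock */
    int value;         /* value protected by the lock */
} locked_int;

#define LOCKED_INT_INIT(v) { ATOMIC_FLAG_INIT, (v) }

static void locked_int_acquire(locked_int *p)
{
    /* Test-and-set writes the lock word, so even a reader must gain
     * exclusive ownership of the lock's cache line. */
    while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
        ;  /* spin until the lock is free */
}

static void locked_int_release(locked_int *p)
{
    atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

/* Load: lock, read, unlock (a write to the lock word on every load). */
int locked_int_load(locked_int *p)
{
    locked_int_acquire(p);
    int v = p->value;
    locked_int_release(p);
    return v;
}

/* Store: lock, write, unlock. */
void locked_int_store(locked_int *p, int v)
{
    locked_int_acquire(p);
    p->value = v;
    locked_int_release(p);
}
```

Whether this yields SC across all such variables is the question being debated here; the sketch only shows why every load costs a write.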

So, even if this does work, it does not seem particularly practical, as
the performance hit would force programmers to do things the bad old way.

Or am I still missing something here?

							Thanx, Paul

> >> > >> NUMA configuration with multiple FSBs or whatever. It won't be any
> >> > >> better than cmpxchg for loads and xchg for stores (both locked).
> >> > >
> >> > >My understanding is that locking is only guaranteed to force SC execution
> >> > >of critical sections when all of the critical sections of interest
> >> > >are guarded by the same lock.  Therefore, per-variable locks will not
> >> > >necessarily force SC execution of all accesses.
> >> >
> >> > Give an example (for x86/IA32).
> >>
> >> I will start with a blatant example to make sure that we are really
> >> talking about the same thing.  So, for a starting point using Linux-kernel
> >> notation:
> >>
> >> o       Shared data:
> >>
> >>        DEFINE_MUTEX(mutex1);
> >>        DEFINE_MUTEX(mutex2);
> >>        int a = 0, b = 0, c = 0;
> >>
> >> o       Thread 1:
> >>
> >>        mutex_lock(&mutex1);
> >>        a = 1;
> >>        b = 1;
> >>        mutex_unlock(&mutex1);
> >>
> >> o       Thread 2:
> >>
> >>        mutex_lock(&mutex2);
> >>        c = 1;
> >>        mutex_unlock(&mutex2);
> >>
> >> The two critical sections do not exclude each other, so could overlap
> >> arbitrarily.  In this example, you would actually be worse off with
> >> locks than with per-variable cmpxchg.
> >>
> >> Or am I missing your point?
> >
> >Give an example with assert().
> >
> 
> regards.
> alexander.
> 
> --
> cpp-threads mailing list
> cpp-threads at decadentplace.org.uk
> http://www.decadentplace.org.uk/cgi-bin/mailman/listinfo/cpp-threads
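A userspace analogue of the quoted Linux-kernel example, assuming POSIX threads in place of the kernel's DEFINE_MUTEX(), mutex_lock() and mutex_unlock(); the driver function run_example() is hypothetical.  Because thread1 and thread2 take different mutexes, nothing orders their critical sections relative to each other.

```c
#include <pthread.h>
#include <stddef.h>

/* Shared data, mirroring the quoted example */
static pthread_mutex_t mutex1 = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t mutex2 = PTHREAD_MUTEX_INITIALIZER;
static int a = 0, b = 0, c = 0;

static void *thread1(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&mutex1);
    a = 1;
    b = 1;
    pthread_mutex_unlock(&mutex1);
    return NULL;
}

static void *thread2(void *arg)
{
    (void)arg;
    /* Different mutex: this critical section does not exclude
     * thread1's, so the two may overlap arbitrarily. */
    pthread_mutex_lock(&mutex2);
    c = 1;
    pthread_mutex_unlock(&mutex2);
    return NULL;
}

/* Hypothetical driver: start both threads and wait for them. */
void run_example(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, thread1, NULL);
    pthread_create(&t2, NULL, thread2, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
}
```

Only after both pthread_join() calls return are all three stores guaranteed visible to the caller; during the run, the two critical sections are unordered with respect to each other.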


