[cpp-threads] Yet another visibility question
Paul E. McKenney
paulmck at linux.vnet.ibm.com
Thu Jan 11 17:17:30 GMT 2007
On Thu, Jan 11, 2007 at 05:49:57PM +0100, Alexander Terekhov wrote:
> On 1/11/07, Peter Dimov <pdimov at mmltd.net> wrote:
> >Alexander Terekhov wrote:
> >
> >[...]
> >
> >> can be defined as: No store is visible to any other processor before
> >> the execution point of the store. Based on our discussion with Intel
> >> microarchitects we determined that all IA-32 and current generations
> >> of Itanium microprocessors support this due to identifiable and
> >> atomic global observation points for any store. This is mostly due to
> >> the shared bus and single chipset."
> >
> >It's my understanding that AMD Opterons have a separate memory controller
> >per socket; no shared bus and no single chipset (north bridge). They are
> >also NUMA since some parts of the memory are attached to this CPU and some
> >require CPU to CPU communication.
>
> Yeah, and IBM does some NUMA with Intel Xeons, IIRC. That's why I've
> been telling all along that lock_cmpxchg(&var, 42, 42) for loads is
> the way to do SC on x86/IA32. ;-)
Your advice is certainly consistent with the AMD x86-64
Architecture Programmer's Manual Volume 2 (System Programming),
24593-Rev.3.07-Sep-2002, page 195, first bullet:
	Out-of-order reads are allowed. Out-of-order reads can occur
	as a result of out-of-order execution or speculative execution.
	The processor can read memory out-of-order to allow out-of-order
	execution to proceed.
Although I suspect that you really need to do the following to get
a guaranteed ordered load of a shared variable x on x86 (though not
necessarily SC from what I understand):
	do {
		l = x;
	} while (cmpxchg(&x, l, l) != l);
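For illustration only, here is a minimal sketch of the same loop written
against a std::atomic<int>-style interface. The helper name ordered_load
is hypothetical, and the mapping of cmpxchg() onto
compare_exchange_strong() is an assumption rather than anything the
manual specifies. Note that compare_exchange_strong() returns a bool and
refreshes the expected value on failure, so the explicit reload in the
loop body is not needed:

```cpp
#include <atomic>

// Hypothetical helper (not from any library): returns a value of x that a
// compare-and-swap of x with itself has just confirmed.
inline int ordered_load(std::atomic<int>& x) {
    // Plain initial read; this is only a guess at the current value.
    int l = x.load(std::memory_order_relaxed);
    // Swap l for l. On failure, compare_exchange_strong stores the value
    // it actually found into l, so the next iteration retries with it.
    while (!x.compare_exchange_strong(l, l)) {
        // nothing to do: l already holds the freshly observed value
    }
    return l;
}
```

A reader would call ordered_load(x) wherever a plain read of x would
otherwise appear. The compare-and-swap never changes the stored value,
but it still counts as a read-modify-write, which costs more than a
plain load.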
Thanx, Paul