[cpp-threads] Yet another visibility question

Thu Jan 11 17:17:30 GMT 2007

On Thu, Jan 11, 2007 at 05:49:57PM +0100, Alexander Terekhov wrote:
> On 1/11/07, Peter Dimov <pdimov at mmltd.net> wrote:
> >Alexander Terekhov wrote:
> >
> >[...]
> >
> >> can be defined as: No store is visible to any other processor before
> >> the execution point of the store. Based on our discussion with Intel
> >> microarchitects we determined that all IA-32 and current generations
> >> of Itanium microprocessors support this due to identifiable and
> >> atomic global observation points for any store. This is mostly due to
> >> the shared bus and single chipset."
> >
> >It's my understanding that AMD Opterons have a separate memory controller
> >per socket; no shared bus and no single chipset (north bridge). They are
> >also NUMA since some parts of the memory are attached to this CPU and some
> >require CPU to CPU communication.
> 
> Yeah, and IBM does some NUMA with Intel Xeons, IIRC. That's why I've
> been telling all along that lock_cmpxchg(&var, 42, 42) for loads is
> the way to do SC on x86/IA32. ;-)

Your advice is certainly consistent with the AMD x86-64
Architecture Programmer's Manual Volume 2 (System Programming),
24593-Rev.3.07-Sep-2002, page 195, first bullet:

	Out-of-order reads are allowed.  Out-of-order reads can occur
	as a result of out-of-order execution or speculative execution.
	The processor can read memory out-of-order to allow out-of-order
	execution to proceed.

Although I suspect that you really need to do the following to get
a guaranteed ordered load of a shared variable x on x86 (though not
necessarily SC from what I understand):

	do {
		l = x;
	} while (cmpxchg(&x, l, l) != l);
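
For comparison, here is a rough sketch of the same loop written against
the std::atomic interface from C++11, which postdates this message. The
function name ordered_load and the use of int are assumptions made for
illustration only. On x86, compilers typically lower
compare_exchange_strong to a lock cmpxchg, but that is not guaranteed
by the language, and as noted above this is not necessarily enough for SC:

```cpp
#include <atomic>

// Hypothetical helper (name and signature are illustrative, not from
// the original message): read x, then confirm the value with a
// compare-and-swap that writes the same value back.
int ordered_load(std::atomic<int>& x)
{
    int l;
    int expected;
    do {
        // Plain read of the shared variable.
        l = x.load(std::memory_order_relaxed);
        // compare_exchange_strong overwrites 'expected' on failure,
        // so keep the candidate value in a separate variable.
        expected = l;
        // Succeeds only if x still holds l; stores l back unchanged.
    } while (!x.compare_exchange_strong(expected, l,
                                        std::memory_order_seq_cst));
    return l;
}
```

The loop retries only when another thread changed x between the read
and the compare-and-swap, so the value returned is one that x held at
the moment of the successful locked operation.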

							Thanx, Paul