[cpp-threads] Re: Increment/decrement operators on atomics package

Sat Apr 28 00:05:37 BST 2007

> From:  Raul Silvera
> 
> 
> Particularly for PPC, SC only requires fences between each 
> pair of memory accesses, while RMW operations require a 
> load-reserve/store-conditional loop.
> 
My initial mental model was that the cost of RMW operations, even on
PowerPC, stems almost exclusively from the fences.  I think I was
corrected at some point, and it was pointed out that this isn't entirely
true.  But it still seems to me that load-reserve/store-conditional
should inherently be cheap for the hardware; it's basically just setting
a processor register on the load-reserve, and checking it on the
store-conditional, right?

Is my model wrong, so that the load-reserve/store-conditional pair
actually accounts for much of the cost?  If so, is there a real reason
for that, beyond the usual problem that RMW operations are infrequent in
standard benchmarks?

Presumably the fact that there's a loop involved is uninteresting, since
the loop body should generally execute more than once only under
contention, in which case the user presumably really meant the update to
be atomic.  In the uncontended case, the branch in the loop is perfectly
predictable.
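
For concreteness, here is a rough sketch of the loop shape I have in
mind, assuming an interface along the lines of std::atomic.  The name
fetch_increment is made up for illustration, and the relaxed ordering is
my assumption, chosen to keep the fences out of the picture so that only
the retry loop remains.  I would expect a weak compare-exchange to map
onto a load-reserve/store-conditional pair on PowerPC, but that is up to
the implementation.

```cpp
#include <atomic>

// Hypothetical helper (not part of any proposal): atomic increment
// written as an explicit retry loop.  Returns the value observed
// just before the increment took effect.
int fetch_increment(std::atomic<int>& counter) {
    int observed = counter.load(std::memory_order_relaxed);
    // compare_exchange_weak may fail spuriously, much as a
    // store-conditional fails when the reservation has been lost.
    // On failure it reloads the current value into `observed`.
    while (!counter.compare_exchange_weak(observed, observed + 1,
                                          std::memory_order_relaxed,
                                          std::memory_order_relaxed)) {
        // Nothing to do here; retry with the refreshed value.
    }
    return observed;
}
```

Without contention the loop body should run once, so calling this on a
counter holding 0 should return 0 and leave 1 behind.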

I'd still like to understand why the RMW is actually appreciably slower.

Hans