[cpp-threads] modes, pass 2

Doug Lea dl at cs.oswego.edu
Sun May 8 15:15:14 BST 2005


How about the following functions for atomics?
Excuse the crummy names.

   get_ordered()
   get_acquire()
   set_ordered(val)
   set_release(val)
   compare_and_set_ordered(cmp, val)
   compare_and_set_acquire(cmp, val)
   compare_and_set_release(cmp, val)
   compare_and_set_acquire_release(cmp, val)
   fallible_compare_and_set_ordered(cmp, val) // LL/SC-friendly versions
   fallible_compare_and_set_acquire(cmp, val)
   fallible_compare_and_set_release(cmp, val)
   fallible_compare_and_set_acquire_release(cmp, val)

Plus, for ints and longs:
   get_and_add_ordered(val)
   get_and_increment_ordered()
   get_and_decrement_ordered()
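
As a rough illustration of part of the list above, here is a minimal sketch layered on C++11 std::atomic, which postdates this message. The class name atomic_cell, the bool return type of the CAS functions, the helper store_max, and the mapping of each mode onto a std::memory_order are all assumptions made for the example. The "ordered" variants are left out because they have no obvious exact std::memory_order counterpart.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical wrapper: method names follow the post; the mapping onto
// std::atomic memory orders is an assumption made for illustration.
template <typename T>
class atomic_cell {
public:
    explicit atomic_cell(T initial) : value_(initial) {}

    T get_acquire() const { return value_.load(std::memory_order_acquire); }
    void set_release(T val) { value_.store(val, std::memory_order_release); }

    // Strong CAS: fails only if the current value differs from cmp.
    bool compare_and_set_acquire_release(T cmp, T val) {
        return value_.compare_exchange_strong(
            cmp, val, std::memory_order_acq_rel, std::memory_order_acquire);
    }

    // "Fallible" (LL/SC-friendly) CAS: may also fail spuriously,
    // so callers are expected to retry in a loop.
    bool fallible_compare_and_set_acquire_release(T cmp, T val) {
        return value_.compare_exchange_weak(
            cmp, val, std::memory_order_acq_rel, std::memory_order_acquire);
    }

private:
    std::atomic<T> value_;
};

// Hypothetical helper showing the usual retry loop around a fallible CAS:
// raise the cell to at least `candidate` and return the resulting maximum.
inline int store_max(atomic_cell<int>& cell, int candidate) {
    for (;;) {
        int cur = cell.get_acquire();
        if (cur >= candidate) return cur;
        if (cell.fallible_compare_and_set_acquire_release(cur, candidate))
            return candidate;
        // Spurious failure or a concurrent update: re-read and try again.
    }
}
```

The fallible form only makes sense inside such a loop, which is why it can map directly onto an LL/SC pair without an inner retry.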

The "ordered" suffix needs a better name, but the intent
is that it is the lightest barrier (often no-op) that
prevents re-orderings of successive loads (for get_ordered)
or stores (for set_ordered) or operations on the same
variable (for CAS_ordered).

Using "ordered" avoids needing to spec out "raw" memory ops on atomics
without forcing enough overhead to matter. Often no overhead
at all, because the platform doesn't need barriers or common cases are
easily optimized by compilers. But here and there, people looking for an
extra couple of percent performance on some platforms might want to
write a few lines of assembler. (This is not too different from,
and should be even rarer than, people writing a few lines of
assembler to do fast/odd things with x86 floating point that can't be
expressed in C/C++ proper.)

For get_and_X, it seems simplest to only support "ordered".
This is normally what you want for ref-counting etc.
(On most machines you will get barriers anyway though.)
If you need other modes, just use CAS.
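
To make the "just use CAS" suggestion concrete, here is a minimal sketch of a get_and_add in a different mode built from a CAS retry loop. It assumes C++11 std::atomic as a stand-in for the proposed interface; the function name get_and_add_acquire is hypothetical and does not appear in the list above.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: fetch-and-add with acquire semantics, built from a
// weak (fallible) CAS loop rather than a dedicated get_and_add mode.
inline long get_and_add_acquire(std::atomic<long>& v, long delta) {
    long cur = v.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak writes the freshly observed value
    // into cur, so the next iteration retries with up-to-date input.
    while (!v.compare_exchange_weak(cur, cur + delta,
                                    std::memory_order_acquire,
                                    std::memory_order_relaxed)) {
    }
    return cur;  // value held immediately before the successful update
}
```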

All the permutations of CAS cry out for an msync::-style solution, which
otherwise seems like a bad idea since most modes won't make sense for
loads and stores.
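
One way to read "msync::-style" is a single CAS function that takes the mode as an argument instead of encoding it in the name. The sketch below is only a guess at that shape, again using C++11 std::atomic. The namespace msync_sketch, the tag types, and the free function compare_and_set are all hypothetical names introduced for illustration.

```cpp
#include <atomic>
#include <cassert>

namespace msync_sketch {
// Hypothetical mode tags; one empty type per CAS mode.
struct acquire {};
struct release {};
struct acquire_release {};

// Memory order applied when the CAS succeeds.
constexpr std::memory_order success_order(acquire) { return std::memory_order_acquire; }
constexpr std::memory_order success_order(release) { return std::memory_order_release; }
constexpr std::memory_order success_order(acquire_release) { return std::memory_order_acq_rel; }

// Memory order applied when the CAS fails. A failed CAS is only a load,
// so the release component is dropped.
constexpr std::memory_order failure_order(acquire) { return std::memory_order_acquire; }
constexpr std::memory_order failure_order(release) { return std::memory_order_relaxed; }
constexpr std::memory_order failure_order(acquire_release) { return std::memory_order_acquire; }
}  // namespace msync_sketch

// One CAS entry point; the mode is chosen by the tag argument.
template <typename T, typename Mode>
bool compare_and_set(std::atomic<T>& a, T cmp, T val, Mode mode) {
    return a.compare_exchange_strong(cmp, val,
                                     msync_sketch::success_order(mode),
                                     msync_sketch::failure_order(mode));
}
```

Because only CAS takes a tag here, loads and stores would keep their small fixed set of named functions, so the modes that make no sense for them never become expressible.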

Alexander and Peter, especially: can you live with something like this?

Mappings to common processors look pretty reasonable.
For example, on x86 and SPARC, only store_release needs
an explicit barrier (or locked xchg). All others map
to the obviously corresponding instructions. I think IA64 is
also straightforward. As always, I won't even guess about PPC,
but would appreciate it if someone sanity-checked those.

-Doug





