[cpp-threads] A question about N2153

Sat Jan 20 08:22:16 GMT 2007

> On Wed, Jan 17, 2007 at 09:00:02PM -0800, Chris Thomasson wrote:
>> ----- Original Message -----
>> From: "Chris Thomasson" <cristom at comcast.net>
>> To: "C++ threads standardisation" <cpp-threads at decadentplace.org.uk>;
>> <paulmck at linux.vnet.ibm.com>
>> Sent: Wednesday, January 17, 2007 4:09 PM
>> Subject: Re: [cpp-threads] A question about N2153

[...]
>> >>>load_depends == *#LoadDepends | #LoadLoad
>> >
>> >>Ummm...  On all CPUs other than Alpha, you don't need -any- fencing,

[...]
>> >>ordering on data dependencies.
>> >I know... Of course load_depends would be a NOP on everything except
>> >Alpha.
>> Okay... Let me just sum up how I would like the new and improved version of
>> C++, or whatever...
>> To do RCU, well, you can do the barriers like this:
>> <pseudo c++ code>

[...]
> So "n:(#StoreStore)->next = gs:(#Naked).front" is the same as
> "n->next = gs.front; smp_wmb()"?

Yes. That is:

n->next = gs.front;
membar #StoreStore;

> In the Linux kernel, we use rcu_assign_pointer(), which is a cpp macro
> defined in terms of the architecture-dependent smp_wmb().  So, if I
> understand the above code, in the Linux kernel, one would have the
> following for the last two assignments:
>
> n->next = gs.front;
> rcu_assign_pointer(gs.front, n);

Yes, well, as long as 'rcu_assign_pointer(...)' executes a #StoreStore
barrier BEFORE it touches gs.front... Does rcu_assign_pointer only execute a
#StoreStore? It does not add a #LoadStore? Well, then IMHO, perhaps you
should have two variants:

rcu_assign_pointer_mb_storestore(...)

And:

rcu_assign_pointer_mb_loadstore_storestore(...)

?
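
To make the two-variant question concrete, here is a minimal sketch using C++11 atomics, which postdate this thread. The node and stack types and both function names are hypothetical, not AppCore or Linux kernel code. C++ has no #StoreStore-only fence, so a release fence (roughly #LoadStore | #StoreStore) stands in for it, and writers are assumed to be serialised by a mutex held elsewhere.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical types; the names gs/front/next follow the pseudo-code above.
struct node {
    int value;
    std::atomic<node*> next;
    explicit node(int v) : value(v), next(nullptr) {}
};

struct stack {
    std::atomic<node*> front{nullptr};
};

// Variant 1: explicit barrier between initialising the node and publishing it.
// Assumption: callers hold the writer mutex, so front cannot change under us.
inline void push_fence_then_store(stack& gs, node* n) {
    assert(n != nullptr);
    n->next.store(gs.front.load(std::memory_order_relaxed),
                  std::memory_order_relaxed);             // n->next = gs.front
    std::atomic_thread_fence(std::memory_order_release);  // barrier BEFORE publish
    gs.front.store(n, std::memory_order_relaxed);         // gs.front = n
}

// Variant 2: the barrier is folded into the publishing store itself,
// in the spirit of wrapping the membar inside the store function.
inline void push_release_store(stack& gs, node* n) {
    assert(n != nullptr);
    n->next.store(gs.front.load(std::memory_order_relaxed),
                  std::memory_order_relaxed);
    gs.front.store(n, std::memory_order_release);
}
```

Either variant keeps the initialisation of n->next ordered before the store that makes n visible through gs.front, which is the property being asked of rcu_assign_pointer above.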

> We used to use explicit memory barriers, but found that the above was
> much easier for people to get right.

;^). Well, I have to admit that I also like to wrap up the membar in the 
actual load or store function... Well, take a look at the last 10 or so 
function names/implementations in this code:

http://appcore.home.comcast.net/appcore/src/cpu/i686/ac_i686_gcc_asm.html

x86 has coarse membar granularity, so there I can only postfix the function
names with 'fence' or 'naked'... Not too functional; however, I abstract it
into a much more granular API here, near the bottom of the following include
file for the above assembler code:

http://appcore.home.comcast.net/appcore/include/cpu/i686/ac_i686_h.html

'fence'
'acquire'
'release'
'depends'
'naked'
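
As a rough illustration of how these five suffixes might line up with C++11 memory orderings, here is a small sketch. The enum, the function name, and the mapping itself are my assumptions for illustration only; they are not taken from the AppCore headers linked above.

```cpp
#include <atomic>

// Hypothetical enum mirroring the five suffixes listed above.
enum class barrier { fence, acquire, release, depends, naked };

// Assumed mapping onto C++11 orderings (not AppCore's actual definitions):
//   fence   -> full barrier
//   acquire -> acquire ordering
//   release -> release ordering
//   depends -> data-dependency ordering
//   naked   -> no ordering at all
constexpr std::memory_order to_memory_order(barrier b) {
    return b == barrier::fence   ? std::memory_order_seq_cst
         : b == barrier::acquire ? std::memory_order_acquire
         : b == barrier::release ? std::memory_order_release
         : b == barrier::depends ? std::memory_order_consume
                                 : std::memory_order_relaxed;
}

static_assert(to_memory_order(barrier::naked) == std::memory_order_relaxed,
              "naked means no ordering constraint");
```

Note that current compilers generally treat memory_order_consume as acquire, so the 'depends' row is weaker on paper than in practice.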

So, for my AppCore API, to do an RCU reader, well, you do this:

* please note that rcu_read_lock/unlock are not needed in a user-space RCU 
implementation... Pre-emption can be addressed several ways... anyway...

void reader_thread(...) {
  node *n = ac_mb_loadptr_depends(&gs.front);
  while (n) {
    node *nx = ac_mb_loadptr_depends(&n->next);
    n->const_function(...);
    n = nx;
  }
}
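
For comparison, the same reader loop can be sketched with C++11 atomics, which postdate this thread. Everything here is hypothetical: load_depends is only a stand-in for ac_mb_loadptr_depends, and summing the values stands in for n->const_function(...). memory_order_consume is the closest standard analogue of a data-dependency load, though compilers generally implement it as acquire.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical node type for the sketch.
struct node {
    int value;
    std::atomic<node*> next;
    explicit node(int v) : value(v), next(nullptr) {}
};

// Stand-in for ac_mb_loadptr_depends: a dependency-ordered pointer load.
inline node* load_depends(const std::atomic<node*>& p) {
    return p.load(std::memory_order_consume);
}

// Reader-side traversal: no locks and no explicit barriers in the loop body.
// The next pointer is read before the node is used, as in the code above.
inline long sum_list(const std::atomic<node*>& front) {
    long total = 0;
    node* n = load_depends(front);
    while (n) {
        node* nx = load_depends(n->next);
        total += n->value;  // read-only use of the node
        n = nx;
    }
    return total;
}
```

The sketch says nothing about when a removed node may be freed; that grace-period question is a separate part of any RCU implementation.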

> But see below.
[...]

> In the Linux kernel, one would do something like the following:
[...]

> The rcu_dereference() macro is defined in terms of the
> architecture-dependent smp_read_barrier_depends() primitive.
> Again, we used to use explicit memory barriers, but found that the
> above was much easier for people to get right -- and much easier
> to build tools to check for correct usage (see Josh Triplett's
> RCU additions to Linux's "sparse" checker).
>
> So, am I advocating hiding memory barriers completely?  No way!!!

:^)

> People building things like RCU infrastructure and many other things
> need explicit memory barriers in order to get their job done.  However,
> if such people are wise, they will define a clean API that does not
> expose explicit memory barriers to their users.

That's what I did with AppCore:

http://appcore.home.comcast.net/

http://appcore.home.comcast.net/ac_src_index.html

Far from perfect, but at least it does abstract the barriers away 'fairly' 
well...

>> so, the reader-side has exactly 0 memory barriers on every current system
>> out there except the alpha.
>
> Very good!

Very good Indeed!   ;^)

>>                             Also, it's weak enough to express just a normal
>> #StoreStore inside the writer's critical section that is guarded by the stack
>> object's associated mutex... I would kind of like it if C++ would copy from
>> the SPARC model... Just my humble opinion of course...
>
> I must confess ignorance of your history, but if you like SPARC, you
> like SPARC.

Yeah. I am biased toward the SPARC... Well, its membar instruction is so
versatile you can realize highly granular memory barrier operations with
it... That's a plus in my book... Oh well...

> The Linux kernel follows DEC Alpha, but adds smp_rmb(),
> smp_read_barrier_depends(), and so on.

So, code that makes use of such primitives on Linux can be considered
fairly portable, or what? IMHO, I would fully expect the APIs in question to
be classified as so-called 'systems-level', i.e. subject to possible
modifications? I must admit that when I am on Linux, I don't make direct use
of what I consider to be system-level APIs... So, raw access to futexes,
atomic_xxx, and rcu_xxx APIs is something I avoid... Instead, I define a
target architecture, create the supporting assembly language for my AppCore
library, and make use of my own APIs for, let's say, lock-free
programming... It eases my problem with paranoia... You know: I use a
system-level API, then, oh crap, a service pack changed something... Now my
apps are rendered useless on the 'new' stuff...

:O