[cpp-threads] A question about N2153

Thu Jan 18 15:46:36 GMT 2007

On Wed, Jan 17, 2007 at 09:00:02PM -0800, Chris Thomasson wrote:
> 
> ----- Original Message -----
> From: "Chris Thomasson" <cristom at comcast.net>
> To: "C++ threads standardisation" <cpp-threads at decadentplace.org.uk>;
> <paulmck at linux.vnet.ibm.com>
> Sent: Wednesday, January 17, 2007 4:09 PM
> Subject: Re: [cpp-threads] A question about N2153
> 
> 
> >[...]
> >
> >>>load_depends == *#LoadDepends | #LoadLoad
> >
> >>Ummm...  On all CPUs other than Alpha, you don't need -any- fencing,
> >>barriers, or special instructions to cause the CPU to respect
> >>ordering on data dependencies.
> >
> >I know... Of course load_depends would be a NOP on everything except
> >Alpha.
> 
> Okay... Let me just sum up how I would like the new and improved version of
> C++, or whatever...
> 
> To do RCU, well, you can do the barriers like this:
> 
> <pseudo c++ code>
> 
> // shared stack
> class node;
> static mystack_t gs;
> 
> // writer side - mutex-based
> void writer_thread(...) {
>  node *n = node_cache::pop(...);
>  lock_guard lock(gs);
>    n:(#StoreStore)->next = gs:(#Naked).front;
>    gs:(#Naked).front = n;
> }

So "n:(#StoreStore)->next = gs:(#Naked).front" is the same as
"n->next = gs.front; smp_wmb()"?

In the Linux kernel, we use rcu_assign_pointer(), which is a C-preprocessor
macro defined in terms of the architecture-dependent smp_wmb().  So, if I
understand the above code, in the Linux kernel, one would have the
following for the last two assignments:

	n->next = gs.front;
	rcu_assign_pointer(gs.front, n);

We used to use explicit memory barriers, but found that the above was
much easier for people to get right.  But see below.
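
For readers without a kernel tree at hand, the publish step can be
approximated in user-space C++.  This is only a sketch under stated
assumptions: it assumes a C++11-or-later compiler with std::atomic, it uses
a release store where the kernel macro uses smp_wmb(), and the names Node,
Stack, publish_pointer(), and push() are hypothetical, not kernel or N2153
definitions.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical types for illustration; not kernel or N2153 definitions.
struct Node {
    int value;
    std::atomic<Node*> next;
    explicit Node(int v) : value(v), next(nullptr) {}
};

struct Stack {
    std::mutex lock;                    // serializes writers
    std::atomic<Node*> front{nullptr};  // read by lock-free readers
};

// Rough analogue of rcu_assign_pointer(): the release store orders every
// earlier write (including the store to n->next) before the new pointer
// value can be observed.
template <typename T>
void publish_pointer(std::atomic<T*>& p, T* v) {
    p.store(v, std::memory_order_release);
}

void push(Stack& s, Node* n) {
    assert(n != nullptr);
    std::lock_guard<std::mutex> guard(s.lock);
    // Relaxed is enough here: the mutex orders writers against each other.
    n->next.store(s.front.load(std::memory_order_relaxed),
                  std::memory_order_relaxed);
    publish_pointer(s.front, n);
}
```

As with the kernel macro, the caller of push() never writes a barrier by
hand; the ordering requirement lives in one small helper.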

> // reader side
> void reader_thread(...) {
>  // nop on most systems
>  node *n:(#LoadDepends | #LoadLoad) = gs:(#Naked).front;
>  while(n) {
>    // again, it's a nop
>    node *nx:(#LoadDepends | #LoadLoad) = n:(#Naked)->next;
>    n->const_function(...);
>    n = nx;
>  }
> }

In the Linux kernel, one would do something like the following:

	void reader_thread(...)
	{
		node *n;

		rcu_read_lock();
		n = rcu_dereference(gs.front);
		while (n) {
			node *nx = rcu_dereference(n->next);
			n->const_function(...);
			n = nx;
		}
		rcu_read_unlock();
	}

The rcu_dereference() macro is defined in terms of the
architecture-dependent smp_read_barrier_depends() primitive.
Again, we used to use explicit memory barriers, but found that the
above was much easier for people to get right -- and much easier
to build tools to check for correct usage (see Josh Triplett's
RCU additions to Linux's "sparse" checker).
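
A user-space C++ approximation of the reader-side traversal might look like
the sketch below.  The assumptions are significant: it assumes C++11
std::atomic, it uses an acquire load as a conservative stand-in (stronger
than the dependency ordering that smp_read_barrier_depends() provides, so it
may cost a real barrier on some weakly ordered CPUs), and it does not model
rcu_read_lock()/rcu_read_unlock() or any reclamation of nodes.  The names
Node, dereference_pointer(), and sum_list() are hypothetical.

```cpp
#include <atomic>

// Hypothetical node type for illustration only.
struct Node {
    int value;
    std::atomic<Node*> next;
    explicit Node(int v) : value(v), next(nullptr) {}
};

// Rough analogue of rcu_dereference(). The acquire load guarantees that the
// fields of the node are seen as initialized once the pointer is seen.
template <typename T>
T* dereference_pointer(const std::atomic<T*>& p) {
    return p.load(std::memory_order_acquire);
}

// Walks the list without taking any lock and sums the node values.
// Nodes must not be freed while a traversal may be in progress; that part
// of RCU (grace periods) is deliberately left out of this sketch.
int sum_list(const std::atomic<Node*>& front) {
    int total = 0;
    Node* n = dereference_pointer(front);
    while (n != nullptr) {
        Node* nx = dereference_pointer(n->next);
        total += n->value;
        n = nx;
    }
    return total;
}
```

The traversal loop itself contains no explicit barrier; all ordering is
confined to dereference_pointer(), which is also the kind of single call
site a checking tool can look for.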

So, am I advocating hiding memory barriers completely?  No way!!!

People building things like RCU infrastructure and many other things
need explicit memory barriers in order to get their job done.  However,
if such people are wise, they will define a clean API that does not
expose explicit memory barriers to their users.

> so, the reader-side has exactly 0 memory barriers on every current system
> out there except the alpha.

Very good!

>                             Also, it's weak enough to express just a normal
> #StoreStore inside the writer's critical section that is guarded by the stack
> object's associated mutex... I would kind of like it if C++ would copy from
> the SPARC model... Just my humble opinion of course...

I must confess ignorance of your history, but if you like SPARC, you
like SPARC.  The Linux kernel follows DEC Alpha, but adds smp_rmb(),
smp_read_barrier_depends(), and so on.

> ;^)

							Thanx, Paul