[cpp-threads] Web site updated
Paul E. McKenney
paulmck at linux.vnet.ibm.com
Tue Feb 13 22:17:38 GMT 2007
On Tue, Feb 13, 2007 at 09:15:08PM -0000, Boehm, Hans wrote:
> I guess such optimizations are potentially important for reference
> counts.
>
> My model so far has also been that with default compiler options,
> compilers will generate code for atomics (and possibly fences) that
> enforces the indicated ordering constraints if both atomics and ordinary
> variables live in whatever memory type (usually something like writeback
> cacheable) is seen by standard-conforming user-level code. My limited
> understanding is that this is not always sufficient for kernel code.
> (E.g. Itanium provides a stronger mf.a fence instruction that would
> never be generated in this model. I believe it is also not generated by
> any of the current __sync gcc primitives. My recollection of the
> PowerPC rules is that they also differ by memory types,
> though I don't remember the details.)
>
> As a result of this, I'm not sure to what extent the default compilation
> model will be directly useful for kernel/device driver code. And that
> probably shouldn't be a major consideration. I suspect that compilers
> will end up providing flags to generate more kernel-friendly code if
> necessary, and we certainly don't want to discourage that. But we can't
> require it.
By "device driver code", are you including the increasingly common case
of device drivers that run within user applications?
> Atomics were not designed as a way to fix volatiles for device register
> access and the like. So far, that problem isn't really being addressed.
> We have really only been talking about variables used for inter-thread
> communication.
I agree that kernel-code examples that have counterparts at user level are
the most important, and that examples that don't involve access to special
hardware registers are more important than those that do.
> But I think that combining accesses can also be a problem at user level,
> for other reasons. If I write
>
> Thread1:
>   for(;;) {
>     a.store_relaxed(0); // or release
>     a.store_relaxed(1);
>   }
>
> Thread2:
>   while (a.load_relaxed());
>   <do something useful>
Excellent example!!!
> Should I expect thread 2 to eventually make progress? Clearly if the
> first a.store_relaxed() is elided, the answer is no. But an answer of
> "yes" is probably more desirable.
I would certainly prefer "yes" in this case. ;-)
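A minimal sketch of the example above, assuming the std::atomic interface
that C++11 later standardized rather than the store_relaxed() and
load_relaxed() spellings used in this thread. The names SpinResult,
run_store_pair_demo, max_spins, and the stop flag are hypothetical additions
so that the program terminates. The original loops are unbounded.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical result type: what the reader observed, and the value of
// `a` after the writer has been joined.
struct SpinResult {
    bool saw_zero;
    int final_value;
};

// Thread 1 alternates relaxed stores of 0 and 1. Thread 2 (the caller)
// spins on a relaxed load until it sees 0 or gives up after max_spins.
SpinResult run_store_pair_demo(long max_spins) {
    assert(max_spins >= 0);
    std::atomic<int> a{1};
    std::atomic<bool> stop{false};  // assumption: added so the writer can exit

    std::thread writer([&] {
        while (!stop.load(std::memory_order_relaxed)) {
            a.store(0, std::memory_order_relaxed);
            a.store(1, std::memory_order_relaxed);
        }
    });

    // Bounded version of "while (a.load_relaxed());".
    bool saw_zero = false;
    for (long i = 0; i < max_spins; ++i) {
        if (a.load(std::memory_order_relaxed) == 0) {
            saw_zero = true;
            break;
        }
    }

    stop.store(true, std::memory_order_relaxed);
    writer.join();

    // Every writer iteration ends with store(1), so `a` is 1 here.
    return {saw_zero, a.load(std::memory_order_relaxed)};
}
```

Whether saw_zero comes back true is exactly the open question: if the
compiler were to drop the store of 0 as dead, the reader could never
observe it, however long it spins.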
> We can't promise progress here anyway, since we might have a
> nonpreemptive scheduler, and thread 2 might never get to run. But I
> think it would be useful to give a hint to implementers and users as to
> what expected behavior in the normal, preemptive, case is. But I'm not
> sure what the right answer is.
The whole point of this effort is to provide definition to a class of
C/C++ programs that involve parallelism, and that therefore break (or
at least severely bend) the sequential model that C/C++ has been based
on for some decades. Some constraints on optimization come naturally
with this territory.
I can produce additional examples along this same line, if that would help.
Thanx, Paul
> Hans
>
> > -----Original Message-----
> > From: Peter Dimov [mailto:pdimov at mmltd.net]
> > Sent: Tuesday, February 13, 2007 12:23 PM
> > To: Boehm, Hans; paulmck at linux.vnet.ibm.com; C++ threads
> > standardisation
> > Subject: Re: [cpp-threads] Web site updated
> >
> > Boehm, Hans wrote:
> > >> MMIO accesses, anyone? Hardware timing analysis (especially if there
> > >> is a short loop between the two)?
> > >>
> > >> Let's please just outlaw that sort of optimization for store_raw(),
> > >> so that it can retain its full meaning.
> > >>
> > > My understanding is that we are constrained here as to what we can
> > > require in normative standards text.
> >
> > Even if we aren't so constrained, do we really want to forbid
> > these optimizations?
> >
> > atomic_fetchadd_relaxed( &cnt, +1 );
> >
> > // ordinary ops 1
> >
> > atomic_fetchadd_relaxed( &cnt, +1 );
> >
> > // ordinary ops 2
> >
> > atomic_fetchadd_relaxed( &cnt, +1 );
> >
> > I'd really like the memory model (MM) to be enlightened/permissive enough to allow
> >
> > // ordinary ops 1+2
> >
> > atomic_fetchadd_relaxed( &cnt, +3 );
> >
> > here. (This should work even for _release because moving in
> > one direction is still allowed; not for the various flavors
> > of _ordered, though.)
> >
> > Similarly,
> >
> > atomic_fetchadd_relaxed( &cnt, r );
> > atomic_fetchadd_relaxed( &cnt, -r );
> >
> > should be a no-op, even for _ordered.
> >
> > > I am less clear as to what we can reasonably suggest as nonbinding
> > > notes in the standard.
> >
> > We are free to suggest pretty much anything in notes,
> > examples and footnotes, as long as it doesn't contradict the
> > normative text.
> >
> >
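Peter's fetch-add combining can be sketched as follows, assuming
std::atomic<int>::fetch_add with std::memory_order_relaxed in place of the
atomic_fetchadd_relaxed() spelling quoted above. The function names are
hypothetical. The combined form is written by hand here to show what a
permissive memory model would let the compiler produce; it is not a claim
about what any particular compiler does.

```cpp
#include <atomic>
#include <cassert>

// Three separate relaxed increments, as in the source program.
int separate_increments(std::atomic<int>& cnt) {
    cnt.fetch_add(1, std::memory_order_relaxed);
    // ordinary ops 1
    cnt.fetch_add(1, std::memory_order_relaxed);
    // ordinary ops 2
    cnt.fetch_add(1, std::memory_order_relaxed);
    return cnt.load(std::memory_order_relaxed);
}

// Hand-combined form: one increment by 3 after the ordinary operations.
int combined_increment(std::atomic<int>& cnt) {
    // ordinary ops 1+2
    cnt.fetch_add(3, std::memory_order_relaxed);
    return cnt.load(std::memory_order_relaxed);
}

// Adding r and then subtracting it again leaves the counter unchanged.
int add_then_subtract(std::atomic<int>& cnt, int r) {
    cnt.fetch_add(r, std::memory_order_relaxed);
    cnt.fetch_add(-r, std::memory_order_relaxed);
    return cnt.load(std::memory_order_relaxed);
}

// Single-threaded check that both forms leave the same final count.
void check_single_threaded_equivalence() {
    std::atomic<int> a{0};
    std::atomic<int> b{0};
    assert(separate_increments(a) == combined_increment(b));
}
```

With no other thread running, both forms end at the same count. The
difference is only visible to a concurrent observer: with the separate
increments it may see the intermediate values, and with the combined form
it cannot, which is the same trade-off as in Hans's store example.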