[cpp-threads] Web site updated

Tue Feb 13 22:17:38 GMT 2007

On Tue, Feb 13, 2007 at 09:15:08PM -0000, Boehm, Hans wrote:
> I guess such optimizations are potentially important for reference
> counts.
> 
> My model so far has also been that with default compiler options,
> compilers will generate code for atomics (and possibly fences) that
> enforces the indicated ordering constraints if both atomics and ordinary
> variables live in whatever memory type (usually something like writeback
> cacheable) is seen by standard-conforming user-level code.  My limited
> understanding is that this is not always sufficient for kernel code.
> (E.g. Itanium provides a stronger mf.a fence instruction that would
> never be generated in this model.  I believe it is also not generated by
> any of the current __sync gcc primitives.  My recollection of the
> PowerPC rules is that they also differ by memory types,
> though I don't remember the details.)
> 
> As a result of this, I'm not sure to what extent the default compilation
> model will be directly useful for kernel/device driver code.  And that
> probably shouldn't be a major consideration.  I suspect that compilers
> will end up providing flags to generate more kernel-friendly code if
> necessary, and we certainly don't want to discourage that.  But we can't
> require it.

By "device driver code", are you including the increasingly common case
of device drivers that run within user applications?

> Atomics were not designed as a way to fix volatiles for device register
> access and the like.  So far, that problem isn't really being addressed.
> We have really only been talking about variables used for inter-thread
> communication.

I agree that kernel-code examples that have counterparts at user level are
the most important, and that examples that don't involve access to special
hardware registers are more important than those that do.

> But I think that combining accesses can also be a problem at user level,
> for other reasons.  If I write
> 
> Thread1:
> for(;;) {
>     a.store_relaxed(0); // or release
>     a.store_relaxed(1);
> }
> 
> Thread2:
> while (a.load_relaxed());
> <do something useful>

Excellent example!!!

> Should I expect thread 2 to eventually make progress?  Clearly if the
> first a.store_relaxed() is elided, the answer is no.  But an answer of
> "yes" is probably more desirable.

I would certainly prefer "yes" in this case.  ;-)

> We can't promise progress here anyway, since we might have a
> nonpreemptive scheduler, and thread 2 might never get to run.  But I
> think it would be useful to give a hint to implementers and users as to
> what expected behavior in the normal, preemptive, case is.  But I'm not
> sure what the right answer is.

The whole point of this effort is to provide definition to a class of
C/C++ programs that involve parallelism, and that therefore break (or
at least severely bend) the sequential model that C/C++ has been based
on for some decades.  Some constraints on optimization come naturally
with this territory.
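
For concreteness, here is a minimal sketch of Hans's example above. It
assumes the std::atomic spelling that C++ later adopted, where the thread
itself writes store_relaxed()/load_relaxed(). The stop flag, the time
limit, and the name reader_makes_progress are hypothetical additions so
that the program terminates. It only reports whether thread 2 observed
a == 0 within the limit; nothing in it guarantees that outcome, which is
exactly the question being asked.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical helper, not from the thread: runs the two-thread example
// for at most `limit` and reports whether thread 2 ever observed a == 0.
bool reader_makes_progress(std::chrono::milliseconds limit) {
    std::atomic<int> a{1};
    std::atomic<bool> stop{false};  // added so thread 1 can be shut down

    // Thread 1: alternate the two relaxed stores until told to stop.
    std::thread writer([&] {
        while (!stop.load(std::memory_order_relaxed)) {
            a.store(0, std::memory_order_relaxed);
            a.store(1, std::memory_order_relaxed);
        }
    });

    // Thread 2: spin while a is nonzero, giving up at the deadline.
    bool progressed = false;
    const auto deadline = std::chrono::steady_clock::now() + limit;
    while (std::chrono::steady_clock::now() < deadline) {
        if (a.load(std::memory_order_relaxed) == 0) {
            progressed = true;  // <do something useful> would go here
            break;
        }
    }

    stop.store(true, std::memory_order_relaxed);
    writer.join();
    return progressed;
}
```

If an implementation elided the store of 0, the function could only ever
return false, however large the limit.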

I can produce additional examples along this same line, if that would help.

							Thanx, Paul

> Hans
> 
> > -----Original Message-----
> > From: Peter Dimov [mailto:pdimov at mmltd.net] 
> > Sent: Tuesday, February 13, 2007 12:23 PM
> > To: Boehm, Hans; paulmck at linux.vnet.ibm.com; C++ threads standardisation
> > Subject: Re: [cpp-threads] Web site updated
> > 
> > Boehm, Hans wrote:
> > >> MMIO accesses, anyone?  Hardware timing analysis (especially if there
> > >> is a short loop between the two)?
> > >>
> > >> Let's please just outlaw that sort of optimization for store_raw(),
> > >> so that it can retain its full meaning.
> > >>
> > > My understanding is that we are constrained here as to what we can 
> > > require in normative standards text.
> > 
> > Even if we aren't so constrained, do we really want to forbid 
> > these optimizations?
> > 
> > atomic_fetchadd_relaxed( &cnt, +1 );
> > 
> > // ordinary ops 1
> > 
> > atomic_fetchadd_relaxed( &cnt, +1 );
> > 
> > // ordinary ops 2
> > 
> > atomic_fetchadd_relaxed( &cnt, +1 );
> > 
> > I'd really like the MM to be enlightened/permissive enough to allow
> > 
> > // ordinary ops 1+2
> > 
> > atomic_fetchadd_relaxed( &cnt, +3 );
> > 
> > here. (This should work even for _release because moving in 
> > one direction is still allowed; not for the various flavors 
> > of _ordered, though.)
> > 
> > Similarly,
> > 
> > atomic_fetchadd_relaxed( &cnt, r );
> > atomic_fetchadd_relaxed( &cnt, -r );
> > 
> > should be a noop, even for _ordered.
> > 
> > > I am less clear as to what we can reasonably suggest as nonbinding 
> > > notes in the standard.
> > 
> > We are free to suggest pretty much anything in notes, 
> > examples and footnotes, as long as it doesn't contradict the 
> > normative text.
> > 
> >
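
The combining that Peter describes in the quoted message can be sketched
as follows. It again assumes the later std::atomic spelling (fetch_add
with memory_order_relaxed) in place of atomic_fetchadd_relaxed(). The
three function names are hypothetical. Run single-threaded, the
as-written and combined forms leave the same final count. The only
difference is what a concurrent observer could see: the intermediate
values +1 and +2 exist in the first form and not in the second.

```cpp
#include <atomic>

// As written in the quoted message: three separate relaxed increments.
int three_increments(std::atomic<int>& cnt) {
    cnt.fetch_add(1, std::memory_order_relaxed);
    // ordinary ops 1
    cnt.fetch_add(1, std::memory_order_relaxed);
    // ordinary ops 2
    cnt.fetch_add(1, std::memory_order_relaxed);
    return cnt.load(std::memory_order_relaxed);
}

// The form a permissive memory model would let a compiler emit instead.
int one_combined_increment(std::atomic<int>& cnt) {
    // ordinary ops 1+2
    cnt.fetch_add(3, std::memory_order_relaxed);
    return cnt.load(std::memory_order_relaxed);
}

// Add r, then take it away again. fetch_sub(r) stands in for the quoted
// fetchadd of -r so that negating r is never needed.
int add_then_cancel(std::atomic<int>& cnt, int r) {
    cnt.fetch_add(r, std::memory_order_relaxed);
    cnt.fetch_sub(r, std::memory_order_relaxed);
    return cnt.load(std::memory_order_relaxed);
}
```

Whether a compiler may actually substitute the second form for the
first, or drop the add/subtract pair entirely, is the open question in
this thread; the sketch only shows that the final values agree.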