[cpp-threads] Web site updated

Tue Feb 13 21:15:08 GMT 2007

I guess such optimizations are potentially important for reference
counts.
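
For concreteness, here is a minimal sketch of the combining Peter describes in the quoted message below. It assumes the std::atomic spelling that was later standardized, not the atomic_fetchadd_relaxed name used in this thread. The counter cnt and both function names are hypothetical, chosen only for illustration.

```cpp
#include <atomic>

// Hypothetical reference count, for illustration only.
std::atomic<long> cnt{0};

// Three separate relaxed increments, as the programmer wrote them.
void three_increments() {
    cnt.fetch_add(1, std::memory_order_relaxed);
    // ordinary ops 1
    cnt.fetch_add(1, std::memory_order_relaxed);
    // ordinary ops 2
    cnt.fetch_add(1, std::memory_order_relaxed);
}

// What a combining optimizer might emit instead: one increment by 3.
void one_combined_increment() {
    // ordinary ops 1+2
    cnt.fetch_add(3, std::memory_order_relaxed);
}
```

Both functions leave cnt larger by 3. The difference is that another thread can observe the intermediate values only in the first version, which is exactly what is at issue.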

My model so far has also been that with default compiler options,
compilers will generate code for atomics (and possibly fences) that
enforces the indicated ordering constraints if both atomics and ordinary
variables live in whatever memory type (usually something like writeback
cacheable) is seen by standard-conforming user-level code.  My limited
understanding is that this is not always sufficient for kernel code.
(E.g. Itanium provides a stronger mf.a fence instruction that would
never be generated in this model.  I believe it is also not generated by
any of the current __sync gcc primitives.  My recollection of the
PowerPC rules is that they also differ by memory types,
though I don't remember the details.)

As a result of this, I'm not sure to what extent the default compilation
model will be directly useful for kernel/device driver code.  And that
probably shouldn't be a major consideration.  I suspect that compilers
will end up providing flags to generate more kernel-friendly code if
necessary, and we certainly don't want to discourage that.  But we can't
require it.

Atomics were not designed as a way to fix volatiles for device register
access and the like.  So far, that problem isn't really being addressed.
We have really only been talking about variables used for inter-thread
communication.

But I think that combining accesses can also be a problem at user level,
for other reasons.  If I write

Thread1:
for(;;) {
    a.store_relaxed(0); // or release
    a.store_relaxed(1);
}

Thread2:
while (a.load_relaxed());
<do something useful>

Should I expect thread 2 to eventually make progress?  Clearly, if the
compiler combines the two stores and elides the a.store_relaxed(0),
the answer is no, since thread 2 may then never see a zero.  But an
answer of "yes" is probably more desirable.
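
A runnable sketch of the same two threads, assuming the std::atomic spelling that was later standardized rather than the store_relaxed/load_relaxed names above. The stop flag, the deadline, and the function names are hypothetical additions so that the program always terminates; they are not part of the original example.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

std::atomic<int> a{1};          // starts nonzero, so the reader has to wait
std::atomic<bool> stop{false};  // hypothetical shutdown flag (not in the original)

// Thread 1: alternate the two stores until told to stop.
void writer() {
    while (!stop.load(std::memory_order_relaxed)) {
        a.store(0, std::memory_order_relaxed);
        a.store(1, std::memory_order_relaxed);
    }
}

// Thread 2: spin until a zero is observed or the deadline passes.
// Returns true if a zero was seen, i.e. if the thread made progress.
bool reader(std::chrono::milliseconds limit) {
    auto deadline = std::chrono::steady_clock::now() + limit;
    while (a.load(std::memory_order_relaxed) != 0) {
        if (std::chrono::steady_clock::now() >= deadline) return false;
    }
    return true;
}

// Runs both threads once and reports whether the reader made progress.
bool run_once() {
    stop.store(false, std::memory_order_relaxed);
    bool saw_zero = false;
    std::thread t1(writer);
    std::thread t2([&] { saw_zero = reader(std::chrono::milliseconds(200)); });
    t2.join();  // after this join, saw_zero is safe to read
    stop.store(true, std::memory_order_relaxed);
    t1.join();
    return saw_zero;
}
```

Whether run_once() returns true is precisely the open question. An implementation that merged the two stores into a single store of 1 would make it return false every time.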

We can't promise progress here anyway, since we might have a
nonpreemptive scheduler, and thread 2 might never get to run.  But I
think it would be useful to give implementers and users a hint as to
what behavior to expect in the normal, preemptive case.  I'm just not
sure what the right answer is.

Hans

> -----Original Message-----
> From: Peter Dimov [mailto:pdimov at mmltd.net] 
> Sent: Tuesday, February 13, 2007 12:23 PM
> To: Boehm, Hans; paulmck at linux.vnet.ibm.com; C++ threads 
> standardisation
> Subject: Re: [cpp-threads] Web site updated
> 
> Boehm, Hans wrote:
> >> MMIO accesses, anyone?  Hardware timing analysis (especially if
> >> there is a short loop between the two)?
> >>
> >> Let's please just outlaw that sort of optimization for store_raw(),
> >> so that it can retain its full meaning.
> >>
> > My understanding is that we are constrained here as to what we can 
> > require in normative standards text.
> 
> Even if we aren't so constrained, do we really want to forbid 
> these optimizations?
> 
> atomic_fetchadd_relaxed( &cnt, +1 );
> 
> // ordinary ops 1
> 
> atomic_fetchadd_relaxed( &cnt, +1 );
> 
> // ordinary ops 2
> 
> atomic_fetchadd_relaxed( &cnt, +1 );
> 
> I'd really like the MM to be enlightened/permissive enough as to allow
> 
> // ordinary ops 1+2
> 
> atomic_fetchadd_relaxed( &cnt, +3 );
> 
> here. (This should work even for _release because moving in 
> one direction is still allowed; not for the various flavors 
> of _ordered, though.)
> 
> Similarly,
> 
> atomic_fetchadd_relaxed( &cnt, r );
> atomic_fetchadd_relaxed( &cnt, -r );
> 
> should be a noop, even for _ordered.
> 
> > I am less clear as to what we can reasonably suggest as nonbinding 
> > notes in the standard.
> 
> We are free to suggest pretty much anything in notes, 
> examples and footnotes, as long as it doesn't contradict the 
> normative text.
> 
>