[cpp-threads] [Javamemorymodel-discussion] there's a happens-before order here. right?

Mon Dec 15 23:41:47 GMT 2008

On Mon, Dec 15, 2008 at 11:04:20PM +0000, Boehm, Hans wrote:
> 
> 
> > -----Original Message-----
> > From: Alexander Terekhov [mailto:alexander.terekhov at gmail.com]
> > Sent: Monday, December 15, 2008 2:04 PM
> > To: Boehm, Hans
> > Cc: Endre Stølsvik; javamemorymodel-discussion at cs.umd.edu;
> > C++ threads standardisation
> > Subject: Re: [Javamemorymodel-discussion] there's a
> > happens-before order here. right?
> >
> > On Mon, Dec 15, 2008 at 9:38 PM, Boehm, Hans
> > <hans.boehm at hp.com> wrote:
> > >
> > > [I think all of this is relevant to Java only in that vaguely similar
> > > extensions have been considered there at times.  Most of it doesn't
> > > currently translate, in spite of the fact that the basic memory models
> > > are similar.]
> > >
> > >> From: Alexander Terekhov [mailto:alexander.terekhov at gmail.com]
> > >> On Sat, Dec 13, 2008 at 2:04 AM, Boehm, Hans <hans.boehm at hp.com>
> > >> wrote:
> > >> [...]
> > >> >>
> > >> >> http://www.hpl.hp.com/personal/Hans_Boehm/c++mm/threadsintro.html#examples
> > >> >>
> > >> >> would you label the following C++0x program
> > >> >>
> > >> >>     int data; //= 0
> > >> >>     atomic<int> x; //= 0
> > >> >>
> > >> >>     thread 1:
> > >> >>     ------------
> > >> >>     data = 1;
> > >> >>     x.store(1, release);
> > >> >>
> > >> >>     thread 2:
> > >> >>     ------------
> > >> >>     if (x.load(relaxed))
> > >> >>       data = 2;
> > >> >>
> > >> >> data-race-free or not? Why? TIA.
> > >> >>
> > >> > This is well in the "escapes from sequential consistency" category,
> > >> > and it doesn't currently have a Java analog.
> > >>
> > >> Yes. I'm actually driving at
> > >>
> > >> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2745.html
> > >>
> > >> > As I said, I was really trying to steer most people well away from
> > >> > this kind of code.
> > >>
> > >> The problem is that std::atomic<> in SC mode is utterly expensive on
> > >> POWER. Even in acquire-release mode it will inject way too much
> > >> totally redundant (lw/i)sync(hronization).
> > >
> > > I agree that this can be an issue on some current architectures, as it
> > > is with Java volatiles.  But the above example can be fixed by either:
> > >
> > > 1) Using an acquire load, or
> >
> > using an atomic_thread_fence(memory_order_acquire)
> You're right.  That's another option.  Unfortunately, it's currently architecture-dependent which one is faster.  The fence always overconstrains ordering, since it enforces ordering between earlier and subsequent loads, for example, while the acquire load unnecessarily constrains ordering on the other path.
> 
> >
> > [...]
> >
> > >> > if (r1 = x.load(memory_order_relaxed)) {
> > >> >    data.store(2, memory_order_relaxed);
> > >> > } else {
> > >> >    data.store(2, memory_order_relaxed);
> > >> > }
> > >> >
> > >> > to be ordered, for example.
> > >>
> > >> Why not?
> > >
> > > Any compiler that cares about code size is likely to transform this to
> > >
> > > r1 = x.load(memory_order_relaxed);
> > > data.store(2, memory_order_relaxed);
> > >
> > > and go from there, reordering at will, and allowing the hardware to reorder.
> > >
> > > Preventing this is hard, since the actual code may look like
> > >
> > > inside f():
> > >
> > >  r1 = x.load(memory_order_relaxed);
> > >  g(r1);
> > >
> > > inside g(), in a separate compilation unit:
> >
> > You're using the term not defined in the current C++ standard
> > ("compilation unit").
> >
> > FWIW, it is fine to restrict reordering of
> >
> > if (r1 = x.load(memory_order_relaxed)) {
> >    data.store(2, memory_order_relaxed);
> > } else {
> >    data.store(2, memory_order_relaxed);
> > }
> >
> > only in the same "scope". ;-)
> I think that memory_order_consume basically handles this correctly.  The constraint propagates into called functions, unless you explicitly prevent that with kill_dependency.  If the compiler can't ensure that the called function will propagate dependencies (e.g. because it was compiled separately, or it's just too expensive to do so) it has to compile memory_order_consume as memory_order_acquire.  You get consistent semantics, but the performance will depend on how much the compiler can "see".
> 
> But at the moment, "memory_order_consume" does not respect any control dependencies.  It's built on a very specific notion of data dependence (1.10p8).

Indeed -- although PowerPC respects control dependencies, other
architectures apparently do not.  So while I would personally like
to see control dependencies added, it just was not in the cards at
the time.   :-(
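
Since the control dependency in thread 2 of the original example buys no
ordering in the language itself, portable code has to fall back on one of
the two fixes mentioned above: an acquire load, or a relaxed load followed
by an acquire fence on the path that touches data. A rough sketch of both,
written against the names that eventually shipped in <atomic> (the function
names and the use of std::thread are assumptions made for illustration, not
taken from the thread):

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Fix 1: replace the relaxed load with an acquire load.
int run_with_acquire_load() {
    int data = 0;
    std::atomic<int> x{0};

    std::thread t1([&] {
        data = 1;
        x.store(1, std::memory_order_release);
    });
    std::thread t2([&] {
        if (x.load(std::memory_order_acquire)) {
            // The acquire load read the release store, so "data = 1"
            // happens before this point and the accesses do not race.
            assert(data == 1);
            data = 2;
        }
    });
    t1.join();
    t2.join();
    return data;  // 1 if thread 2 saw x == 0, otherwise 2
}

// Fix 2: keep the relaxed load, add an acquire fence on the taken path only.
int run_with_acquire_fence() {
    int data = 0;
    std::atomic<int> x{0};

    std::thread t1([&] {
        data = 1;
        x.store(1, std::memory_order_release);
    });
    std::thread t2([&] {
        if (x.load(std::memory_order_relaxed)) {
            // The fence follows a load that read the release store, so the
            // store synchronizes with the fence and orders the write below.
            std::atomic_thread_fence(std::memory_order_acquire);
            assert(data == 1);
            data = 2;
        }
    });
    t1.join();
    t2.join();
    return data;  // 1 if thread 2 saw x == 0, otherwise 2
}
```

Either function returns 1 or 2 depending on whether thread 2 observed the
store to x. The fence variant leaves the x == 0 path unconstrained, which is
the trade-off Hans describes above.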

							Thanx, Paul
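
For contrast with the control-dependency case, here is a minimal sketch of
the kind of data dependence that memory_order_consume is built on: the
consumer's read goes through the pointer it loaded. The Message type and the
function name are hypothetical, introduced only for this illustration, and
compilers may simply treat the consume load as an acquire load.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Message { int payload; };  // hypothetical type, for illustration only

int consume_via_data_dependency() {
    Message msg{0};
    std::atomic<Message*> ptr{nullptr};
    int seen = -1;

    std::thread producer([&] {
        msg.payload = 42;
        ptr.store(&msg, std::memory_order_release);
    });
    std::thread consumer([&] {
        Message* m;
        // Spin until the producer has published the pointer.
        while ((m = ptr.load(std::memory_order_consume)) == nullptr) {
            std::this_thread::yield();
        }
        // m->payload is computed from the loaded value, so the consume load
        // carries a dependency into this read and it is ordered after the
        // producer's write to msg.payload.
        seen = m->payload;
    });
    producer.join();
    consumer.join();
    return seen;
}

// By contrast, a shape such as
//     if (flag.load(std::memory_order_consume)) r = msg.payload;
// has only a control dependency on the load: msg.payload is not computed
// from the loaded value, so consume provides no ordering for that read.
```

Calling consume_via_data_dependency() returns 42, because the consumer only
reads the payload through the published pointer.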

> Hans
> >
> > regards,
> > alexander.
> >
> 