[cpp-threads] Slightly revised memory model proposal (D2300)

Sun Jun 10 06:12:55 BST 2007

On Wed, 6 Jun 2007, Raul Silvera wrote:

>
> Hans, I mentioned this privately, but I wanted to make sure it is voiced on
> the list as well.
>
> I think it is a mistake to disallow Peter Dimov's example (for reference):
>
> Thread 1                  Thread 2                  Thread 3
> x=1                       y=1                       if (load_acquire(&v)==2)
> fetch_add_release(&v,1)   fetch_add_release(&v,1)       assert (x+y==2)
>
> I think this is too counterintuitive and borders on unusable. Also, it
> makes fetch_add_release(&v, 0) different from a noop.
This certainly bothers me as well.
>
> Furthermore, I believe it is unnecessary, since PPC (which I assume is the
> architecture that is triggering this change) has mechanisms to efficiently
> implement this particular situation. Basically, dependence ordering on the
> fetch_add_release causes a cheap acquire between the load and store of v.
> On other architectures, I suspect the acquire is implied or at least it is
> cheap enough that it can always be introduced.
Again, I think the problem is not that it is difficult to ensure that
this example produces the desired outcome.  The problem is that if we
include a rule that guarantees it, and combine it with other properties
that seem similarly essential in small examples, we inadvertently
constrain more complicated examples such that they are no longer
efficiently implementable.  The example in question really is Sarita's.
>
> My proposal is to single out RMW operations and indicate they must at least
> have a local order between their load and store, which is I believe all
> that is needed for this example to fit the model. Note that this doesn't
> require a change to the memory model, but to the definition of the RMW
> operations.
I have been assuming that atomic RMWs are viewed as single actions in the
ordering which, if I understand correctly, is at least as strong as
what you are proposing.  At least in the current proposal, this doesn't
help, since there is no happens-before ordering between the first
and second fetch_add_release (or between the store of the first one
and the load of the second, if you want to view it that way.)
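
For concreteness, here is a minimal sketch of Peter's example as a small
test harness.  It assumes the std::atomic spelling that C++11 later
standardized in place of the load_acquire / fetch_add_release notation
used in this thread; the names Shared, run_peter_trial and
all_trials_ok are hypothetical.  Under the C++11 release-sequence rule
(not the draft rules being debated here), a load that reads 2
synchronizes with both increments, so the reads of x and y are race-free
and the sum is 2.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical harness; these names do not come from D2300.
struct Shared {
    int x = 0;                // ordinary (non-atomic) data
    int y = 0;
    std::atomic<int> v{0};    // counter incremented by threads 1 and 2
};

// Runs the three threads once. Returns x + y if thread 3 observed
// v == 2, or -1 if it looked too early and skipped the reads.
int run_peter_trial() {
    Shared s;
    int observed = -1;

    std::thread t1([&] {
        s.x = 1;
        s.v.fetch_add(1, std::memory_order_release);
    });
    std::thread t2([&] {
        s.y = 1;
        s.v.fetch_add(1, std::memory_order_release);
    });
    std::thread t3([&] {
        // Under C++11 rules, reading 2 here synchronizes with both
        // fetch_add operations, so x and y may be read without a race.
        if (s.v.load(std::memory_order_acquire) == 2) {
            observed = s.x + s.y;
        }
    });

    t1.join();
    t2.join();
    t3.join();
    return observed;
}

// Repeats the trial; every run must either skip the reads or see 2.
bool all_trials_ok(int trials) {
    for (int i = 0; i < trials; ++i) {
        int r = run_peter_trial();
        if (r != -1 && r != 2) return false;
    }
    return true;
}
```

A passing run proves nothing about the model, of course; the harness
only makes the shape of the example explicit.
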

I think we do have the option of dropping the cache coherence requirement
instead.  That would allow things like

Thread 1:
store_relaxed(&x, 1);

Thread 2:
store_relaxed(&x, 2);

Thread 3:
r1 = load_acquire(&x); (1)
r2 = load_acquire(&x); (2)
r3 = load_acquire(&x); (1)

but would support Peter's example.

Effectively, racing stores can appear to repeatedly become visible and
invisible again, but each load still has to return a consistent value.
This also feels very unintuitive, but it may have less practical impact.
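
The outcome in question can be checked mechanically.  The sketch below
again assumes the std::atomic spelling from C++11, and the names Reads,
value_reappears, run_coherence_trial and no_reappearance_in are
hypothetical.  It flags the (1, 2, 1) pattern, which per-variable
coherence forbids and which dropping the coherence requirement would
permit.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// The three values read by thread 3, in program order.
struct Reads { int r1, r2, r3; };

// True when a value is seen, then a different value, then the first
// value again (e.g. 1, 2, 1). Each value is stored to x at most once,
// so no single modification order of x can explain such a sequence.
bool value_reappears(const Reads& r) {
    return r.r1 == r.r3 && r.r1 != r.r2;
}

// Runs the two racing relaxed stores and the three acquire loads once.
Reads run_coherence_trial() {
    std::atomic<int> x{0};
    Reads out{0, 0, 0};

    std::thread t1([&] { x.store(1, std::memory_order_relaxed); });
    std::thread t2([&] { x.store(2, std::memory_order_relaxed); });
    std::thread t3([&] {
        out.r1 = x.load(std::memory_order_acquire);
        out.r2 = x.load(std::memory_order_acquire);
        out.r3 = x.load(std::memory_order_acquire);
    });

    t1.join();
    t2.join();
    t3.join();
    return out;
}

// With coherence required, no trial may show a reappearing value.
bool no_reappearance_in(int trials) {
    for (int i = 0; i < trials; ++i) {
        if (value_reappears(run_coherence_trial())) return false;
    }
    return true;
}
```

Without the coherence requirement, value_reappears could legitimately
return true for a trial; with it, that result would indicate a broken
implementation.
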

Modification order would be used only in defining "later stored values"
in the synchronizes-with definition, and wouldn't constrain visibility.

I'm starting to think that this might be a better approach?

(Again I'm uncomfortable about all of this without a better understanding
of the PowerPC model than I currently have.  I'm currently just looking
at ways to get "the right" answer for Sarita's example.)

Hans
>
> --
> Raúl E. Silvera         IBM Toronto Lab   Team Lead, Toronto Portable
> Optimizer (TPO)
> Tel: 905-413-4188 T/L: 969-4188           Fax: 905-413-4854
> D2/KC9/8200/MKM
>
> From:     "Boehm, Hans" <hans.boehm at hp.com>
> Sent by:  cpp-threads-bounces at decadentplace.org.uk
> Date:     06/04/07 07:43 PM
> To:       <paulmck at linux.vnet.ibm.com>,
>           <cpp-threads at decadentplace.org.uk>
> Subject:  [cpp-threads] Slightly revised memory model proposal (D2300)
> Reply-To: C++ threads standardisation
>           <cpp-threads at decadentplace.org.uk>
>
> Attached is an early draft of a document with just the threads changes
> from N2171, adjusted a little as a result of feedback from Paul and
> others.
>
> Note that the "D" in D2300 indicates that this is not stable, and
> multiple versions of this document will share the same number.
> Hopefully I will remember to update the date.  This will eventually turn
> into N2300.
>
> Hans
> (See attached file: D2300.html)