<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

<html><head>

<meta http-equiv="Content-Type" content="text/html;charset=us-ascii">

<title>N3125: Omnibus Memory Model and Atomics Paper</title>

<style type="text/css">

        ins {background-color:#A0FFA0}

        del {background-color:#FFA0A0}

</style>

</head><body>

<h1>N3125: Omnibus Memory Model and Atomics Paper</h1>

<p>

ISO/IEC JTC1 SC22 WG21 N3125 = 10-0115 - 2010-08-22

</p>

<p>

Paul E. McKenney, paulmck@linux.vnet.ibm.com

<br>

Mark Batty, mjb220@cl.cam.ac.uk

<br>

Clark Nelson, clark.nelson@intel.com

<br>

Hans Boehm, hans.boehm@hp.com

<br>

Anthony Williams, anthony@justsoftwaresolutions.co.uk

<br>

Scott Owens, Scott.Owens@cl.cam.ac.uk

<br>

Susmit Sarkar, susmit.sarkar@cl.cam.ac.uk

<br>

Peter Sewell, Peter.Sewell@cl.cam.ac.uk

<br>

Tjark Weber, tw333@cam.ac.uk

<br>

Michael Wong, michaelw@ca.ibm.com

<br>

Lawrence Crowl, crowl@google.com

</p>

<h2>Introduction</h2>

<p>

Mark Batty, Scott Owens, Susmit Sarkar, Peter Sewell, and Tjark Weber

recently analyzed a formalized variant of the C++

memory model, which uncovered a number of potential issues discussed in

an email thread entitled

&ldquo;Some more memory model issues from Mark Batty&rdquo;

initiated by Hans Boehm on June 13, 2010, and in another email thread

entitled &ldquo;Further C++ concurrency discussion&rdquo;

initiated by Mark Batty on July 28, 2010.

This paper summarizes the ensuing discussion, calling out changes that

appear uncontroversial and summarizing positions on contended issues.

Some of these issues have been captured as national-body comments,

and the disposition of any remaining issues is to be determined.

</p>

<p>Please note that only those issues related to the memory model

are included in this paper.

In addition, this version of the paper does not yet include

issues raised in the July thread that were not also covered in

the June thread.

</p>

<p>More details on the work leading up to this paper may be found

<a href="http://www.cl.cam.ac.uk/~mjb220/cpp/model.pdf">here</a>

and

<a href="http://www.cl.cam.ac.uk/~pes20/cpp/">here</a>.

</p>

<h2>Editorial Issues Discussed in June Email Thread</h2>

<h3>GB 5: Example of unspecified evaluation order of function arguments [Clark]</h3>

<dl>

<dt>Sections:</dt>

<dd>1.9</dd>

<dt>Comment:</dt>

<dd>The evaluations of function arguments are now indeterminately sequenced,

rather than left completely unspecified, as part of the new language

describing the memory model. A clearer example of unspecified behavior

should be used here.

</dd>

<dt>Proposal:</dt>

<dd>Make the editorial change.</dd>

<dt>Resolution:</dt>

<dd>Done</dd>

</dl>
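<p>As an illustration of the point in this comment (a sketch, not part of the proposed change), the following C++ fragment records the order in which two function arguments are evaluated. The helper names <code>g</code>, <code>h</code>, <code>sum</code>, and <code>evaluation_order</code> are hypothetical. Each helper logs a lowercase letter on entry and an uppercase letter on exit. Because the two calls are indeterminately sequenced, one completes before the other begins, so the log is expected to be either <code>"gGhH"</code> or <code>"hHgG"</code>; which of the two is produced remains unspecified.</p>

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers: each logs a marker on entry and another on exit.
static std::string order_log;

static int g() { order_log += 'g'; order_log += 'G'; return 1; }
static int h() { order_log += 'h'; order_log += 'H'; return 2; }
static int sum(int a, int b) { return a + b; }

// Reports the order in which the two arguments of sum() were evaluated.
// The calls to g() and h() are indeterminately sequenced: one completes
// before the other begins, but which one runs first is unspecified.
std::string evaluation_order() {
    order_log.clear();
    int result = sum(g(), h());
    assert(result == 3);
    (void)result;      // avoid an unused-variable warning under NDEBUG
    return order_log;  // "gGhH" or "hHgG"
}
```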

<h2>Non-Controversial Issues Discussed in June Email Thread</h2>

<h3>CA 8, GB 10: Inter-thread-happens-before is not acyclic [Clark]</h3>

<dl>

<dt>Sections:</dt>

<dd>1.10p10, 1.10p11</dd>

<dt>Comment:</dt>

<dd>The outcome of the following litmus test is generally agreed to be

disallowed. However, Batty et al. uncovered a case where the standard does

not forbid it, as shown by the following example from their paper:

<blockquote>

<table border=3>

<tr><th>Thread 0</th><th>Thread 1</th></tr>

<tr><td><code>r1 = x.load(memory_order_consume);</code></td>

        <td><code>r2 = y.load(memory_order_consume);</code></td></tr>

<tr><td><code>y.store(1, memory_order_release);</code></td>

        <td><code>x.store(1, memory_order_release);</code></td></tr>

</table>

</blockquote>

<p>The standard permits the counter-intuitive outcome

<code>r1 == 1 && r2 == 1</code>, an outcome that cannot occur on any

hardware platform that we are aware of.

</p>

</dd>

<dt>Proposal:</dt>

<dd>In 1.10p10:

<blockquote>

<p>An evaluation <var>A</var> happens before an

evaluation <var>B</var> if:</p>

<ul>

<li><var>A</var> is sequenced before <var>B</var>,

or</li>

<li><var>A</var> inter-thread happens before <var>B</var>.</li>

</ul>

<p><ins>The implementation shall ensure that no program

execution demonstrates a cycle in the "happens before" relation. [ <em>Note:</em>

This would otherwise be possible only through the use of consume

operations. &#8212; <em>end note</em> ]</ins></p>

</blockquote>

</dd>

<dt>Resolution:</dt>

<dd>Adopt the proposal.</dd>

</dl>
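<p>The litmus test above can be written as an ordinary C++ program. In the following sketch, the helper names <code>run_litmus_once</code> and <code>count_cyclic_outcomes</code> are hypothetical; the code repeatedly runs the two threads and counts executions in which both loads return 1. Current compilers generally implement <code>memory_order_consume</code> as <code>memory_order_acquire</code>, and, as noted above, no known hardware produces this outcome, so the count is expected to be zero in practice. Testing of this kind cannot, of course, show what the standard itself permits.</p>

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct LitmusResult { int r1; int r2; };

// One run of the CA 8 / GB 10 litmus test (load buffering with consume loads).
LitmusResult run_litmus_once() {
    std::atomic<int> x{0}, y{0};
    int r1 = 0, r2 = 0;  // written by the threads, read only after join()
    std::thread t0([&] {
        r1 = x.load(std::memory_order_consume);
        y.store(1, std::memory_order_release);
    });
    std::thread t1([&] {
        r2 = y.load(std::memory_order_consume);
        x.store(1, std::memory_order_release);
    });
    t0.join();
    t1.join();
    return {r1, r2};
}

// Counts runs that exhibit the cyclic outcome r1 == 1 && r2 == 1.
int count_cyclic_outcomes(int runs) {
    assert(runs >= 0);
    int cyclic = 0;
    for (int i = 0; i < runs; ++i) {
        LitmusResult r = run_litmus_once();
        if (r.r1 == 1 && r.r2 == 1) ++cyclic;
    }
    return cyclic;
}
```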

<h3>CA 9: Imposed happens-before edges should be synchronizes-with [Clark]</h3>

<dl>

<dt>Sections:</dt>

<dd>1.10p7, 27.2.3p2, 29.3p1, 30.3.1.2p6, 30.3.1.5p7, 30.6.4p7, 30.6.9p5,

and 30.6.10.1p23</dd>

<dt>Comment:</dt>

<dd>The happens-before relation is not transitive: happens-before cannot

always be extended by a trailing sequenced-before relation.

It is therefore not appropriate to use happens-before to specify library

functions that are intended to impose ordering; synchronizes-with should

be used in its place for this purpose.

</dd>

<dt>Proposal:</dt>

<dd>

<p>Change 1.10p7:</p>

<blockquote>

<p>Certain library calls <dfn>synchronize with</dfn>

other library calls

performed by another thread. <del>In particular, an

atomic operation <var>A</var> that performs a release

operation on an atomic

object <var>M</var> synchronizes with an atomic

operation <var>B</var> that performs an acquire operation

on <var>M</var> and reads a value

written by any side effect in

the release sequence headed by <var>A</var>.</del> <ins>[

<em>Example:</em> An atomic store-release synchronizes

with

a load-acquire that takes its value from the store. (29.3 atomics.order) &#8212; <em>end example</em> ]</ins>

[ <em>Note: ...</em></p>

</blockquote>

<p>Insert a new paragraph following 29.3p1:</p>

<blockquote>

<p><ins>An atomic operation <var>A</var>

that performs

a release

operation on an atomic

object <var>M</var> synchronizes with an atomic

operation <var>B</var> that performs an acquire operation

on <var>M</var> and takes its value from any side effect

in

the release sequence headed by <var>A</var>.</ins></p>

</blockquote>

<p>Replace &ldquo;happens-before&rdquo; with

&ldquo;synchronizes-with&rdquo; in 27.2.3p2:</p>

<blockquote><p>

        If one thread makes a library call <var>a</var> that writes a

        value to a stream and, as a result, another thread reads this

        value from the stream through a library call <var>b</var> such

        that this does not result in a data race, then <var>a</var><ins>'s

        write</ins>

        <del>happens before</del> <ins>synchronizes with</ins>

<var>b</var><ins>'s read</ins>.

</p>

</blockquote>

<p>Replace &ldquo;happens-before&rdquo; with &ldquo;synchronizes-with&rdquo; in

30.3.1.2p6:</p>

<blockquote><p>

          Synchronization: The <ins>last store operation in the</ins>

          invocation of the constructor <del>happens

          before</del> <ins>synchronizes with</ins> the <ins>first read

operation in the</ins> invocation of the copy of <var>f</var>.

</p> 

</blockquote>

<p>Replace &ldquo;happens-before&rdquo; with &ldquo;synchronizes-with&rdquo; in

30.3.1.5p7:</p>

<blockquote><p>

        Synchronization: The <del>completion of</del> <ins>last store

        operation carried out by</ins> the thread represented by

        <var>*this</var> <del>happens before</del> <ins>synchronizes

        with</ins> (1.10) <ins>the corresponding successful</ins>

        <var>join()</var> return<del>s</del>. [ Note: Operations

        on <var>*this</var> are not synchronized. &mdash; end note ]

</p> 

</blockquote>

<p>Replace &ldquo;happens-before&rdquo; with &ldquo;synchronizes-with&rdquo; in

30.6.4p7:</p>

<blockquote><p>

        Calls to functions that successfully set the stored result of

        an associated asynchronous state synchronize with (1.10) calls

        to functions successfully detecting the ready state resulting

        from that setting. The storage of the result (whether normal or

        exceptional) into the associated asynchronous state <del>happens

        before</del> <ins>synchronizes with</ins>

        (1.10) <del>that state is set to ready</del>

        <ins>the successful return from a call to a waiting function on

the associated asynchronous state</ins>.

</p> 

</blockquote>

<p>Replace &ldquo;happens-before&rdquo; with &ldquo;synchronizes-with&rdquo; in

30.6.9p5:</p>

<blockquote><p>

        Synchronization: the invocation of async <del>happens

        before</del> <ins>synchronizes with</ins> (1.10) the invocation

        of <var>f</var>. [ Note: this statement applies even when

        the corresponding <var>future</var> object is moved to another

        thread. &mdash; end note ] If the invocation is not deferred, a call

        to a waiting function on an asynchronous return object that shares

        the associated asynchronous state created by this <var>async</var>

        call shall block until the associated thread has completed. If the

        invocation is not deferred, the <var>join()</var> on the created

        thread <del>happens-before</del> <ins>synchronizes with</ins>

        (1.10) the first function that successfully detects the ready

        status of the associated asynchronous state returns or before

        the function that gives up the last reference to the associated

        asynchronous state returns, whichever happens first. If the

        invocation is deferred, the completion of the invocation of the

        deferred function <del>happens-before</del> <ins>synchronizes

        with</ins> the <ins>successful return from a call to a

        waiting function on the associated asynchronous state.</ins>

        <del>calls to the waiting functions return.</del>

</p> 

</blockquote>

<p>Replace &ldquo;happens-before&rdquo; with &ldquo;synchronizes-with&rdquo; in

30.6.10.1p23:</p>

<blockquote><p>

        Synchronization: a successful call to <var>operator()</var>

        synchronizes with (1.10) a call to any member function

        of a <var>future</var>, <var>shared_future</var>, or

        <var>atomic_future</var> object that shares the associated

        asynchronous state of <var>*this</var>. The completion of the

        invocation of the stored task and the storage of the result

        (whether normal or exceptional) into the associated asynchronous

        state <del>happens before</del> <ins>synchronizes with</ins>

        (1.10) <ins>the successful return from any member function that

detects that</ins> the state is set to ready.

        [ Note: <var>operator()</var>

        synchronizes and serializes with other functions through the

        associated asynchronous state. &mdash; end note ]

</p> 

</blockquote>

</dd>

<dt>Resolution:</dt>

<dd>Adopt the proposal.</dd>

</dl>
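<p>Two of the library synchronization points touched by this proposal can be exercised with a short C++ sketch: storing a result into a shared state with <code>std::promise::set_value</code>, which synchronizes with the successful return of <code>std::future::get</code>, and thread completion, which synchronizes with the return from <code>join()</code>. In each case a plain <code>int</code> written before the synchronizing operation is read afterwards without a data race. The function names are hypothetical, and the sketch relies on the standard library's guarantees rather than on the exact wording proposed here.</p>

```cpp
#include <cassert>
#include <future>
#include <thread>

// Setting the promise's value synchronizes with the successful return of
// future::get() on the same shared state, so the plain write to `side`
// made before set_value() is visible once get() returns.
int promise_future_sync() {
    int side = 0;
    std::promise<int> p;
    std::future<int> f = p.get_future();
    std::thread producer([&] {
        side = 40;       // plain write before storing the result
        p.set_value(2);  // makes the shared state ready
    });
    int v = f.get();       // waits until the shared state is ready
    int total = side + v;  // no data race: ordered by synchronizes-with
    producer.join();
    return total;
}

// Completion of the thread synchronizes with the return from join().
int join_sync() {
    int result = 0;
    std::thread worker([&] { result = 7; });
    worker.join();
    return result;  // safe to read once join() has returned
}
```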

<h3>CA 11: &ldquo;Subsequent&rdquo; in the visible sequence of side effects definition</h3>

<dl>

<dt>Comment:</dt>

<dd>

<p>Batty et al. propose removing the word &ldquo;subsequent&rdquo; from

1.10p12 (presumably instead meaning 1.10p13), stating that this

will clarify the definition.

</p>

</dd>

</dl>

<h4>Discussion</h4>

<p>This change has interesting consequences.

The current wording is as follows, with the word being proposed for

removal so marked:

</p>

<blockquote>

        <p>The <i>visible sequence of side effects</i> on an atomic object

        <code>M</code>, with respect to a value computation <code>B</code>

        of <code>M</code>, is a maximal contiguous sub-sequence of side

        effects in the modification order of <code>M</code>, where the

        first side effect is visible with respect to <code>B</code>,

        and for every <del>subsequent</del> side effect, it is not the case

        that B happens before it.

        The value of an atomic object <code>M</code>, as determined by

        evaluation <code>B</code>, shall be the value stored by some

        operation in the visible sequence of <code>M</code> with respect

to <code>B</code>.

        Furthermore, if a value computation <code>A</code> of an

        atomic object <code>M</code> happens before a value computation

        <code>B</code> of <code>M</code>, and the value computed by

        <code>A</code> corresponds to the value stored by side effect

        <code>X</code>, then the value computed by <code>B</code>

        shall either equal the value computed by <code>A</code>,

        or be the value stored by side effect <code>Y</code>, where

        <code>Y</code> follows <code>X</code> in the modification order

of <code>M</code>.

        [ Note: This effectively disallows compiler reordering of

        atomic operations to a single object, even if both operations

        are &ldquo;relaxed&rdquo; loads.

        This effectively makes the &ldquo;cache coherence&rdquo; guarantee

        provided by most hardware available to C++ atomic operations. &mdash;

        end note ]

        [ Note: The visible sequence depends on the

        &ldquo;happens before&rdquo; relation,

        which depends on the values observed by loads of atomics, which we

        are restricting here. The intended reading is that there must exist

        an association of atomic loads with modifications they observe that,

        together with suitably chosen modification orders and the

        &ldquo;happens before&rdquo; relation derived as described above,

        satisfy the resulting constraints as imposed here. &mdash; end note ]

        </p>

</blockquote>

<p>The effect of the current wording is as follows:

</p>

<ol>

<li>        The last side-effect in the modification order of <code>M</code>

        that happens before value computation <code>B</code> is the

        visible side effect.

Call it <code>V</code>.

<li>        The first side-effect in the modification order of <code>M</code>

        such that <code>B</code> happens before it will be called

<code>I</code>.

        <code>I</code> and all subsequent side-effects are <i>not</i>

        in the visible sequence of side effects.

<li>        The word &ldquo;subsequent&rdquo; adds the constraint that

        no side effect in the modification order of <code>M</code>

        that precedes <code>V</code> can be part of the visible

        sequence of side effects.

</ol>
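<p>The three points above can be made concrete with a small model. In the following C++ sketch (not part of any proposal; all names are hypothetical), each entry of the modification order of <code>M</code> records only how it relates to value computation <code>B</code>: it happens before <code>B</code>, <code>B</code> happens before it, or neither. The model assumes a coherent execution, so that <code>V</code> is simply the last entry that happens before <code>B</code>, and it implements the current wording, including the word &ldquo;subsequent&rdquo;.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// How one side effect in M's modification order relates to value computation B.
enum class Rel { HappensBeforeB, Unordered, BHappensBefore };

// Returns the indices (into the modification order) of the visible sequence
// of side effects with respect to B under the current wording: start at V,
// the last side effect that happens before B, and extend forward up to, but
// not including, I, the first side effect that B happens before.
// Returns an empty vector if no side effect happens before B.
std::vector<std::size_t> visible_sequence(const std::vector<Rel>& mo) {
    std::vector<std::size_t> seq;
    std::size_t v = mo.size();  // sentinel: no visible side effect found
    for (std::size_t i = 0; i < mo.size(); ++i) {
        if (mo[i] == Rel::HappensBeforeB) v = i;
    }
    if (v == mo.size()) return seq;
    for (std::size_t i = v; i < mo.size(); ++i) {
        if (mo[i] == Rel::BHappensBefore) break;  // I and later entries excluded
        seq.push_back(i);
    }
    return seq;
}
```

<p>For example, with a modification order whose entries relate to <code>B</code> as {happens before, unordered, happens before, unordered, <code>B</code> happens before, unordered}, <code>V</code> is entry 2, <code>I</code> is entry 4, and the visible sequence is entries 2 and 3. Entry 1 is unordered with respect to <code>B</code> but precedes <code>V</code>, so it is excluded; this is the constraint described in the third point.</p>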

<p>Does some hardware actually operate in this fashion, so that a value

computation might return some value preceding the last side-effect in

the modification order of <code>M</code> that happens-before that value

computation?

</p>

<h4>Resolution</h4>

<p>Adopt the changes proposed for CA 8.

</p>

<h3>CA 12: The use of maximal in the definition of release sequence [Paul]</h3>

<dl>

<dt>Sections:</dt>

<dd>1.10p6</dd>

<dt>Comment:</dt>

<dd>

<p>Batty et al. describe an interpretation of 1.10p6 that would

only require that release sequences be extended back to the

first release operation in a given thread out of a sequence of

release operations on a given object.

</p>

<p>This interpretation can be considered perverse in light of the

wording of 1.10p7; however, the suggested modification is consistent

with the intent.

</p>

</dd>

<dt>Proposal:</dt>

<dd>

<p>Replace 1.10p6 with the following:</p>

<blockquote>

        <p>A <i>release sequence</i>

        <ins>from a release operation <code>A</code></ins> on

        an atomic object <code>M</code> is a maximal contiguous

        sub-sequence of side effects in the modification order of

        <code>M</code>, where the first operation is

        <del>a release</del>

        <ins><code>A</code></ins>,

        and every subsequent operation

        </p>

        <ul>

        <li>        is performed by the same thread that performed

                <del>the release</del>

                <ins><code>A</code></ins>,

                or

<li>        is an atomic read-modify-write operation.

        </ul>

</blockquote>

<p>Please note that this has been modified slightly from that proposed

by Batty et al.

</p>

</dd>

<dt>Resolution:</dt>

<dd>Adopt the proposal.</dd>

</dl>
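<p>The revised definition can be modeled directly. In the C++ sketch below (all names are hypothetical), each side effect in the modification order of <code>M</code> records the thread that performed it and whether it is a read-modify-write operation. The function returns the release sequence from the release operation at a given index; it does not check that this operation actually is a release, which is left to the caller.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One side effect in the modification order of an atomic object M.
struct SideEffect {
    int thread;   // hypothetical thread identifier
    bool is_rmw;  // true for an atomic read-modify-write operation
};

// Returns the indices of the release sequence from the release operation at
// index a: a itself, followed by the maximal contiguous run of later side
// effects that are performed by a's thread or are read-modify-write operations.
std::vector<std::size_t> release_sequence(const std::vector<SideEffect>& mo,
                                          std::size_t a) {
    assert(a < mo.size());
    std::vector<std::size_t> seq{a};
    for (std::size_t i = a + 1; i < mo.size(); ++i) {
        if (mo[i].thread != mo[a].thread && !mo[i].is_rmw) break;
        seq.push_back(i);
    }
    return seq;
}
```

<p>With the modification order {thread 0 store, thread 0 store, thread 1 read-modify-write, thread 2 store, thread 0 store}, the release sequence from entry 0 is entries 0 through 2, and the release sequence from entry 1 is entries 1 and 2. Under the revised wording, each release operation heads its own release sequence, regardless of earlier releases by the same thread.</p>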

<h3>CA 13: Wording of the read-read coherence condition [Paul]</h3>

<dl>

<dt>Section:</dt>

<dd>1.10p13</dd>

<dt>Comment:</dt>

<dd>

<p>Batty et al. suggest that the following wording from 1.10p13:

</p>

<blockquote>

        <p>Furthermore, if a value computation <code>A</code> of an atomic

        object <code>M</code> happens before a value computation

        <code>B</code> of <code>M</code>, and the value computed by

        <code>A</code> corresponds to the value stored by side effect

        <code>X</code>, then the value computed by <code>B</code> shall

        either equal the value computed by <code>A</code>, or be the value

        stored by side effect <code>Y</code>, where <code>Y</code> follows

<code>X</code> in the modification order of <code>M</code>.

        </p>

</blockquote>

<p>be changed to the following:

</p>

<blockquote>

        <p>Furthermore, if a value computation <code>A</code> of an atomic

        object <code>M</code> happens before a value computation

        <code>B</code> of <code>M</code>, and <code>A</code> takes

        its value from the side effect <code>X</code>, then the value

        computed by <code>B</code> shall either be the value stored

        by <code>X</code>, or the value stored by a side effect

        <code>Y</code>, where <code>Y</code> follows <code>X</code>

in the modification order of <code>M</code>.

        </p>

</blockquote>

</dd>

<dt>Proposal:</dt>

<dd>

Use notation uniformly, as follows:

<blockquote>

        <p>Furthermore, if a value computation <code>A</code> of an atomic

        object <code>M</code> happens before a value computation

        <code>B</code> of <code>M</code>, and <del>the value computed by

        <code>A</code> corresponds to the value stored by</del>

        <ins><code>A</code> takes its value from</ins> side effect

        <code>X</code>, then <del>the value computed by</del>

        <code>B</code> shall

        <del>either equal the value computed by <code>A</code>,</del>

        <ins>take its value either from <code>X</code></ins>

        or <del>be

        the value stored by side effect</del> <ins>from

        a side effect</ins>

        <code>Y</code>,

        where <code>Y</code> follows

<code>X</code> in the modification order of <code>M</code>.

        </p>

</blockquote>

</dd>

<dt>Resolution:</dt>

<dd>Adopt the proposal.</dd>

</dl>
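<p>The read-read coherence condition can be exercised with relaxed atomics. In this C++ sketch (function names hypothetical), one thread stores 1 and then 2 to <code>x</code>, so the modification order of <code>x</code> holds the values 0, 1, and 2 in that order; another thread performs two loads, the first sequenced before, and therefore happening before, the second. The condition forbids the second load from taking its value from a side effect that precedes the one read by the first load, so <code>r2 &lt; r1</code> should never be observed.</p>

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// One run of a read-read coherence test. The writer's stores give x the
// modification order 0, 1, 2. The reader's first load is sequenced before
// (and so happens before) its second load, so the second load may not take
// its value from a side effect that precedes the one read by the first.
bool coherence_violated_once() {
    std::atomic<int> x{0};
    int r1 = 0, r2 = 0;  // read only after both threads are joined
    std::thread writer([&] {
        x.store(1, std::memory_order_relaxed);
        x.store(2, std::memory_order_relaxed);
    });
    std::thread reader([&] {
        r1 = x.load(std::memory_order_relaxed);  // value computation A
        r2 = x.load(std::memory_order_relaxed);  // value computation B
    });
    writer.join();
    reader.join();
    return r2 < r1;  // values only increase along the modification order
}

// Counts runs that violate read-read coherence; expected to be zero.
int count_coherence_violations(int runs) {
    int violations = 0;
    for (int i = 0; i < runs; ++i) {
        if (coherence_violated_once()) ++violations;
    }
    return violations;
}
```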

<h3>CA 14: Initialization of atomics</h3>

<dl>

<dt>Sections:</dt>

<dd>1.10p4</dd>

<dt>Comment:</dt>

<dd>

<p>Batty et al. suggest adding the following non-normative note to

1.10p4:

</p>

<blockquote>

        <p>[ Note: There may be non-atomic writes to atomic objects, for

        example on initialization and re-initialization. &mdash; end note ]

        </p>

</blockquote>

</dd>

</dl>

<h4>Discussion</h4>

<p>There was some dissatisfaction with this approach expressed in the

June email thread.

It is quite possible that specifying this would encroach on the

prerogatives of implementors, who are in

any case free to perform operations non-atomically when permitted by

the as-if rule.

Implementors may also perform initializations atomically, again,

when permitted by the as-if rule.

</p>

<h4>Resolution</h4>

<p>Adopt the update proposed by US 168.

</p>

<h3>CA 15: Intra-thread dependency-ordered-before [Paul]</h3>

<dl>

<dt>Section:</dt>

<dd>1.10p9</dd>

<dt>Comment:</dt>

<dd>

Batty et al. note that, unlike synchronizes-with,

the dependency-ordered before relation can operate within a thread.

This was not the intent.

Instead, intra-thread operations are covered by the rules applying

to execution of a single thread.

</dd>

<dt>Proposal:</dt>

<dd>

<p>Update 1.10p9 as follows:

</p>

<blockquote>

        <p>An evaluation <code>A</code> is dependency-ordered before an

        evaluation <code>B</code> if

        </p>

        <ul>

        <li>        <code>A</code> performs a release operation on an atomic

                object <code>M</code>, and <ins>on another thread,</ins>

                <code>B</code> performs a

                consume operation on <code>M</code> and reads a value

                written by any side effect in the release sequence headed

                by <code>A</code>, or

        <li>        for some evaluation <code>X</code>, <code>A</code> is

                dependency-ordered before <code>X</code> and

<code>X</code> carries a dependency to <code>B</code>.

        </ul>

</blockquote>

</dd>

<dt>Resolution:</dt>

<dd>Adopt the proposal.</dd>

</dl>
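<p>The intended use of dependency-ordered-before is between threads, as in the following C++ sketch of release/consume publication (names hypothetical). The release store of the pointer in one thread is dependency-ordered before the consume load that reads it in another thread, and that load carries a dependency to the dereference <code>p-&gt;value</code>, so the plain write to <code>value</code> is visible. Had both operations been in the same thread, their ordering would instead follow from sequenced-before, which is the point of the proposed change. Current compilers generally strengthen <code>memory_order_consume</code> to <code>memory_order_acquire</code>.</p>

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Node { int value; };  // hypothetical payload type

// Release/consume publication across two threads: the release store of the
// pointer is dependency-ordered before the consume load that reads it in
// another thread, and the load carries a dependency to p->value.
int publish_and_consume() {
    std::atomic<Node*> head{nullptr};
    Node node{0};
    std::thread publisher([&] {
        node.value = 42;                               // plain write
        head.store(&node, std::memory_order_release);  // release
    });
    Node* p = nullptr;
    while ((p = head.load(std::memory_order_consume)) == nullptr) {
        // spin until the consume load observes the published pointer
    }
    int v = p->value;  // dependency from the consume load orders this read
    publisher.join();
    return v;
}
```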

<h3>CA 22: Control Dependencies for Atomics [Paul]</h3>

<dl>

<dt>Sections:</dt>

<dd>N/A</dd>

<dt>Comment:</dt>

<dd>&ldquo;Control dependencies for atomics<br>

Given the examples of compilers interchanging data and control

dependencies, and that control dependencies are architecturally

respected on Power/ARM for load->store (and on Power for load->load with

a relatively cheap isync), we're not sure why carries-a-dependency-to

does not include control dependencies between atomics.&rdquo;

</dd>

<dt>Proposal:</dt>

<dd>Please clarify.</dd>

</dl>

<h4>Discussion</h4>

<p>At the time that the memory model was formulated, there was

considerable uncertainty as to which architectures respect control

dependencies, and to what extent.

It appears that this uncertainty is being cleared up, and our hope

is that control dependencies will be ripe for standardization in a later TR.

</p>

<h4>Resolution</h4>

<p>Not a Defect

</p>

<h3>US 10: Overlapping Atomics [Lawrence]</h3>

<dl>

<dt>Sections:</dt>

<dd>1.10/14</dd>

<dt>Comment:</dt>

<dd>The definition of a data race does not take into account

two overlapping atomic operations.</dd>

<dt>Proposal:</dt>

<dd>Augment the first sentence:

The execution of a program contains a data race if

it contains two conflicting actions in different

threads, at least one of which is not atomic (or

both are atomic and operate on overlapping, but

not-identical, memory locations), and neither

happens before the other.

</dd>

</dl>

<h4>Discussion</h4>

<p>The premise is incorrect;

atomic objects cannot overlap.

The type argument to the <code>atomic</code> template

must be a trivially copyable type (29.5.3/1),

and atomic objects are not trivially copyable,

so one atomic object cannot be contained in another.

The atomic types provide no means to obtain a reference to internal members;

all atomic operations are copy-in/copy-out.</p>

<h4>Resolution</h4>

<p>Not a Defect</p>
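<p>The copy-in/copy-out interface can be seen in the following C++ sketch, in which <code>Pair</code> and <code>copy_in_copy_out</code> are hypothetical. The static assertions check that the value type is trivially copyable and that <code>std::atomic</code> itself cannot be copied, and the function shows that modifying the value returned by <code>load()</code> leaves the atomic object unchanged.</p>

```cpp
#include <atomic>
#include <cassert>
#include <type_traits>

// A hypothetical trivially copyable value type.
struct Pair { int first; int second; };

static_assert(std::is_trivially_copyable<Pair>::value,
              "std::atomic<T> requires a trivially copyable T");
static_assert(!std::is_copy_constructible<std::atomic<Pair>>::value,
              "atomic objects cannot be copied");

// load() returns a copy of the stored value, and store() copies a whole new
// value in; no reference to the interior of the atomic object is exposed.
int copy_in_copy_out() {
    std::atomic<Pair> a{Pair{1, 2}};
    Pair snapshot = a.load();  // copy-out
    snapshot.first = 100;      // modifies only the local copy
    a.store(Pair{3, 4});       // copy-in
    return a.load().first + snapshot.first;  // 3 + 100
}
```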

<h3>US 12: N3074 [Paul]</h3>

<dl>

<dt>Sections:</dt>

<dd>1.10p2, 1.10p14</dd>

<dt>Comment:</dt>

<dd>Adapt <a href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3074.html">N3074</a>.</dd>

<dt>Proposal:</dt>

<dd>The proposed change to 1.10p2 has been adopted.

However, the proposed change to 1.10p14 has not, so the following

modification needs to be made:

<blockquote>

        <p>

        The execution of a program contains a data race if it contains

        two conflicting actions in different threads, at least one of

        which is not atomic, and neither happens before the other. Any

        such data race results in undefined behavior. [ Note: It can be

        shown that programs that correctly use simple locks

        <ins>

        and <code>memory_order_seq_cst</code> operations

        </ins>

        to prevent all

        data races and

        <ins>

        that

        </ins>

        use no other synchronization operations behave as

        <del>

        the executions of

        </del>

        <ins>

        if the operations executed by

        </ins>

        their constituent threads

        <del>

        were

        </del>

        <ins>

        are

        </ins>

        simply

        interleaved, with each

        <del>

        observed value

        </del>

        <ins>

        value computation

        </ins>

        of an object being the

        <del>

        last value assigned

        </del>

        <ins>

        last side effect on that object

        </ins>

        in that interleaving. This is normally

        referred to as &ldquo;sequential consistency&rdquo;. However,

        this applies only to

        <ins>data-</ins>race-free programs, and

        <ins>data-</ins>race-free programs

        cannot observe most program transformations that do not change

        single-threaded program semantics.

        In fact, most single-threaded

        program transformations continue to be allowed, since any program

        that behaves differently as a result must perform an undefined

        operation. &mdash; end note ]

        </p>

</blockquote>

</dd>

<dt>Resolution:</dt>

<dd>Adopt the proposal.</dd>

</dl>
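<p>The sequential-consistency guarantee in this note can be illustrated with the classic store-buffering test, sketched below in C++ (function names hypothetical). Each thread stores to one variable and then loads the other, using only <code>memory_order_seq_cst</code> operations. In any interleaving of the four operations, the load that executes last follows both stores, so at least one load must return 1 and the outcome <code>r1 == 0 &amp;&amp; r2 == 0</code> should never be observed. With weaker memory orders, that outcome is permitted and does occur on common hardware.</p>

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// One run of the store-buffering test using only seq_cst atomics. Because
// the program is data-race-free and uses no other synchronization, it
// behaves as some interleaving of the two threads.
bool both_loads_saw_zero() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;  // read only after both threads are joined
    std::thread t0([&] {
        x.store(1, std::memory_order_seq_cst);
        r1 = y.load(std::memory_order_seq_cst);
    });
    std::thread t1([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });
    t0.join();
    t1.join();
    return r1 == 0 && r2 == 0;  // impossible in any interleaving
}

// Counts non-sequentially-consistent outcomes; expected to be zero.
int count_non_sc_outcomes(int runs) {
    int count = 0;
    for (int i = 0; i < runs; ++i) {
        if (both_loads_saw_zero()) ++count;
    }
    return count;
}
```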

<h3>US 168, US 171: Initializing Atomics [Lawrence]</h3>

<dl>

<dt>Sections:</dt>

<dd>29.6/4</dd>

<dt>Comment:</dt>

<dd>The definition of the default constructor needs exposition.</dd>

<dt>Proposal:</dt>

<dd>Add a new paragraph:

A::A() = default; Effects:

Leaves the atomic object in an uninitialized state.

[Note: These semantics ensure compatibility with

C. --end note]

</dd>

</dl>

<dl>

<dt>Number</dt>

<dd>US 171</dd>

<dt>Sections:</dt>

<dd>29.6/6</dd>

<dt>Comment:</dt>

<dd>The atomic_init definition "Non-atomically assigns the value" is not

quite correct, as the atomic_init purpose is initialization.</dd>

<dt>Proposal:</dt>

<dd>Change "Non-atomically assigns the value desired to *object."

with "Initializes *object with value desired".

Add the note: "[Note: This

function should only be applied to objects that have been default

constructed. These semantics ensure compatibility with C. --end note]"</dd>

</dl>

<h4>Discussion</h4>

<p>Adopt as recommended, but with more clarity.</p>

<h4>Resolution</h4>

<p>After 29.6 [atomics.types.operations] paragraph 4,

add a new function description as follows.</p>

<blockquote>

<dl>

<dt><code>A::A() = default;</code></dt>

<dd><i>Effects:</i>

Leaves the atomic object in an uninitialized state.

[<i>Note:</i> 

These semantics ensure compatibility with C.

&mdash;<i>end note</i>]

</dl>

</blockquote>

<p>Edit 29.6 [atomics.types.operations] paragraph 7 as follows,

and then move it to just after the paragraph inserted above.</p>

<blockquote>

<p>

<i>Effects:</i>

<del>

Non-atomically assigns the value desired to <code>*object</code>.

</del>

<ins>

Initializes *object with value desired.

This function shall only be applied to objects

that have been default constructed,

and then only once.

[<i>Note:</i>

These semantics ensure compatibility with C.

&mdash;<i>end note</i>]

Initialization shall happen before (1.10)

other operations on the object.

[<i>Note:</i>

</ins>

Concurrent access from another thread,

even via an atomic operation,

constitutes a data race.

<ins>

&mdash;<i>end note</i>]

</ins>

</p>

</blockquote>
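<p>The initialization protocol described above, default construction followed by exactly one call to <code>atomic_init</code> before any other operation, is sketched below in C++ (the function name is hypothetical). The sketch is single-threaded, so the initialization is sequenced before, and hence happens before, the subsequent load. Note that later revisions of C++ changed this area: since C++20, the default constructor value-initializes the object, and <code>std::atomic_init</code> is deprecated.</p>

```cpp
#include <atomic>
#include <cassert>

// Default construction followed by a single atomic_init, following the
// C++11 model described in the proposed wording.
int default_construct_then_init() {
    std::atomic<int> a;       // default constructed: no value yet (pre-C++20)
    std::atomic_init(&a, 7);  // initialization, performed exactly once
    // The initialization is sequenced before, and so happens before, this
    // load; no other thread can access `a`, so there is no data race.
    return a.load();
}
```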

<h3>US 38: Generalized Infinite Loops [Clark]</h3>

<dl>

<dt>Sections:</dt>

<dd>1.10p16, 6.5p5</dd>

<dt>Comment:</dt>

<dd>

The statement that certain infinite loops may be assumed to

terminate should also apply to go-to loops and possibly infinite

recursion. We expect that compiler analyses that would take

advantage of this can often no longer identify the origin of

such a loop.

</dd>

<dt>Proposal:</dt>

<dd>

<p>Insert new paragraph following 1.10p16:</p>

<blockquote>

<p><ins>The implementation is allowed to assume that any

thread will eventually do one of the following:</ins></p>

<ul>

<li><ins>terminate,</ins></li>

<li><ins>make a call to a library I/O function,</ins></li>

<li><ins>access or modify a volatile object, or</ins></li>

<li><ins>perform a synchronization operation.</ins></li>

</ul>

<p><ins>[ <em>Note:</em>

This is

intended to allow compiler transformations,

such as removal of empty loops, even when termination cannot be proven.

&#8212; <em>end note</em> ]</ins></p>

</blockquote>

<p>Delete paragraph 6.5p5:</p>

<blockquote>

<p><del>A loop that, outside of the <var>for-init-statement</var>

in the case of a <code>for</code> statement,</del></p>

<ul>

<li><del>makes no calls to library I/O functions, and</del></li>

<li><del>does not access or modify volatile objects, and</del></li>

<li><del>performs no synchronization operations (1.10)

or atomic

operations (Clause 29)</del></li>

</ul>

<p><del>may be assumed by the implementation to

terminate. [ <em>Note:</em>

This is

intended to allow compiler transformations,

such as removal of empty loops, even when termination cannot be proven.

&#8212; <em>end note</em> ]

</del></p>

</blockquote>

</dd>

<dt>Resolution:</dt>

<dd>Adopt the proposal.</dd>

</dl>
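<p>The following C++ sketch (function name hypothetical) shows a waiting loop that satisfies the proposed assumption on every iteration, because each iteration performs an acquire load, which is a synchronization operation. The assumption therefore gives the implementation no license to remove the loop. By contrast, an empty loop that performs none of the listed operations may be removed even when its termination cannot be proven, as the proposed note explains.</p>

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Waits for another thread to set a flag. Every iteration performs an
// acquire load (a synchronization operation), so the loop cannot be
// removed on the grounds that it might fail to terminate.
int wait_for_flag() {
    std::atomic<bool> flag{false};
    int data = 0;
    std::thread setter([&] {
        data = 5;                                     // plain write
        flag.store(true, std::memory_order_release);  // release
    });
    while (!flag.load(std::memory_order_acquire)) {
        // empty body: the loop's only effect is the acquire load
    }
    int result = data;  // visible: the acquire load read the release store
    setter.join();
    return result;
}
```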

<h3>GB 8, US 9, US 11: Mutexes versus Locks [Lawrence]</h3>

<dl>

<dt>Sections:</dt>

<dd>1.10/4,7</dd>

<dt>Comment:</dt>

<dd>The text says that the library "provides ... operations on

locks". It should say "operations on mutexes", since it is

the mutexes that provide the synchronization. A lock is

just an abstract concept (though the library types

unique_lock and lock_guard model ownership of locks)

and as such cannot have operations performed on it. This

mistake is carried through in the notes in that paragraph

and in 1.10p7.</dd>

<dt>Proposal:</dt>

<dd>

<p>Change 1.10p4 as follows:</p>

<blockquote>

<p>"The library defines a number of atomic operations (Clause 29) and

operations on mutexes (Clause 30) that are specially identified

as synchronization operations. These operations play a special

role in making assignments in one thread visible to another. A

synchronization operation on one or more memory locations is either

a consume operation, an acquire operation, a release operation, or

both an acquire and release operation. A synchronization operation

without an associated memory location is a fence and can be either

an acquire fence, a release fence, or both an acquire and release

fence. In addition, there are relaxed atomic operations, which are not

synchronization operations, and atomic read-modify-write operations,

which have special characteristics. [ Note: For example, a call

that acquires a lock on a mutex will perform an acquire operation

on the locations comprising the mutex. Correspondingly, a call that

releases the same lock will perform a release operation on those same

locations.  Informally, performing a release operation on A forces

prior side effects on other memory locations to become visible to

other threads that later perform a consume or an acquire operation on

A. "Relaxed" atomic operations are not synchronization operations even

though, like synchronization operations, they cannot contribute to

data races. -- end note ]"

</p>

</blockquote>

<p>Change 1.10p7 as follows:</p>

<blockquote>

<p>"Certain library calls synchronize with other library

calls performed by another thread. In particular,

an atomic operation A that performs a release

operation on an atomic object M synchronizes

with an atomic operation B that performs an

acquire operation on M and reads a value written

by any side effect in the release sequence headed

by A. [ Note: Except in the specified cases,

reading a later value does not necessarily ensure

visibility as described below. Such a requirement

would sometimes interfere with efficient

implementation. -- end note ] [ Note: The

specifications of the synchronization operations

define when one reads the value written by

another. For atomic objects, the definition is clear.

All operations on a given mutex occur in a single

total order. Each lock acquisition "reads the value

written" by the last lock release on the same

mutex. -- end note ]"

</p>

</blockquote>

</dd>

</dl>

<dl>

<dt>Number</dt>

<dd>US 9</dd>

<dt>Sections:</dt>

<dd>1.10/4</dd>

<dt>Comment:</dt>

<dd>The "operations on locks" do not provide synchronization,

as locks are defined in Clause 30.</dd>

<dt>Proposal:</dt>

<dd>Change "operations on locks" to "locking operations".

(Covered by GB 8.)

</dd>

</dl>

<dl>

<dt>Number</dt>

<dd>US 11</dd>

<dt>Sections:</dt>

<dd>1.10/7</dd>

<dt>Comment:</dt>

<dd>There is some confusion between locks and mutexes.</dd>

<dt>Proposal:</dt>

<dd>Change "lock" when used as a noun to "mutex".

(Covered by GB 8.)</dd>

</dl>

<h4>Discussion</h4>

<p>Adopt the wording of GB 8,

but with additional use of "mutex" for clarity.</p>

<h4>Resolution</h4>

<p>Edit 1.10 [intro.multithread] paragraph 4 as follows.</p>

<blockquote>

<p>The library defines a number of atomic operations (Clause 29)

and operations on mutexes (Clause 30)

that are specially identified as synchronization operations.

These operations play a special role

in making assignments in one thread visible to another.

A synchronization operation on one or more memory locations

is either a consume operation, an acquire operation, a release operation,

or both an acquire and release operation.

A synchronization operation without an associated memory location

is a fence and

can be either an acquire fence, a release fence,

or both an acquire and release fence.

In addition, there are relaxed atomic operations,

which are not synchronization operations,

and atomic read-modify-write operations,

which have special characteristics.

[<i>Note:</i>

For example, a call that acquires a lock <ins>on a mutex</ins>

will perform an acquire operation on the locations comprising the mutex.

Correspondingly, a call that releases the same lock

will perform a release operation on those same locations.

Informally, performing a release operation on <var>A</var>

forces prior side effects on other memory locations

to become visible to other threads

that later perform a consume or an acquire operation on <var>A</var>.

"Relaxed" atomic operations are not synchronization operations

even though, like synchronization operations,

they cannot contribute to data races.

&mdash;<i>end note</i>]

</p>

</blockquote>
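<p>The note's informal description of release and acquire operations can be illustrated with the usual message-passing idiom, and so can the distinction between operations on a location and fences. This is a sketch assuming C++11 <code>std::atomic</code> and <code>std::atomic_thread_fence</code>. All variable and function names are hypothetical. Both variants are expected to leave the consumer reading 42. The first relies on a release store and an acquire load on <code>flag</code>. The second relies on a release fence and an acquire fence placed around relaxed accesses to <code>flag</code>.</p>

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical state: payload is an ordinary variable; flag is the atomic
// object on which the synchronizing accesses act.
int payload = 0;
std::atomic<bool> flag{false};

// Variant 1: a release store paired with an acquire load.
void produce_with_release() {
    payload = 42;                                 // prior side effect
    flag.store(true, std::memory_order_release);  // release operation
}

int consume_with_acquire() {
    while (!flag.load(std::memory_order_acquire)) {  // acquire operation
    }
    assert(payload == 42);  // the prior side effect is visible
    return payload;
}

// Variant 2: fences (synchronization operations without an associated
// memory location) placed around relaxed accesses to flag.
void produce_with_fence() {
    payload = 42;
    std::atomic_thread_fence(std::memory_order_release);  // release fence
    flag.store(true, std::memory_order_relaxed);
}

int consume_with_fence() {
    while (!flag.load(std::memory_order_relaxed)) {
    }
    std::atomic_thread_fence(std::memory_order_acquire);  // acquire fence
    assert(payload == 42);
    return payload;
}

// Runs one producer/consumer pair and returns what the consumer read.
int run_pair(void (*produce)(), int (*consume)()) {
    payload = 0;
    flag.store(false, std::memory_order_relaxed);
    int result = 0;
    std::thread c([&result, consume] { result = consume(); });
    std::thread p(produce);
    p.join();
    c.join();
    return result;
}
```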

<p>Edit 1.10 [intro.multithread] paragraph 7 as follows.</p>

<blockquote>

<p>Certain library calls <dfn>synchronize with</dfn>

other library calls performed by another thread.

In particular, an atomic operation <var>A</var>

that performs a release operation on an atomic object <var>M</var>

synchronizes with an atomic operation <var>B</var>

that performs an acquire operation on <var>M</var>

and reads a value written by any side effect

in the release sequence headed by <var>A</var>.

[<i>Note:</i>

Except in the specified cases,

reading a later value does not necessarily ensure visibility

as described below.

Such a requirement would sometimes interfere with efficient implementation.

&mdash;<i>end note</i>]

[<i>Note:</i>

The specifications of the synchronization operations

define when one reads the value written by another.

For atomic objects, the definition is clear.

All operations on a given <del>lock</del> <ins>mutex</ins>

occur in a single total order.

Each <ins>mutex</ins> lock acquisition "reads the value written"

by the last <ins>mutex</ins> lock release on the same mutex.

&mdash;<i>end note</i>]

</p>

</blockquote>
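<p>The role of the release sequence in the paragraph above can be sketched with a hypothetical ticket counter, assuming C++11 <code>std::atomic</code>. The producer's release store heads a release sequence, and each of two consumers claims a ticket with an acquire compare-exchange. The consumer that claims the second ticket reads the value written by the other consumer's read-modify-write, not the value written by the release store itself. It still synchronizes with the producer, because that read-modify-write belongs to the release sequence. The consumers' read-modify-writes do not need release semantics for this to hold.</p>

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical example: the producer publishes payload and makes two
// tickets available; each consumer claims exactly one ticket.
int payload = 0;
std::atomic<int> tickets{0};

void producer() {
    payload = 42;
    tickets.store(2, std::memory_order_release);  // heads a release sequence
}

int consumer() {
    for (;;) {
        int n = tickets.load(std::memory_order_relaxed);
        // A successful compare-exchange is an acquire read-modify-write.
        // The second consumer to succeed reads the value 1 written by the
        // other consumer's read-modify-write, which is in the release
        // sequence headed by the producer's store, so it still
        // synchronizes with that store.
        if (n > 0 &&
            tickets.compare_exchange_weak(n, n - 1, std::memory_order_acquire,
                                          std::memory_order_relaxed)) {
            assert(payload == 42);
            return payload;
        }
    }
}

int run_release_sequence_example() {
    payload = 0;
    tickets.store(0, std::memory_order_relaxed);
    int r1 = 0;
    int r2 = 0;
    std::thread c1([&r1] { r1 = consumer(); });
    std::thread c2([&r2] { r2 = consumer(); });
    std::thread p(producer);
    p.join();
    c1.join();
    c2.join();
    return r1 + r2;
}
```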

<h3>GB 15: Control Dependencies and Dependency Ordering [Paul]</h3>

<dl>

<dt>Number</dt>

<dd>GB 15</dd>

<dt>Sections:</dt>

<dd>N/A</dd>

<dt>Comment:</dt>

<dd>&ldquo;Given the examples of compilers interchanging data and control

dependencies, and that control dependencies are respected on Power/ARM

for load->store (and on Power for load->load with a relatively cheap

isync), we're not sure why carries-a-dependency-to does not include

control dependencies between atomics.&rdquo;

</dd>

</dl>

<h4>Discussion</h4>

<p>At the time that the memory model was formulated, there was
considerable uncertainty as to which architectures respect control
dependencies, and to what extent.
This uncertainty now appears to be clearing up, and we hope that
control-dependency ordering will be ripe for standardization in a later TR.

</p>
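<p>The following sketch makes the distinction raised by GB 15 concrete. It assumes C++11 <code>std::atomic</code>, and <code>Node</code>, <code>head</code>, and the function names are hypothetical. A pointer is published with a release store and read with a <code>memory_order_consume</code> load. The read of <code>p-&gt;value</code> is reached through the loaded pointer. That is a data dependency, which carries-a-dependency-to covers. A read reached only through a branch on the loaded value is not covered. Many current compilers reportedly implement <code>memory_order_consume</code> as <code>memory_order_acquire</code>, so running this code does not by itself exercise hardware dependency ordering.</p>

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical node type published through an atomic pointer.
struct Node {
    int value;
};

std::atomic<Node*> head{nullptr};

void publish() {
    Node* n = new Node{42};                    // initialize first...
    head.store(n, std::memory_order_release);  // ...then publish
}

int read_through_pointer() {
    Node* p;
    while ((p = head.load(std::memory_order_consume)) == nullptr) {
    }
    // The loaded pointer supplies the address read here, so the consume
    // load carries a dependency to this read (a data dependency).
    // By contrast, a read of some other location guarded only by
    //     if (head.load(std::memory_order_consume) != nullptr) { ... }
    // depends on the load through control flow alone, which
    // carries-a-dependency-to does not include.
    assert(p->value == 42);
    return p->value;
}

int run_consume_example() {
    head.store(nullptr, std::memory_order_relaxed);
    int result = 0;
    std::thread r([&result] { result = read_through_pointer(); });
    std::thread w(publish);
    w.join();
    r.join();
    delete head.load(std::memory_order_relaxed);
    return result;
}
```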

<h4>Resolution</h4>

<p>Not a Defect

</p>

<h3>CH 2: Observable Behavior of Atomics [Lawrence]</h3>

<dl>

<dt>Sections:</dt>

<dd>1.9 and 1.10</dd>

<dt>Comment:</dt>

<dd>It's not clear whether relaxed atomic operations

are observable behaviour.</dd>

<dt>Proposal:</dt>

<dd>Clarify it.</dd>

</dl>

<h4>Discussion</h4>

<p>Normatively, the behavior is well-defined.

We add a clarifying note.</p>

<h4>Resolution</h4>

<p>Edit 1.9 [intro.execution] paragraph 8 as follows.</p>

<blockquote>

<p>The least requirements on a conforming implementation are:</p>

<ul>

<li>Access to volatile objects

are evaluated strictly according to the rules of the abstract machine.

<ins>[Note: Atomic objects may also be either volatile or non-volatile.

&mdash;<i>end note</i>]</ins></li>

<li>At program termination,

all data written into files

shall be identical to one of the possible results

that execution of the program according to the abstract semantics

would have produced.</li>

<li>The input and output dynamics of interactive devices

shall take place in such a fashion that prompting output

is actually delivered before a program waits for input.

What constitutes an interactive device is implementation-defined.</li>

</ul>

<p>These collectively are referred to as

the observable behavior of the program.

[<i>Note:</i> more stringent correspondences

between abstract and actual semantics

may be defined by each implementation. &mdash;<i>end note</i>]</p>

</blockquote>
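<p>Here is a minimal sketch of the added note, assuming C++11 <code>std::atomic</code>; the object names are hypothetical. <code>std::atomic</code> provides volatile-qualified overloads of <code>load</code> and <code>store</code>, so the same relaxed operations apply to both a non-volatile and a volatile atomic object. Under the reading intended by the note, only the accesses to the volatile object fall under the first bullet of the list above.</p>

```cpp
#include <atomic>
#include <cassert>

// Hypothetical objects: both are atomic, only the second is volatile.
std::atomic<int> plain_counter{0};
volatile std::atomic<int> volatile_counter{0};

int touch_both() {
    // Relaxed operations on a non-volatile atomic object.
    plain_counter.store(1, std::memory_order_relaxed);
    // The volatile-qualified overloads allow the same operations on a
    // volatile atomic object; these are accesses to a volatile object.
    volatile_counter.store(2, std::memory_order_relaxed);
    int c = plain_counter.load(std::memory_order_relaxed);
    int v = volatile_counter.load(std::memory_order_relaxed);
    // Single-threaded, so each load sees the store sequenced before it.
    assert(c == 1 && v == 2);
    return c + v;
}
```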

<h2>Controversial Issues Discussed in June Email Thread</h2>

<h3>CA 17: 1.10p12 phrasing</h3>

<p>The last note of 1.10p12 refers to data races &ldquo;as defined here&rdquo;.
Batty et al. recommend changing this to &ldquo;as defined below&rdquo;.
Given that data races are defined in 1.10p14, it is easy to argue for
&ldquo;below&rdquo;; however, it is equally easy to argue that the scope
of &ldquo;here&rdquo; is the whole of 1.10.

</p>

<h2>TBD National-Body Comments</h2>

<p>Later revisions of this paper will also include the following

national-body comments:

</p>

<ol>

<li>        CA 18: &ldquo;Non-unique visible sequences of side effects and

        happens-before orderings&rdquo;.  TBD Benjamin Kosnik and Michael Wong.

<li>        CA 19: &ldquo;Alternative definition of the value read by an

        atomic operation&rdquo;.  TBD Benjamin Kosnik and Michael Wong.

<li>        CA 20: &ldquo;Reading from last element in a vsse?&rdquo;

        TBD Benjamin Kosnik and Michael Wong.

<li>        GB 11: covering relationship of <code>memory_order_consume</code>

        and modification order.

        (Item E in Appendix 1 of

        <a href="http://wiki.dinkumware.com/twiki/pub/Wg21rapperswil/Documents/N3102_FCD14882_SC22_BallotComments_All.pdf">ballot comments</a>.)

        TBD Benjamin Kosnik and Michael Wong.

<li>        GB 12: covering whether certain memory-ordering cycles are permitted.

        (Item F in Appendix 1 of

        <a href="http://wiki.dinkumware.com/twiki/pub/Wg21rapperswil/Documents/N3102_FCD14882_SC22_BallotComments_All.pdf">ballot comments</a>.)

        TBD Benjamin Kosnik and Michael Wong.

</ol>

<h2>Wording</h2>

<h3>1.10p13</h3>

<p>National-body comments:

CA 11, CA 13, CA 18, CA 19, CA 20, GB 11, GB 12

</p>

<p>Changes:

</p>

<blockquote>

        <p>The visible sequence of side effects on an atomic object

        <var>M</var>, with respect to a value computation <var>B</var>

        of <var>M</var>, is a maximal contiguous sub-sequence of side

        effects in the modification order of <var>M</var>, where the

        first side effect is visible with respect to <var>B</var>, and

        for every <del>subsequent</del> side effect, it is not the case

that <var>B</var> happens before it.

        The value of an atomic object <var>M</var>, as determined

        by evaluation <var>B</var>, shall be the value stored by some

        operation in the visible sequence of <var>M</var> with respect

to <var>B</var>.

        <ins>[<em>Note</em>: It can be shown that the visible sequence of

        side effects of a value computation is unique given the coherence

        requirements below.  &#8212; <em>end note</em>]</ins>

        <del>Furthermore, if a value computation <var>A</var> of an

        atomic object <var>M</var> happens before a value computation

        <var>B</var> of <var>M</var>, and the value computed by

        <var>A</var> corresponds to the value stored by side effect

        <var>X</var>, then the value computed by <var>B</var> shall

        either equal the value computed by <var>A</var>, or be the

        value stored by side effect <var>Y</var>, where <var>Y</var>

follows <var>X</var> in the modification order of <var>M</var>.

        [ <em>Note</em>: This effectively disallows compiler reordering

        of atomic operations to a single object, even if both operations

        are &ldquo;relaxed&rdquo; loads. This effectively makes the

        &ldquo;cache coherence&rdquo; guarantee provided by most hardware

        available to C++ atomic operations. &#8212; <em>end note</em>]

        [ <em>Note</em>: The visible sequence depends on the

        &ldquo;happens before&rdquo; relation, which depends on the values

        observed by loads of atomics, which we are restricting here. The

        intended reading is that there must exist an association of atomic

        loads with modifications they observe that, together with suitably

        chosen modification orders and the &ldquo;happens before&rdquo;

        relation derived as described above, satisfy the resulting

        constraints as imposed here. &#8212; <em>end note</em>]</del>

        </p>

</blockquote>

<h3>New Paragraphs Following 1.10p13</h3>

<p>National-body comments:

CA 13, CA 18, CA 19, CA 20, GB 11, GB 12

</p>

<p>Changes:

</p>

<blockquote>

        <p><ins>If an operation <var>A</var> that modifies <var>M</var>

        happens before an operation <var>B</var> that modifies

        <var>M</var>, then <var>A</var> shall be earlier than <var>B</var>

in the modification order of <var>M</var>.

        [ <em>Note</em>: This requirement is known as <em>write-write

        coherence.</em> &#8212; <em>end note</em>]</ins>

        </p>

</blockquote>
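<p>Write-write coherence can be illustrated with a sketch that assumes C++11 <code>std::atomic</code> and <code>std::thread</code>; the names are hypothetical. Two relaxed stores by one thread are ordered by happens-before, so they must appear in that order in the modification order of <code>x</code>. Combined with the write-read coherence requirement, this means a load that happens after both stores must return the second value.</p>

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> x{0};

// Both stores are made by one thread, so the first is sequenced before,
// and therefore happens before, the second. Write-write coherence places
// the store of 1 before the store of 2 in the modification order of x,
// even though both stores are relaxed.
void writer() {
    x.store(1, std::memory_order_relaxed);
    x.store(2, std::memory_order_relaxed);
}

int final_value() {
    x.store(0, std::memory_order_relaxed);
    std::thread t(writer);
    t.join();  // completion of writer synchronizes with the return of join()
    int v = x.load(std::memory_order_relaxed);
    // The store of 2 is last in the modification order and happens before
    // this load, so no other value is possible here.
    assert(v == 2);
    return v;
}
```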

<blockquote>

        <p><ins>If a value computation <var>A</var> of an atomic object

        <var>M</var> happens before a value computation <var>B</var>

        of <var>M</var>, and <var>A</var> takes its value from the side

        effect <var>X</var>, then the value computed by <var>B</var>

        shall either be the value stored by <var>X</var>, or the value

        stored by a side effect <var>Y</var>, where <var>Y</var> follows

<var>X</var>  in the modification order of <var>M</var>.

        [ <em>Note</em>: This requirement is known as <em>read-read

        coherence.</em> &#8212; <em>end note</em>]</ins>

        </p>

</blockquote>
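<p>Read-read coherence can be exercised with a small litmus test. This sketch assumes C++11 <code>std::atomic</code> and <code>std::thread</code>, and the names are hypothetical. One thread stores 1 and then 2 to <code>x</code>, while another performs two relaxed loads. The values written increase along the modification order. A second load returning a smaller value than the first would therefore mean reading backwards in that order, which this paragraph forbids. Repeated runs can only fail to observe the forbidden outcome; they cannot prove it impossible.</p>

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> x{0};

void writer() {
    x.store(1, std::memory_order_relaxed);
    x.store(2, std::memory_order_relaxed);
}

// Two relaxed loads in one thread, the first sequenced before the second.
// Read-read coherence forbids the second from returning a value that
// precedes the first one's value in the modification order 0, 1, 2.
bool reader_went_backwards() {
    int a = x.load(std::memory_order_relaxed);
    int b = x.load(std::memory_order_relaxed);
    return b < a;
}

// Repeats the experiment and counts forbidden outcomes.
int count_backwards(int trials) {
    int count = 0;
    for (int i = 0; i < trials; ++i) {
        x.store(0, std::memory_order_relaxed);
        bool backwards = false;
        std::thread r([&backwards] { backwards = reader_went_backwards(); });
        std::thread w(writer);
        w.join();
        r.join();
        if (backwards) {
            ++count;
        }
    }
    return count;
}
```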

<blockquote>

        <p><ins>If a value computation <var>A</var> of an atomic

        object <var>M</var> happens before an operation <var>B</var>

        that modifies <var>M</var>, then <var>A</var> shall either take

        its value from some side effect <var>X</var>, where <var>X</var>

        precedes <var>B</var> in the modification order of <var>M</var>,

or shall take its value from the initial value of <var>M</var>.

        [ <em>Note</em>: This requirement is known as <em>read-write

        coherence.</em> &#8212; <em>end note</em>]</ins>

        </p>

</blockquote>
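<p>Read-write coherence can be illustrated with a similar sketch, again assuming C++11 <code>std::atomic</code> and <code>std::thread</code> with hypothetical names. A thread loads <code>x</code> and then stores 42 to it, while another thread concurrently stores 1. The load happens before the store of 42. It must therefore take its value from a side effect that precedes that store in the modification order, so it can return 0 or 1 but never 42.</p>

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> x{0};

// The load is sequenced before, and so happens before, the store of 42.
// Read-write coherence requires the load to read a side effect that
// precedes that store in the modification order of x.
int read_then_write() {
    int r = x.load(std::memory_order_relaxed);
    x.store(42, std::memory_order_relaxed);
    return r;
}

void other_writer() {
    x.store(1, std::memory_order_relaxed);
}

// Counts trials in which the load observed the store that follows it.
int count_read_of_later_write(int trials) {
    int count = 0;
    for (int i = 0; i < trials; ++i) {
        x.store(0, std::memory_order_relaxed);
        int r = -1;
        std::thread t1([&r] { r = read_then_write(); });
        std::thread t2(other_writer);
        t1.join();
        t2.join();
        if (r == 42) {
            ++count;
        }
    }
    return count;
}
```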

<blockquote>

        <p><ins>If a side effect <var>X</var> that stores a value to an

        atomic object <var>M</var> happens before a value computation

        <var>B</var> of <var>M</var>, then the evaluation <var>B</var>

        shall take its value from <var>X</var> or from a side effect

        <var>Y</var> that follows <var>X</var> in the modification order

of <var>M</var>.

        [ <em>Note</em>: This requirement is known as <em>write-read

        coherence.</em> &#8212; <em>end note</em>]</ins>

        </p>

</blockquote>
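<p>Write-read coherence can be sketched in the same style, assuming C++11 <code>std::atomic</code> and <code>std::thread</code> with hypothetical names. A thread stores 1 to <code>x</code> and then loads it, while another thread concurrently stores 2. The store of 1 happens before the load. The load must therefore return that value or one written by a later side effect, so it can return 1 or 2 but never the earlier value 0.</p>

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> x{0};

// The store of 1 is sequenced before, and so happens before, the load.
// Write-read coherence requires the load to read that store or a side
// effect that follows it in the modification order of x.
int write_then_read() {
    x.store(1, std::memory_order_relaxed);
    return x.load(std::memory_order_relaxed);
}

void other_writer() {
    x.store(2, std::memory_order_relaxed);
}

// Counts trials in which the load returned a value other than 1 or 2.
int count_stale_reads(int trials) {
    int count = 0;
    for (int i = 0; i < trials; ++i) {
        x.store(0, std::memory_order_relaxed);
        int r = -1;
        std::thread t1([&r] { r = write_then_read(); });
        std::thread t2(other_writer);
        t1.join();
        t2.join();
        if (r != 1 && r != 2) {
            ++count;
        }
    }
    return count;
}
```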

<blockquote>

        <p><ins>[ <em>Note</em>: 

        These four coherence requirements effectively disallow compiler

        reordering of atomic operations to a single object, even if

        both operations are &ldquo;relaxed&rdquo; loads. This effectively makes

        the &ldquo;cache coherence&rdquo; guarantee provided by most hardware

        available to C++ atomic operations.

        &#8212; <em>end note</em>]</ins>

        </p>

</blockquote>

<blockquote>

        <p><ins>[ <em>Note</em>: 

        The visible sequence depends on the &ldquo;happens before&rdquo; relation,

        which depends on the values observed by loads of atomics, which

        we are restricting here. The intended reading of these four

        coherence requirements is that there must

        exist an association of atomic loads with modifications they

        observe that, together with suitably chosen modification orders

        and the &ldquo;happens before&rdquo; relation derived as described above,

        satisfy the resulting constraints as imposed here.

        &#8212; <em>end note</em>]</ins>

        </p>

</blockquote>

</body></html>