[cpp-threads] RE: Initial comments on your straw man

Sun Feb 19 16:46:32 GMT 2006

------- Forwarded Message

Subject: RE: Initial comments on your straw man
Date: Sat, 18 Feb 2006 16:25:15 -0800
From: "Boehm, Hans" <hans.boehm at hp.com>
To: "Nick Maclaren" <nmm1 at cus.cam.ac.uk>
Cc: <clark.nelson at intel.com>, "Boehm, Hans" <hans.boehm at hp.com>

Metacomment -

If you don't mind, I think it would be better to carry on this
discussion on the mailing list.  It does mean that all our silly
mistakes are archived for eternity (or at least a long time).  On the
other hand, the discussion can be very useful when trying to remember
rationales later.  And most silly mistakes unfortunately tend to be made
more than once.

I took the liberty of copying Clark Nelson, since some of this impacts
what he is doing directly.  If you don't mind, I have no objection to
forwarding all of this back to the list.

> -----Original Message-----
> From: Nick Maclaren [mailto:nmm1 at cus.cam.ac.uk]
> Sent: Sunday, February 12, 2006 2:40 PM
> To: Boehm, Hans
> Subject: Initial comments on your straw man
>
> Here are a few comments.
>
> Data Races and our Approach to Memory Consistency
> -------------------------------------------------
>
> I really DON'T like specifying that structure/array accesses
> are accesses to the individual elements in some order.  In
> the past, implementations have performed stores using
> fast-path code, trapped failures, and repeated using safe
> code.  That isn't unreasonable, and is seriously handicapped
> by that specification.
>
> Incidentally, both the POWER and IA64 architectures do
> that, and my reading of those is that it is unspecified
> whether the software will make it transparent to the
> application.  There is comparable murk on other ones as well,
> including the x86!  Consider storing an array of 80-bit x86
> FP registers into an array of 64-bit IEEE locations.  A
> reasonable implementation is to do it fast, trap overflow and
> repeat the whole array assignment, slowly, if there was one.
> Many vector processors work like that.
I don't yet understand why you think this is observable to the
programmer.  If the accesses to the individual elements are not ordered,
I think the implementation is still free to do everything you suggest.
Certainly that was my intention.

As far as specifying races is concerned, I think this identifies exactly
the same races as other possible approaches.  And I think that's really
what matters here.
>
> The is-sequenced-before relation
> --------------------------------
>
> Er, no, sorry.  Sequence points are far more messed up than THAT.
> Consider f(g(),h()) or g()+h() and ask about the ordering of
> the function executions g() and h().  The consensus is that
> there is a sequence point between them, but there is no
> ordering relation.
Agreed.  Thanks.  I added a note for now.  Clark Nelson is also working
on a paper addressing the sequence point issues.  I'd be very happy if
he got this at least mostly straightened out.
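
A minimal sketch of the g()+h() point, written in present-day C++
rather than in the wording under discussion: the two calls do not
interleave, but which one runs first is left to the implementation.
The names call_log, g, h and evaluate_sum are hypothetical and exist
only for this illustration.

```cpp
#include <string>

// Hypothetical log recording the order in which g() and h() ran.
static std::string call_log;

int g() { call_log += 'g'; return 1; }
int h() { call_log += 'h'; return 2; }

// Evaluates g() + h(). The sum is always 3, but call_log may end up
// as "gh" or "hg", because the order of the two calls is unspecified.
int evaluate_sum() {
    call_log.clear();
    return g() + h();
}
```

A portable program can rely on the sum, but not on the contents of
call_log.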

I actually no longer understand what the standard means here.  Assume i
is a local variable, and f stores through the pointer passed to it.

Which of the following are legal?

(i = 13) + (i = 14)	-- no
(i = 13) + f(&i)	-- not a clue
f(&i) + f(&i)	-- yes, otherwise lots of stuff is broken

Probably the middle one should be undefined, but I have no idea how the
standard implies that without also outlawing the last one.
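
As a rough sketch of the last case only, assuming a hypothetical f
that increments the pointed-to value and returns it (the original
question does not say what f stores): the two calls cannot interleave,
so the outcome is the same whichever call runs first.  The first two
expressions are left in comments, since executing them is exactly what
is in doubt.  Result and two_calls are illustrative names.

```cpp
// Hypothetical f: stores through the pointer and returns the stored value.
int f(int* p) {
    *p += 1;
    return *p;
}

struct Result {
    int sum;  // value of f(&i) + f(&i)
    int i;    // value of i afterwards
};

Result two_calls() {
    int i = 0;
    // (i = 13) + (i = 14);   // two unordered stores to i: not executed here
    // (i = 13) + f(&i);      // the unclear middle case: not executed here
    int sum = f(&i) + f(&i);  // the calls return 1 and 2, in some order
    return Result{sum, i};
}
```

Whichever call runs first, sum is 3 and i ends up as 2.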

>
> The communicates-with relation
> ------------------------------
>
> Hmm.  Does this include I/O to external agents, signaling and so on?
> Those are real minefields, as they involve specifying the
> behaviour of external entities, some of which may use
> concepts that are not even describable in C++.
That was the intent, at least to the extent that we want those to imply
memory visibility.  I think the standard should confine itself to
setting up the terminology, and defining the relation for routines in
the standard library.  Other standards will have to address it for other
libraries.  With luck, they'll borrow the terminology, so the result
makes sense.
>
> Data races
> ----------
>
> Paragraph 1, "and at least one of them is a store access".
> Picky, aren't I?
Thanks.  Fixed.
>
> Later, under the Java model, you mention I/O again.  There
> are many reasons that I/O should NOT be included in the
> normal total order, of which the above is one.  Another, and
> much stronger, one is that it prevents even simple buffering
> - let alone asynchronous I/O.
> One needs a different set of definitions.
Remember that we effectively have a total order on ordinary memory
operations only for defining when there is a data race.  If there is no
data race, the ordering doesn't matter, because it's not observable.  If
there is a data race, the ordering doesn't matter because the semantics
are undefined.

However, I agree that this comment needs to be revisited and integrated
into the preceding text.
>
> As far as unions and threads go, did you see my example that
> showed that the C++ standard is flatly inconsistent with
> itself?  That simple wording won't fly.
I don't immediately remember it.  I need to look back for it.
>
> Member and Bitfield Assignments
> -------------------------------
>
> The last paragraph assumes a particular interpretation of the
> standard.
> With others (equally reasonable), I can produce just such examples.
> In fact, some of the earlier ISO C compilers were changed to
> NOT write padding characters, because they broke so many "working"
> programs.
I'm not sure I understand what you're getting at here.

>
> Volatile variables and data members
> -----------------------------------
>
> If you know the semantics of volatile that current
> implementations use, please do tell!  In most cases, I am
> sure that even the vendors don't know - they do what they do.
I agree with you.  The HP-UX IA64 compiler actually gives you options to
control fairly precisely what it should mean.  The Intel compiler also
gives you a couple of different ones.  But in earlier discussions there
seemed to be, possibly, a very weak consensus that it should provide
guarantees corresponding to a weak interpretation of the standard, i.e.
it should be OK to use volatiles to preserve variables across setjmp,
and to protect against unexpected mmap-based aliasing.  And for
performance reasons, it should do little else.

I'm not sure I read that correctly, and it may be the wrong thing to do.
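
For concreteness, the setjmp use mentioned above can be sketched as
follows.  This is only an illustration of the weak interpretation, not
a statement of what any particular compiler guarantees; env and
value_after_longjmp are hypothetical names.

```cpp
#include <csetjmp>

static std::jmp_buf env;  // hypothetical jump buffer for this sketch

// Returns the value of a local that was modified between setjmp and longjmp.
int value_after_longjmp() {
    volatile int counter = 0;  // volatile keeps the value defined after longjmp
    if (setjmp(env) == 0) {
        counter = 42;          // modified after setjmp first returned
        std::longjmp(env, 1);  // jump back; setjmp now returns 1
    }
    return counter;            // 42; a non-volatile local would be indeterminate here
}
```

Nothing here says anything about visibility to other threads, which is
the point of the weak interpretation.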

People writing clever lock-free algorithms seem to sometimes like this
interpretation, because they can then manually insert
(machine-dependent) memory fences, and guarantee that volatile
declarations slow down the code only as much as necessary.  That
probably isn't a good argument, since it doesn't help for portable code.
>
> I can accept requiring that structure and array assignments
> of solely __async volatile data are handled as individual
> ones, as such things SHOULD be used only for a thread-indexed
> array of synchronisation objects and similar.
>
>
> Regards,
> Nick Maclaren,
> University of Cambridge Computing Service, New Museums Site,
> Pembroke Street, Cambridge CB2 3QH, England.
> Email:  nmm1 at cam.ac.uk
> Tel.:  +44 1223 334761    Fax:  +44 1223 334679
>

------- End of Forwarded Message