Document of relevance
Boehm, Hans
hans.boehm at hp.com
Mon Jan 24 23:37:15 GMT 2005
Andrei -
Is this really relevant for us?
Posix talks about "memory locations" which must not be concurrently written.
I think that interpreting "memory location" to mean "object" would be
a disaster anyway. You couldn't protect different fields in the same
object with different locks. So I don't understand why we care what
"object" means.
(I do agree that this describes a serious issue. But unless I'm missing
something, I think we can let others worry about it.)
Hans
> -----Original Message-----
> From: Andrei Alexandrescu [mailto:andrei at metalanguage.com]
> Sent: Sunday, January 23, 2005 2:16 AM
> To: dl at cs.oswego.edu
> Cc: Boehm, Hans; Maged Michael; Ben Hutchings; Doug Lea;
> Kevlin Henney;
> Peter A. Buhr; pugh at cs.umd.edu
> Subject: Document of relevance
>
>
> In spite of the thundering silence in response to my last email that
> was trying to get everybody enthused and active, I paste here a
> relevant document (call it essay, diatribe, or as you wish) forwarded
> to me by a C standardization committee member.
>
>
> Andrei
>
> --------------------
>
> In case you haven't seen it, I append a diatribe on C objects. It
> doesn't provide any answers, but does describe a lot of places where
> the C standard is unclear on exactly what an object is, and when two
> objects may overlap. This aspect also needs addressing to solve the
> parallelism problems - though I agree that the points you are looking
> at are also critical.
>
>
> Regards,
> Nick Maclaren,
> University of Cambridge Computing Service,
> New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
> Email: nmm1 at cam.ac.uk
> Tel.: +44 1223 334761 Fax: +44 1223 334679
>
>
>
>
>
> This is a slight redraft of a document that was circulated to the UK
> C panel. I am not sure what it is proposing, except that effective
> types need an improved specification, but it attempts to describe a
> very serious problem.
>
>
> THE OBJECT PROBLEM
> ------------------
>
> This is asking the question "What is an object when it is at home?"
> We need to start with an explanation of why a precise answer to the
> question is so important.
>
> This issue is very closely related to the sequence point issue,
> obviously, and can be regarded as of equal importance as far as the C
> language's semantics are concerned. However, it is even more serious
> as far as its effects on other standards are concerned. C and POSIX
> have now agreed that their approaches to signal handling are
> incompatible, though not in those words. This needs to be resolved,
> but will not even be tackled for POSIX 2001, as it is far too
> difficult a problem.
>
> The actual incompatibility is that C defines all parallelism and
> 'real' or external signal handling to be undefined behaviour, as it
> is outside C's model, and that POSIX leaves all language issues to C,
> as it is mainly a library interface. Therefore such things may be
> well-defined in POSIX terms, but have undefined effects on the
> program which uses the POSIX facilities! The incompatibility is
> therefore of the form that POSIX relies on C to define behaviour that
> is explicitly undefined! Obviously, the solution is for POSIX to
> define the language aspects that it needs, but the problem is that
> doing so is very tricky.
>
> It was pretty complex in C90, but C99 has complicated this area very
> considerably, and the problems are going to cause major difficulty
> over the next decade. The incompatibility between C90 and POSIX
> already causes intermittent, unrepeatable program failure on most of
> the systems that I have access to but, precisely because of those
> properties, it is almost impossible to prove this to the vendors.
> Few people recognise the symptoms and a negligible proportion have
> the skills to investigate further.
>
> When I am referring to the SMP aspects, I am talking about the fact
> that two independent threads must not access overlapping objects if
> one of them is updated. Whether the SMP is cache-coherent is
> irrelevant, as exactly the same problem occurs with data in registers
> as with data in memory; when using parallelism for efficiency,
> storing and reloading all registers at every synchronisation point is
> an unacceptable overhead, and so is not done. Also, note that this
> applies to all state (such as floating-point exception flags) as well
> as data. There is an exactly corresponding problem with asynchronous
> signal handlers.
>
>
> Commercial Implementors and Important Users
> -------------------------------------------
>
> The vast majority of commercial implementors and important users are
> not members of SC22/WG14 or even any national body affiliated with
> it, and work solely from the standard. As they should, because ISO
> rules (and all normal commercial practice) specify that the standard
> is the official document, and records of its development should be
> used for at most clarification.
>
> I can witness that this area has caused major headaches ever since C90
> was developed, and that many implementors are very seriously unhappy
> about C99. They simply do not know what most of it implies and, as is
> generally agreed, the subtleties in this area are critical for both
> optimisation and the development of robust applications. Most of the
> ones that I have contacts with had severe problems deciding exactly
> what optimisations were allowed in C90, and often went through several
> iterations before an industry consensus emerged.
>
> Their approach is typically to use restrict and static in array
> parameters (but not effective types) enough to optimise the BLAS and
> LAPACK, and perhaps a little further, but to await clarification
> before being more radical. The fact that a very few vendors are gung
> ho about effective types, without it being absolutely clear what the
> constraints mean, is going to be a major portability problem in
> certain complex codes. I already see far too many that have to
> disable optimisation on some systems because it is unclear whether
> they or the implementation has broken the standard.
>
>
> The C99 Standard
> ----------------
>
> Let us look at some relevant definitions:
>
> [A] access
> <execution-time action> to read or modify the value of an object
>
> [B] alignment
> requirement that objects of a particular type be located on storage
> boundaries with addresses that are particular multiples of a byte
> address
>
> [C] object
> region of data storage in the execution environment, the contents of
> which can represent values
>
> NOTE: When referenced, an object may be interpreted as having a
> particular type; see 6.3.2.1.
>
> [D] parameter, formal parameter, formal argument (deprecated)
> object declared as part of a function declaration or definition that
> acquires a value on entry to the function, or an identifier from the
> comma-separated list bounded by the parentheses immediately
> following the macro name in a function-like macro definition
>
> [E] value
> precise meaning of the contents of an object when interpreted as
> having a specific type
>
> Other sections that are particularly relevant include 5.1.2.3 passim
> (side effects, sequence points and conformance requirements), 6.2.5
> [#1] (object type and meaning of a value), 6.2.5 [#20] (derived
> types), 6.2.6.1 [#2,#4] (values are stored in bytes), 6.2.6.1
> [#5,#6,#7] (trap representations and padding bytes), 6.2.6.1 [#8]
> (validity of alternate representations), 6.2.6.2 (integer types),
> 6.3.2.1 [#1] (definition of lvalue), 6.2.7 [#2] (multiple
> declarations and compatible types), 6.3.2.1 [#2] (lvalue to value
> conversion), 6.3.2.1 [#3] (array to pointer conversion), 6.3.2.3
> [#1,#7] (pointer conversion), 6.5 [#1] (sequence points and
> accesses), 6.5 [#6] (effective type), 6.5 [#7] (permitted access
> types), 6.5 passim (expressions that require or deliver lvalues),
> 6.5.3.2 [#3] (unary '&' operator), 6.5.8 [#5,#6] (pointer
> comparison), 6.7.3.1 (formal definition of restrict), 6.7.5.3 [#7]
> (array parameter adjustment and implication of static therein), 6.9.1
> [#9] (parameter identifiers are lvalues) and 7.21.1 [#1] (<string.h>
> access conventions). I may well have missed some important ones.
>
> What is abundantly clear is that the basic model used by C is this.
> An object type is specified to be a complete type that is neither
> void nor a function type. An expression that can access an object has
> a one-to-one mapping to an lvalue, and they correspond to a unique and
> well-defined object and object type. A pointer value with a pointer
> to object type defines an lvalue that is an array object with a base
> type the corresponding object type (the array may be of length 1, of
> course), except for the odd case of the element one beyond the end of
> an array. A pointer value with a pointer to an incomplete type or to
> the element beyond the end of an array defines an address but not an
> lvalue, though its base type correspondence is otherwise the same as
> that for a pointer to an object. A pointer value with a pointer to
> void type specifies only an address. So far, so good.
>
> Unfortunately, this breaks down in several places in C90, and in many
> more in C99. In most cases, the failure of the basic model is not
> serious in the effects on the C language as such, because the
> constructions that show up the problems are either perverse or show up
> only in the most extreme optimising compilers. So it is mainly the
> High Performance Computing people who notice, and few of them.
> However, it is very serious as far as the effects on related
> standards are concerned; this applies particularly to parallel
> models, such as POSIX threads or OpenMP, but also includes anything
> that needs reliable signal handling.
>
> To a first approximation, the major problems separate into two parts:
> what the base type of an object is and, if an object is an array type,
> what size the array is. These can be considered separately.
>
> The first question is less clear-cut than the second, and has several
> aspects, which all interact with one another. These include whether
> an object or subobject is the relevant one, which arises in several
> guises, and the new concept of effective type.
>
> There is a further subtlety, which will not be considered here in any
> detail, which is when a datum with one base type can be accessed
> through an lvalue of a different type. This was extremely unclear in
> C90 and has been significantly changed in C99. In practice, this is
> mostly about the signedness of integer types, because all other base
> types tend to be either wholly incompatible or wholly compatible.
> The issue applies as much to them; it is merely that the problem does
> not show up in the current C standard. However, it is worth noting
> that it will probably be exposed for fixed-point data types as well,
> and in a worse way than for signedness, if they are introduced.
>
> It is also critical to realise that the examples given later are a
> small proportion of the problems that I have seen actually cause
> trouble, and a very small proportion of those where I know that the
> C99 and POSIX or OpenMP standards' models conflict. So there is
> virtually no point in trying to resolve any of the following
> individual examples; what is needed is a clarification of these
> aspects of the C standard, along the lines of the sequence point
> approaches (only probably rather more radical).
>
>
> What is the Base Type?
> ----------------------
>
> In this context, type qualifiers are largely irrelevant, and the most
> critical properties of the base type are its size and alignment.
> While the C standard does not say so, it is clear that aliasing,
> sequence point and SMP problems are primarily affected by those two
> properties alone.
>
> In C90, things were relatively clear, and it had three categories of
> type, all of which have been preserved in C99:
>
> Datum type. This is the actual type of the datum, ignoring any
> syntactic issues, and is generally not decidable statically.
>
> Access type. This is the type of the lvalue used for access, and
> can be decided statically.
>
> Allocation type. This is the type of the original definition or
> result of malloc etc., and is generally not decidable statically.
>
> The requirements in both C90 and C99 are that the datum and access
> types must be compatible, and that the size and alignment of the
> allocation type must be adequate for both. There is also the
> exception for character access types, which may be used on any datum
> type. There was a thread on the reflector which claimed that the
> allocation type could affect later accesses which used another type
> of lvalue, but it did not seem to have much support, and the
> consensus was that its only lasting effect was through its alignment.
> All of this specification could do with clarification.
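
The character-type exception mentioned above can be sketched as follows. This is a minimal illustration, not part of the original essay; the function name count_nonzero_bytes is hypothetical. It reads the bytes of an arbitrary object through an unsigned char lvalue, which is the one access type permitted on any datum type.

```c
#include <assert.h>
#include <stddef.h>

/* Count the non-zero bytes in the representation of any object.
   The access type is unsigned char, so the datum type of the object
   does not matter.  Hypothetical helper, for illustration only. */
size_t count_nonzero_bytes(const void *object, size_t size)
{
    const unsigned char *bytes = object;
    size_t i, count = 0;

    assert(object != NULL);
    for (i = 0; i < size; ++i)
        if (bytes[i] != 0)
            ++count;
    return count;
}
```

The same call works for a double, a structure or a byte buffer, for example count_nonzero_bytes(&d, sizeof d) for a double d.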
>
> In C90, the base type of an object was the base type of the lvalue
> (i.e. the access type), except for certain library functions
> (e.g. memcpy) which accessed objects as arrays of bytes. While there
> were a good many cases where the array length was unclear (see
> below), I cannot think of many where the alignment of the base type
> was. What confusion there was mainly concerned which of a set of
> compatibly aligned base types to choose when using functions like
> memcpy.
>
> There were also some confusions about whether the type of an argument
> (as distinct from a parameter) was relevant, subtleties about
> signedness, and whether copying only the data of a structure (i.e. not
> its padding) was safe. Some of them have been clarified in C99, but
> others have not. These will be ignored here, as they affect the
> semantics of C programs, but have relatively little effect on other
> standards; i.e. they affect what can be done with an object, rather
> than exactly where it exists in memory.
>
> However, C99 has changed this area quite drastically, by introducing
> the concept of effective type. This has not clarified the situation,
> and some of its problems are described below. But we now have to add:
>
> Effective type. This seems to be an extension of datum type, but
> intended for type-dependent optimisation.
>
> I shall not describe this in the next section, as I do not understand
> it, for reasons given later. I think that its introduction was a very
> serious mistake.
>
>
> Objects and Subobjects
> ----------------------
>
> There are three entwined aspects to this. One is the question of when
> an object access refers to a subobject (as distinct from the whole
> object), such as an array access referring to an array element or a
> structure or union access referring to a member. Another is when a
> valid object can be created by pure pointer manipulation, bypassing
> the type mechanism entirely. And the last is when the relevant object
> is an array of uninterpreted characters (i.e. unsigned char); while
> this is most common in the library, it can occur in the language
> proper.
>
> One of the reasons that C90 and C99 work at all is that they define
> almost no type-generic facilities other than ones that map objects to
> arrays of characters. <tgmath.h> is irrelevant here, because it does
> not raise the problems being considered. However, the same is not
> true in other standards that use C as a base, because they may (for
> example) have facilities that operate on arrays of generic data
> pointers. But this does not mean that there is not an ambiguity in C,
> so much as that the ambiguity is not exposed in the current language.
>
> It appears that the standard says neither that a member of a struct is
> itself an object, nor that an element of an array is, but clearly they
> are. However, there is more to subsectioning than that. Consider:
>
> #include <stddef.h>
>
> typedef struct {int a; double b; int c;} COMPOSITE;
> typedef struct {int a; double b;} SUBSET;
>
> int main (void) {
>     double array[5] = {0.0};
>     COMPOSITE composite = {0,0.0,0};
>     double (*p)[2];
>     SUBSET *q;
>     int *r;
>
>     array[0] += (array[1] = 0.0);
>     composite.a += (composite.c = 0);
>
>     p = (double (*)[2])(array+2);
>     q = (SUBSET *)&composite;
>     r = (int *)(((char *)&composite)+offsetof(COMPOSITE,c));
>
>     return 0;
> }
>
> It is clear that the first two assignments would be illegal if the
> relevant objects were the whole array and structure, and constructions
> like that are so common that the only plausible interpretation of the
> C standard is that the relevant objects are the elements and members.
> This should be stated explicitly.
>
> However, I assert that it is also the case that *p, *q and *r are
> well-defined objects of type double[2], SUBSET and int. The validity
> of the first follows from the equivalence of pointers and arrays and
> the fact that a pointer to an array is a pointer to its first element,
> the second follows from the common initial sequence rules for unions,
> and the third from the very definition and purpose of the offsetof
> macro.
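
The third case, and the claim that members are separate objects, can be sketched as follows. This is an illustration only, assuming the reading the essay argues for; the helper names member_c_by_offset and members_a_and_c_disjoint are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {int a; double b; int c;} COMPOSITE;

/* Reach member c by byte arithmetic from the start of the structure,
   as the third cast in the example does.  Hypothetical helper. */
int *member_c_by_offset(COMPOSITE *s)
{
    assert(s != NULL);
    return (int *)((char *)s + offsetof(COMPOSITE, c));
}

/* Return 1 if members a and c occupy disjoint byte ranges, which is
   what makes "composite.a += (composite.c = 0)" an update of two
   separate objects under the member-as-object reading. */
int members_a_and_c_disjoint(void)
{
    size_t end_of_a = offsetof(COMPOSITE, a) + sizeof(int);
    return end_of_a <= offsetof(COMPOSITE, c);
}
```

On a conforming implementation the pointer returned by member_c_by_offset(&x) compares equal to &x.c; whether every optimiser treats the two as interchangeable is exactly the question the essay raises.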
>
> The constructions given here may not be common, but there are a vast
> number of programs that use equivalent ones (often implicitly), and
> forbidding them would break a huge number of programs. In all cases,
> if they are regarded as undefined, I am almost certain that I can
> produce a program that shows a clear inconsistency in the standard.
>
> Perhaps worse is the fact that there are so many unrelated ways of
> bypassing the formal type structure; I do not know offhand how many
> more there are. So subsetting is not a simple phenomenon.
>
> There is a similar problem with the cases where objects are handled as
> arrays of characters. Consider:
>
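> #include <string.h>
>
> typedef struct {int a; double b; int c;} COMPOSITE;
>
> int main (void) {
>     COMPOSITE composite;
>     char string[] = "ABB";
>
>     /* Copy sizeof(int) bytes from one part of the structure to another. */
>     memcpy(&composite,&((char *)&composite)[sizeof(int)],sizeof(int));
>
>     /* The delimiter string lies inside the string being tokenised. */
>     strtok(string,&string[2]);
>
>     return 0;
> }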
>
> The first function call is a fairly common construction, though rarely
> in that form, and generally regarded as acceptable; again, forbidding
> it would break a huge number of programs and could be used to show an
> inconsistency in the standard. The second one is obscene, but I can
> see no reason that it is not conforming; the consequences of
> forbidding such things generically are beyond my imagination.
>
> My view is that these questions are fairly easy to resolve within the
> C language, except for the matter of effective type discussed below.
> However, it is not so clear when other standards are involved, or for
> C extensions that allow operations on whole arrays. For example, it
> is very unclear whether an array should be treated as a single object,
> as an array of discrete elements or as an array of characters. I
> doubt that this problem is soluble without a major rethink of the C
> object model. Hence my view is the following:
>
> The generic C model is that the object used for access through an
> lvalue is the smallest possible subobject compatible with the type of
> the lvalue and form of access, and that arrays (including arrays of
> uninterpreted characters) are invariably treated as discrete elements
> even when operated on in bulk. I can think of no exceptions to this,
> but the point is not clearly stated and other interpretations may be
> possible. Some wording improvement is essential.
>
> Related to this is the interpretation of the base type of an object:
> if the lvalue used for access has an object type, that type and the
> datum type are the only ones considered; if it has none (as with
> memcpy), the lvalue type is treated as if it were an array of
> uninterpreted characters and the datum type is irrelevant. The main
> exception is the concept of effective type, if it is retained.
>
> The third aspect is that a valid object can be created by any legal
> pointer operations, casts etc., subject to each conversion having an
> appropriate alignment for its type and value, and the result ending up
> with an appropriate datum type and size. I would hate to have to
> draft the specification of this, but it badly needs doing. The
> current wording is probably adequate for the alignment, but is very
> unclear on when the result is a valid object.
>
> The C standard should make a commitment (in Future Directions) that it
> will not be defining operators or other facilities that access arrays
> as a whole without resolving this problem. Without such a commitment,
> many implementors will be reluctant to trust the existing standard to
> remain stable, as a trivial change in this area could have immense
> consequences for implementors.
>
> It should also point out to the developers of extensions or other
> standards based on C that this area is a minefield. In particular, it
> should say that it is essential that the extension or standard
> specifies whether arrays will be treated as single objects, as their
> composite elements, or as arrays of bytes in any particular context,
> as the C language allows any of those options.
>
>
> The restrict Qualifier
> ----------------------
>
> This issue was raised on the reflector with the previous specification
> of restrict, but got dropped because it was not clearly applicable to
> the new one. It is considering the question of whether the objects
> affected by the restrict qualifier are considered to have type the
> base type of the restrict qualifier pointer or that of the lvalue used
> for the actual access.
>
> typedef struct {int a; double b; int c;} COMPOSITE;
> COMPOSITE composite;
>
> void copy (COMPOSITE * restrict A, COMPOSITE * restrict B) {
>     A->a = B->c;
> }
>
> void scatter (double * restrict A, double * restrict B) {
>     int i;
>     for (i = 0; i < 100; i += 2) A[i] = B[i] = 0.0;
> }
>
> int main (void) {
>     double array[200];
>     copy(&composite,&composite);
>     scatter(array,&array[1]);
>     return 0;
> }
>
> Now, if the objects being referred to in 6.7.3.1 are the access
> (lvalue) types, there is nothing wrong with these; but, if they are
> considered relative to the base type of the restrict qualified
> pointers, both function calls are undefined. As I understand the
> wording, the former is the correct interpretation, though it is not
> the best specification for SMP optimisation.
>
> My belief is that this does not need attention, if the previous
> queries are resolved and clarified in the directions described. If,
> however, they are resolved in any other way, then it needs
> reconsidering in the light of that resolution.
>
>
> Effective Type
> --------------
>
> C99 introduced the concept of effective type (6.5 paragraph 6), but it
> has had the effect of making a confusing situation totally baffling.
> This is because it has introduced a new category of types, it has
> invented new terminology without defining it, its precise intent is
> most unclear, and it has not specified its effect on the library.
> The first aspect was mentioned above.
>
> 6.5 paragraph 6 uses the term "declared type" of an object, which is
> otherwise used in only five other places, and then in contexts that
> make its meaning clear. In all of those cases, the context is
> discussing an aspect of a particular declaration or lvalue, and the
> term is a reference back to that declaration. But that is not the
> case in this section, and it is clear that the term is being used with
> a different meaning. It clearly and repeatedly distinguishes the type
> of the lvalue used for access from the declared type, so it cannot be
> that, but what does it mean?
>
> The third question is related, in the sense that knowing what that
> term means in this context might enable one to deduce the intent of
> this section, and knowing the intent of this section might enable one
> to deduce the meaning of that term.
>
> Consider:
>
> #include <stdlib.h>
> #include <string.h>
>
> typedef struct {double a;} A;
> typedef struct {A b;} B;
>
> void fred (A *a) {
>     double *p = (double *)a;
>     void *q = malloc(sizeof(B));
>     memcpy(q,(char *)p,sizeof(B));
> }
>
> int main (void) {
>     B b[1000];
>     fred((A *)b);
>     return 0;
> }
>
> After the call to memcpy, is the effective type of the object pointed
> to by q an array of B, A, double or char? In other words, which of
> those is the "declared type" and why?
>
> It is probable that it is not an array of char, because of the lvalue
> versus declared type distinction. But there is no obvious reason to
> choose any one of B, A or double as the preferred interpretation.
>
> This is not the only problem with this section. Another very nasty
> one relates to partial copying, which is extremely common in many C
> programs. Consider:
>
> #include <stddef.h>
> #include <stdlib.h>
> #include <string.h>
>
> typedef struct {double a; int b;} C;
>
> int main (void) {
>     C c = {0.0,0};
>     void *p = malloc(sizeof(int));
>     memcpy(p,(void *)(((char *)&c)+offsetof(C,b)),sizeof(int));
>     return 0;
> }
>
> The wording of 6.5 paragraph 6 would seem to imply that the effective
> type of the object pointed to by p becomes C, which means that the
> object is necessarily invalid as there is insufficient space for a C
> datum. But surely it cannot be the intent that the effective type of
> the object pointed to by p is incompatible with its datum type, which
> is most definitely int in this case? If it is, this is the most
> serious incompatibility between C99 and C90 yet reported, and will
> break a huge number of reasonable programs.
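
The partial-copy idiom at issue can be sketched as below. This is a minimal illustration under the reading the essay regards as the only sensible one, namely that the copied datum is an int; the helper name copy_member_b is hypothetical, and the source pointer is formed as &c->b rather than by offsetof arithmetic.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {double a; int b;} C;

/* Copy only member b into freshly allocated storage and return it.
   Returns NULL if the allocation fails; the caller frees the result.
   Hypothetical helper, for illustration only. */
int *copy_member_b(const C *c)
{
    int *p;

    assert(c != NULL);
    p = malloc(sizeof *p);
    if (p != NULL)
        memcpy(p, &c->b, sizeof *p);
    return p;
}
```

Programs that do this go on to read the copy through an int lvalue, which is only valid if the effective type of the allocated object is int and not C.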
>
> For the fourth aspect, consider:
>
> #include <stdlib.h>
>
> int compare (const void *A, const void *B) {
>     double a = *(const double *)A, b = *(const double *)B;
>     return (a > b) - (a < b);
> }
>
> int main (void) {
>     double array[100] = {0.0};
>     void *p;
>
>     p = bsearch(&array[0],array,100,sizeof(double),compare);
>     return 0;
> }
>
> What is the effective type of the object pointed to by p? It is
> pretty clearly double, an array of double or no effective type. But
> which and why?
>
> There are probably other problems, but there is little point in giving
> yet more examples. What is clear is that I (and some other people I
> have spoken to) cannot work out what this section is attempting to
> specify, and it desperately needs clarification. When it is
> clarified, it may well need further work, but the first step is to
> clarify its intent. If this turns out to be infeasible, then the
> concept of effective type should be scrapped until and unless someone
> can come up with an adequate description.
>
>
> What Size is an Array?
> ----------------------
>
> In this section, the question being asked is how large the array
> object that corresponds to a pointer value is. Please ignore any
> assumptions that could be drawn from the fact that two constructions
> are in the same compilation unit, as they could easily be separated.
> For the purposes of these examples, let us assume no padding and that
> all structures have the same alignment.
>
> typedef struct {double z[5];} small;
> typedef struct {double z[7];} large;
>
> double *pp;
> small *qq;
>
> void ONE (double *a) {a[8] = 1;}
>
> void TWO (double a[5]) {ONE((double *)a);}
>
> void THREE (double a[5]) {a[8] = 1;}
>
> void FOUR (double a[5]) {if (a == pp) ONE((double *)a);}
>
> void FIVE (double a[5]) {if (a == pp) ONE((double *)pp);}
>
>
> void SIX (large *a) {a->z[6] = 1;}
>
> void SEVEN (small *a) {SIX((large *)a);}
>
> void EIGHT (large *a) {SEVEN((small *)a);}
>
> void NINE (large *a) {if ((char *)a == (char *)qq) SEVEN((small *)a);}
>
> void TEN (large *a) {if ((char *)a == (char *)qq) SEVEN((small *)qq);}
>
> void ELEVEN (small *a, int n) {
>     int i;
>     large *b;
>     for (i = 0; i < n; ++i) {b = (large *)a; a = (small *)b;}
>     a->z[3] = 1;
> }
>
> int main (void) {
>     double p[10];
>     small q[2];
>
>     ONE(p);
>     TWO(p);
>     THREE(p);
>     pp = p;
>     FOUR(p);
>     FIVE(p);
>
>     SIX((large *)q);
>     SEVEN(q);
>     EIGHT((large *)q);
>     qq = q;
>     NINE((large *)q);
>     TEN((large *)q);
>
>     ELEVEN(q,1);
>     ELEVEN(q,2);
>
>     return 0;
> }
>
> Now, which of the first five calls are conforming? There are
> obviously plausible interpretations of the standard that allow all of
> them, all of them except THREE, or only ONE and FIVE. But consider
> TWO: exactly WHICH construction breaks any wording in the standard?
> There is clearly nothing in either ONE or TWO on their own that is not
> conforming, and the standard has no explicit concept of a value having
> a 'history' that affects its semantics. And is the intent REALLY to
> distinguish FOUR and FIVE?
>
> The second set of five calls is mainly to show that the effect does
> not depend on array parameter adjustment. Let us assume that the
> introduction of effective type does not exclude SIX; if it does, then
> C99 has broken a huge number of important existing codes! The key
> point is that '(large *)q' points to an array object of length 1;
> hence, if the history matters, so does '(small *)(large *)q'; which
> in turn means that '(large *)(small *)(large *)q' is a zero-length
> array, and so the access is illegal.
>
> The calls to ELEVEN are to show that this is not solely a static
> problem but, in some cases and with some interpretations, depends on
> the execution of the program.
>
> The last time this issue was raised, several people said that the
> standard was clear, but did not agree on exactly which examples were
> conforming. Other people felt that they knew what was intended, but
> doubted that the standard made it entirely clear. And yet others were
> certain that the standard was unclear or even inconsistent. My view
> is that there are three possible approaches to consistency:
>
> 1) The original K&R unchecked approach.
>
> All pointer values are simply addresses and, if they are object
> pointers, that object is the largest array of their base type that
> will fit within the originally allocated object. I.e. the object as
> originally defined, returned from malloc etc. or returned from some
> specified library call (like getenv). I believe that this is the
> approach being taken by the gcc checked pointer project (on grounds of
> practicality), even though they know that it does not match the
> requirements of the C standard.
>
> If this approach is taken, then the ONLY subsequent constraints on the
> size of an array object pointed to are those imposed by the restrict
> qualifier and the static qualifier in array parameters. I favour this
> interpretation, as it is most consistent with traditional expectations
> and existing codes, though it is the worst approach for SMP
> optimisation. If this were done, there is a case for rephrasing the
> static qualifier in array parameters to mean precisely that number of
> elements (i.e. to adopt traditional Fortran semantics in that case,
> and that case alone).
>
> 2) The approach that only visible information is relevant.
>
> The first thing to do is to specify what 'visible' means in this
> context, and that is not an easy thing to do. One obvious rule is to
> say that only the type actually used in an array subscription is
> relevant (i.e. the lvalue type). As pointed out before, this still
> means that the declared size of array parameters (without the static
> qualifier) is ignored, because their type has already been adjusted to
> a pointer type.
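
The adjustment of array parameters to pointer type can be seen in a small sketch. It is an illustration only; the names first_element and first_element_ptr are hypothetical.

```c
#include <assert.h>

/* Declared with an array parameter of five elements... */
double first_element(double a[5])
{
    assert(a != 0);
    return a[0];
}

/* ...but the parameter type is adjusted to double *, so the function
   has type double (double *) and this initialisation needs no cast. */
double (*const first_element_ptr)(double *) = first_element;
```

Because the declared size is discarded, the function can be handed a two-element array, for example first_element_ptr(two) for double two[2], and nothing in the type system objects.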
>
> If the declared size of array parameters should be significant, then
> the wording of array parameter adjustment will need considerable
> rethinking, because it will introduce yet another category of types
> associated with an lvalue. This is not pretty, and still allows:
>
> void THREE (double a[5]) {double *b = &a[5]; b[3] = 1;}
>
> To lock that out needs a more complex definition of visible, and I
> doubt that it will be easy to find a consistent one, short of the next
> approach.
>
> 3) The approach that pointer values have a history.
>
> This is essentially approach (2) applied dynamically as well as
> statically and taken to the limit. Every time that a pointer is cast
> to a new type, implicitly or explicitly, the object that it refers to
> is reduced in size to a whole number of elements. And every time that
> it is passed as a complete array parameter, it is reduced to the size
> of the declared array (before adjustment). Thus the size associated
> with a pointer value can decrease but never increase.
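
The shrinking rule of this approach can be modelled with plain byte counts. The sketch below is an illustration, not something taken from any implementation; the function names are hypothetical, and it makes the same no-padding assumption as the examples, so that sizeof(small) is five doubles and sizeof(large) is seven.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {double z[5];} small;
typedef struct {double z[7];} large;

/* Extent, in bytes, left after a pointer to an object of `bytes` bytes
   is cast to a type of size `elem_size`: round down to a whole number
   of elements.  Hypothetical model of the "history" rule. */
size_t extent_after_cast(size_t bytes, size_t elem_size)
{
    assert(elem_size != 0);
    return bytes - bytes % elem_size;
}

/* Extent left after passing the pointer as an array parameter declared
   with `count` elements: it may shrink but never grow. */
size_t extent_after_array_param(size_t bytes, size_t elem_size, size_t count)
{
    size_t declared = elem_size * count;
    return bytes < declared ? bytes : declared;
}
```

Starting from small q[2] (ten doubles), a cast to large * leaves seven doubles, a cast back to small * leaves five, and a further cast to large * leaves zero. That is the chain EIGHT, SEVEN, SIX in the example, and it is why the access in SIX would be illegal under this approach.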
>
> However, this still leaves the question of whether passing a pointer
> through a library function like memcpy clears its history, in the same
> way that memchr can be used to clear the const qualifier from a
> pointer. And, of course, it does not say anything about the
> properties of pointers created from integers.
>
> This is clearly the best approach for SMP optimisation, but would
> break a number of important existing codes in diabolically obscure
> ways, though it is unclear how many of those are already not
> conforming. Codes which I am pretty certain would fall foul of this
> include most BSD string handling and much of the X11 system.
>