Message-ID: <552B1CDB.9040803@openwall.com>
Date: Mon, 13 Apr 2015 04:33:15 +0300
From: Alexander Cherepanov <ch3root@...nwall.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] On type aliasing and similar issues
On 2015-04-10 17:19, Solar Designer wrote:
> On Fri, Apr 10, 2015 at 03:59:05PM +0300, Alexander Cherepanov wrote:
>> The direct use of member names is relatively clear -- it's allowed and
>> it's plainly spelled out in a footnote in 6.5.2.3p3 (C99 and C11). The
>> use through pointers is also relatively clear -- it's prohibited, which
>> is plainly spelled out in gcc doc[1].
> [...]
>> Everything becomes more complicated when a member of a union is an
>> array. It's somewhat in-between these two cases and I'm not sure how
>> it's supposed to be treated.
>
> What about uses like this:
This question turned out to be surprisingly difficult. After a lot of
reading, I think I have some understanding of what's going on.
1. I have an interpretation of the relevant parts of the C standard (as
written) which is quite simple and hopefully not self-contradictory.
2. This interpretation is not what the Committee intended. The problem
is that the Committee didn't write what it wants and probably doesn't yet
know exactly what it wants. The sorry state of affairs is perfectly
demonstrated by Defect Report #236 [1], submitted 2000-10-18. The
first example in this DR is allowed by the standard (in the opinion of
the reporter, and in mine too), but the DR was closed saying that "Both
programs invoke undefined behavior" without much further explanation. I
would have been scratching my head for a long time about what that means,
but there are many discussions of this DR, and the latest one [2] states
(2010-10-08): "In
2005-04 (Lillehammer), the committee gave up waiting for the words to
materialize, instead deciding simply to state the committee's intention
in the DR response, without worrying about whether that intention was
accurately described by the standard." It seems there has not been much
progress in this area over the last 15 years, even with the release
of the C11 standard.
[1] http://open-std.org/jtc1/sc22/wg14/www/docs/dr_236.htm
[2] http://open-std.org/jtc1/sc22/wg14/www/docs/n1520.htm
3. GCC has its own rules, which are stricter than the C standard. That
is, some strictly conforming programs are miscompiled. There is a
thread [3] which discusses a question very similar to yours quoted
below. A good explanation is in [4]; it ends with this: "the original
poster is correct that GCC doesn't implement C99 aliasing as written in
the standard regarding unions. We don't do so because we determined
that this can't possibly have been the intent of the standard as it
makes type-based aliasing relatively useless."
[3] https://gcc.gnu.org/ml/gcc/2010-01/threads.html#00013
[4] https://gcc.gnu.org/ml/gcc/2010-01/msg00263.html
4. AFAIK the GCC rules are not documented except for [5]. But I think I
have some idea of what they want. There is some hope that they are not
self-contradictory :-)
[5] https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Type-punning
BTW, regarding your idea of visibility of unions inside a function: it
was proposed [6] and kinda rejected [7] in a discussion on the gcc
mailing list.
[6] http://open-std.org/jtc1/sc22/wg14/www/docs/n1090.htm
[7] https://gcc.gnu.org/ml/gcc/2004-12/msg00164.html
> typedef union {
> struct { uint32_t a1, a2; } a;
> uint64_t b;
> } any_t;
>
> void copy2(uint32_t *dst, uint32_t *src)
> {
> ((any_t *)dst)->b = ((any_t *)src)->b;
> }
>
> There's no access to members of the union through a pointer (nor even
> through an array), but there's expected to be access through uint32_t *
> pointers in the caller of copy2(). Would a compiler inlining copy2() be
> guaranteed to do what the programmer expected (copy two 32-bit values,
> potentially faster and assuming 64-bit alignment)?
According to (my understanding of) the C standard: it's ok when dst and
src happen to be aligned as required for uint64_t; otherwise the pointer
conversion itself is undefined behavior. gcc 4.9.1 on my x86_64
GNU/Linux shows _Alignof(uint32_t) == 4 and _Alignof(uint64_t) == 8.
IOW: not ok in general.
GCC: never ok, because there is no object of type any_t at the location
dst or src points to.
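A minimal sketch of a variant that sidesteps both objections, assuming
the goal is only to copy two adjacent 32-bit values (the name
copy2_memcpy is hypothetical and does not come from the thread).
memcpy() copies bytes and has no alignment requirement, so there is no
pointer conversion to any_t * and no access through a type the objects
do not have:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical replacement for copy2(): copies src[0] and src[1] into
 * dst[0] and dst[1]. The regions must not overlap (memcpy requirement).
 * No alignment beyond that of uint32_t is assumed. */
static void copy2_memcpy(uint32_t *dst, const uint32_t *src)
{
    memcpy(dst, src, 2 * sizeof *src);
}
```

Compilers commonly turn a fixed-size memcpy() like this into a single
wide load and store where the target allows it, but that is an
optimization detail and not something the standard promises.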
> Or with the opposite uses of the two integer types:
>
> void add32x2(uint64_t *dst, uint64_t *src)
> {
> ((any_t *)dst)->a.a1 += ((any_t *)src)->a.a1;
> ((any_t *)dst)->a.a2 += ((any_t *)src)->a.a2;
> }
>
> where the caller is expected to access through uint64_t * pointers.
C standard: ok (assuming _Alignof(any_t) == _Alignof(uint64_t) >=
_Alignof(uint32_t)).
GCC: never ok, because there is no object of type any_t at the location
dst or src points to.
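For contrast, a sketch of the form that the GCC documentation [5] does
describe as allowed, under the assumption that the caller can keep its
data in real any_t objects instead of bare uint64_t ones (the name
add32x2_union is hypothetical). Here every access goes through the
union type and names the member directly:

```c
#include <stdint.h>

typedef union {
    struct { uint32_t a1, a2; } a;
    uint64_t b;
} any_t;

/* Hypothetical variant of add32x2(): dst and src must point to actual
 * any_t objects, so the member accesses below go through the union
 * type itself rather than through a cast from uint64_t *. */
static void add32x2_union(any_t *dst, const any_t *src)
{
    dst->a.a1 += src->a.a1;
    dst->a.a2 += src->a.a2;
}
```

The caller would then store and load the 64-bit view as x.b on the same
any_t object, instead of holding a separate uint64_t and casting its
address.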
> (Of course, this example is sensitive to byte order - or rather, to the
> order of 32-bit halves in a 64-bit word.)
The order of 32-bit halves in a 64-bit word is probably not important in
your example. What matters is that the halves from the POV of logical
bits are the same as the halves from the POV of storage. AFAIU the
location of specific bits of a uint64_t inside its 8 bytes is not
specified.
>> Side note: not much has changed between C89 and C99 in this question.
>> Accessing a wrong member in a union is an implementation-defined
>> behavior in C89 but a footnote in 3.3.2.3 implies that the reason for
>> this is indeterminate byte order. OTOH this behavior is defined in C99
>> but the byte order is still not specified. Hence a strictly conforming
>> program shouldn't use it anyway.
>
> There are many use cases where byte order does not matter, such as when
> implementing a maybe-faster memset() or memcpy() alike that would use a
> wider data type (as long as alignment and total size permit).
GCC has the "may_alias" type attribute [1] for such cases.
[1]
https://gcc.gnu.org/onlinedocs/gcc/Type-Attributes.html#index-g_t_0040code_007bmay_005falias_007d-type-attribute-3372
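A minimal sketch of how the attribute might be used for a wider-type
copy loop. This is GCC-specific (Clang also accepts the attribute) and
not standard C; the names u64_alias and copy_words are hypothetical, and
the sketch assumes both buffers are suitably aligned for uint64_t and do
not overlap:

```c
#include <stddef.h>
#include <stdint.h>

/* A 64-bit type that is allowed to alias objects of any other type,
 * much like a character type (GCC extension). */
typedef uint64_t __attribute__((__may_alias__)) u64_alias;

/* Hypothetical helper: copies nwords 64-bit words from src to dst.
 * Assumes dst and src are aligned for uint64_t and do not overlap. */
static void copy_words(void *dst, const void *src, size_t nwords)
{
    u64_alias *d = dst;
    const u64_alias *s = src;
    size_t i;

    for (i = 0; i < nwords; i++)
        d[i] = s[i];
}
```

The attribute only addresses the aliasing side; the alignment
requirement discussed above for the pointer conversion still applies.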
--
Alexander Cherepanov