Message-ID: <5527C919.1080601@openwall.com>
Date: Fri, 10 Apr 2015 15:59:05 +0300
From: Alexander Cherepanov <ch3root@...nwall.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] On type aliasing and similar issues
On 2015-04-08 17:05, Solar Designer wrote:
> On Wed, Apr 08, 2015 at 07:25:29AM +0300, Alexander Cherepanov wrote:
>> On 2015-04-08 06:03, Samuel Neves wrote:
>>> On 04/08/2015 03:37 AM, Alexander Cherepanov wrote:
>>>> AFACT this is implementation-defined in C89 (3.3.2.3) and fully defined
>>>> in C99 and C11 (6.5.2.3p3).
>>>
>>> Yes, type punning with unions is now OK (though implementation-defined;
>>> accessing the wrong member may still trap) in
>>
>> uint32_t[2] and uint64_t don't have padding bits and have the same size
>> so this particular example is fully defined.
>
> FWIW, icc 14.0.0 miscompiles the code I have in php_mt_seed 3.2 if I
> remove the "volatile" workaround:
>
> #ifdef __ICC
> volatile
> #endif
> union {
> vtype v;
> uint32_t s[sizeof(vtype) / 4];
> } u[8], uM[8];
>
> where vtype is e.g.:
>
> typedef __m128i vtype;
>
> and the union members are only accessed directly, not via pointers,
> although indeed there are uses like u[i].s[j] (so with non-constant
> index for the .s[] array).
That's a moot point indeed and I didn't fully appreciate the fact that
one of the members in the unions in the previous examples is an array. I
was thinking about cases more like this:
    union {
        struct { uint32_t x0, x1; } x;
        uint64_t y;
    } v;
The direct use of member names is relatively clear -- it's allowed and
it's plainly spelled out in a footnote in 6.5.2.3p3 (C99 and C11). The
use through pointers is also relatively clear -- it's prohibited, which
is plainly spelled out in the gcc doc[1]. It's not entirely clear whether
this follows from the C standard, but the gcc approach seems to be
permitted by the standard. And, e.g., "Aliasing restrictions of C11
formalized in Coq"[2] follows the gcc doc:
"In order to enable type-based alias analysis, we have to ensure that
only under certain conditions a union can be read using a pointer to
another variant than the current one (this is called type-punning [6,
6.5.2.3]). Since the C11 standard is unclear about these conditions,
we follow the GCC documentation [4] on it. It states that “type-punning
is allowed, provided the memory is accessed through the union type”."
[1] https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Type-punning
[2] http://robbertkrebbers.nl/research/articles/aliasing.pdf
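To make the direct-access case concrete, here is a minimal sketch built
on the union above. The names union pun and sum_halves are hypothetical,
introduced only for illustration. It assumes the struct has no padding
between x0 and x1 (checked with an assert), and it returns the sum of
the two halves so that the result does not depend on byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical named version of the union discussed above. */
union pun {
    struct { uint32_t x0, x1; } x;
    uint64_t y;
};

/* Store through one member and read through the other, using only the
 * . operator on the union object itself (no pointers to members). */
static uint64_t sum_halves(uint64_t value)
{
    union pun v;

    /* Assumption: no padding inside the struct, so both members
     * cover exactly the same 8 bytes. */
    assert(sizeof v.x == sizeof v.y);

    v.y = value;
    /* Which half lands in x0 depends on byte order; the sum does not. */
    return (uint64_t)v.x.x0 + v.x.x1;
}
```

Calling sum_halves() with a value whose halves are 1 and 2 should give 3
on both little-endian and big-endian machines.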
Everything becomes more complicated when a member of a union is an
array. It's somewhat in-between these two cases and I'm not sure how
it's supposed to be treated.
> I thought this was an icc bug, but maybe I'm wrong?
First of all, which version of C does icc target? In C89, accessing a
union through a wrong member is implementation-defined behavior. Does
icc document it? gcc documents it here:
https://gcc.gnu.org/onlinedocs/gcc/Structures-unions-enumerations-and-bit-fields-implementation.html
Side note: not much has changed between C89 and C99 on this question.
Accessing a wrong member in a union is implementation-defined behavior
in C89, but a footnote in 3.3.2.3 implies that the reason for this is
indeterminate byte order. OTOH this behavior is defined in C99, but the
byte order is still not specified. Hence a strictly conforming program
shouldn't use it anyway.
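Since the byte order is not specified, code that needs a particular
half of a 64-bit value can avoid unions altogether and use shifts
instead. A minimal sketch (lo32, hi32, join64 and check_roundtrip are
hypothetical names, not taken from any of the code discussed here):

```c
#include <assert.h>
#include <stdint.h>

/* Low and high 32 bits of v, independent of how v is laid out in memory. */
static uint32_t lo32(uint64_t v) { return (uint32_t)(v & 0xffffffffu); }
static uint32_t hi32(uint64_t v) { return (uint32_t)(v >> 32); }

/* Inverse operation: rebuild the 64-bit value from its two halves. */
static uint64_t join64(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Self-check: splitting and rejoining must give back the input. */
static void check_roundtrip(uint64_t v)
{
    assert(join64(hi32(v), lo32(v)) == v);
}
```

This works on values rather than on object representations, so it gives
the same answer regardless of endianness.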
> Full context: http://www.openwall.com/php_mt_seed/
>
>>> C99 and above. What is being dereferenced in the example is the pointer to
>>> the union, not the members, so I'm not sure
>>> strict aliasing's undefined behavior applies. The example could be further
>>> improved to demonstrate this:
>>>
>>> #include <stdint.h>
>>> #include <stdio.h>
>>>
>>> union U {
>>> uint32_t x[2];
>>> uint64_t y;
>>> };
>>>
>>> extern union U * v;
>>>
>>> void f() {
>>> uint32_t * p = &v->x[0];
>>> uint64_t * q = &v->y;
>>> *p = 17;
>>> *q = 42;
>>> printf("%u\n", v->x[0]);
>>> }
>>>
>>> While Clang and recent GCC do what one would hope (print 42), GCC 3.4 and
>>> Intel compiler print 17.
>>
>> Yes, it seems the standard intends to permit access to wrong members
>> only via . and -> operators. You cannot access them in a less direct
>> way.
>
> In my php_mt_seed example, is accessing .s[j] standards compliant or
> not? If not, that's really unfortunate.
I don't know, sorry.
> What about e.g., .s[3] (constant index)? We could want to narrow down
> icc's behavior - whether the problem occurs only with variable or also
> with constant indices, or maybe even without an array at all. I haven't
> tried yet.
If you can eliminate the use of non-constant indexes then you can
replace everything with non-array variables as well. Perhaps this is the
easiest way in your case.
Another possible way is to pipe values through an intermediate struct:
    struct s {
        uint32_t s[sizeof(vtype) / 4];
    } u[8], uM[8];

    union u {
        vtype v;
        struct s s;
    } tmp;

    tmp.v = a; u[0] = tmp.s;
    ...

Or without an explicit variable:

    u[0] = ((union u){a}).s;
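A self-contained sketch of the intermediate-struct idea follows. To keep
it buildable without intrinsics headers, it assumes uint64_t as a
stand-in for vtype (the real code uses e.g. __m128i). The names LANES,
to_lanes and sum_lanes are hypothetical. The point is that the pun
happens once, through the union object, and all variable-index accesses
then go to an ordinary struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: uint64_t stands in for a SIMD type such as __m128i. */
typedef uint64_t vtype;

#define LANES (sizeof(vtype) / 4)

struct s {
    uint32_t s[LANES];
};

union u {
    vtype v;
    struct s s;
};

/* Pun once, via direct member access on a compound literal, and hand
 * back a plain struct by value. */
static struct s to_lanes(vtype a)
{
    return ((union u){a}).s;
}

/* Non-constant indices are applied to the struct copy, not to a union. */
static uint64_t sum_lanes(vtype a)
{
    struct s lanes = to_lanes(a);
    uint64_t sum = 0;
    size_t j;

    /* Assumption: the struct adds no padding beyond the array. */
    assert(sizeof lanes == sizeof a);

    for (j = 0; j < LANES; j++)
        sum += lanes.s[j];
    return sum;
}
```

Whether this actually avoids the icc problem has not been checked here.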
>> If you pass them to another function then recent gcc -O2 will print
>> 17 too.
>
> This is not surprising. However, I think behavior within one function,
> where having derived a pointer from a union member is clearly visible to
> the compiler, could be defined.
The gcc doc ([1] above) specifically warns against it:
"
    union a_union {
        int i;
        double d;
    };

[skip]

However, this code might not [work as expected]:

    int f() {
        union a_union t;
        int* ip;
        t.d = 3.0;
        ip = &t.i;
        return *ip;
    }
"
> Does any recent C standard say anything about the special case of
> accessing union members via pointers within the same function?
I don't think so.
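For comparison with the gcc doc example above, here is a sketch of two
forms that avoid dereferencing a pointer to the wrong member. f_direct
reads the member directly through the union object, which is the form
the gcc doc describes as working. f_memcpy is an alternative not
discussed in this thread: it copies the leading bytes of the double
with memcpy instead of aliasing them. Both function names are
hypothetical, and the value returned depends on byte order and on the
representation of double.

```c
#include <assert.h>
#include <string.h>

union a_union {
    int i;
    double d;
};

/* Direct member access on the union object itself. */
static int f_direct(void)
{
    union a_union t;

    t.d = 3.0;
    return t.i;
}

/* Copy the first sizeof(int) bytes of the double; no aliasing involved. */
static int f_memcpy(void)
{
    double d = 3.0;
    int i;

    /* Assumption: int is no larger than double. */
    assert(sizeof i <= sizeof d);

    memcpy(&i, &d, sizeof i);
    return i;
}
```

Both functions read the same bytes of the same value, so they should
return the same result on a given platform.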
--
Alexander Cherepanov