Message-ID: <552B1CDB.9040803@openwall.com>
Date: Mon, 13 Apr 2015 04:33:15 +0300
From: Alexander Cherepanov <ch3root@...nwall.com>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] On type aliasing and similar issues

On 2015-04-10 17:19, Solar Designer wrote:
> On Fri, Apr 10, 2015 at 03:59:05PM +0300, Alexander Cherepanov wrote:
>> The direct use of member names is relatively clear -- it's allowed and
>> it's plainly spelled out in a footnote in 6.5.2.3p3 (C99 and C11). The
>> use through pointers is also relatively clear -- it's prohibited, which
>> is plainly spelled out in gcc doc[1].
> [...]
>> Everything becomes more complicated when a member of a union is an
>> array. It's somewhat in-between these two cases and I'm not sure how
>> it's supposed to be treated.
>
> What about uses like this:

This question turned out to be surprisingly difficult. After a lot of 
reading, I think I have some understanding of what's going on.

1. I have an interpretation of the relevant parts of the C standard (as 
written) that is quite simple and hopefully not self-contradictory.

2. This interpretation is not what the Committee intended. The problem 
is that the Committee didn't write down what it wants and probably 
doesn't yet know exactly what it wants. The sorry state of affairs is 
well demonstrated by Defect Report #236 [1], submitted 2000-10-18. The 
first example in this DR is allowed by the standard (in the opinion of 
the reporter, and in mine too), but the DR was closed with the statement 
that "Both programs invoke undefined behavior", without much further 
explanation. I would have been scratching my head over what that means 
for a long time, but there are many discussions of this DR, and the 
latest one [2] states (2010-10-08): "In 2005-04 (Lillehammer), the 
committee gave up waiting for the words to materialize, instead deciding 
simply to state the committee's intention in the DR response, without 
worrying about whether that intention was accurately described by the 
standard." It seems there has not been much progress in this area during 
the last 15 years, the release of the C11 standard included.

[1] http://open-std.org/jtc1/sc22/wg14/www/docs/dr_236.htm
[2] http://open-std.org/jtc1/sc22/wg14/www/docs/n1520.htm

3. GCC has its own rules, which are stricter than the C standard. That 
is, some strictly conforming programs are miscompiled. There is a 
thread [3] which discusses a question very similar to yours quoted 
below. A good explanation is in [4]; it ends with this: "the original 
poster is correct that GCC doesn't implement C99 aliasing as written in 
the standard regarding unions.  We don't do so because we determined 
that this can't possibly have been the intent of the standard as it 
makes type-based aliasing relatively useless."

[3] https://gcc.gnu.org/ml/gcc/2010-01/threads.html#00013
[4] https://gcc.gnu.org/ml/gcc/2010-01/msg00263.html

4. AFAIK the GCC rules are not documented anywhere except [5]. But I 
think I have some idea of what they want. There is some hope that it's 
not self-contradictory :-)

[5] https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Type-punning

BTW, regarding your idea of visibility of unions inside a function: it 
was proposed [6] and more or less rejected [7] in a discussion on the 
gcc mailing list.

[6] http://open-std.org/jtc1/sc22/wg14/www/docs/n1090.htm
[7] https://gcc.gnu.org/ml/gcc/2004-12/msg00164.html

> typedef union {
> 	struct { uint32_t a1, a2; } a;
> 	uint64_t b;
> } any_t;
>
> void copy2(uint32_t *dst, uint32_t *src)
> {
> 	((any_t *)dst)->b = ((any_t *)src)->b;
> }
>
> There's no access to members of the union through a pointer (nor even
> through an array), but there's expected to be access through uint32_t *
> pointers in the caller of copy2().  Would a compiler inlining copy2() be
> guaranteed to do what the programmer expected (copy two 32-bit values,
> potentially faster and assuming 64-bit alignment)?

According to (my understanding of) the C standard: it's ok when dst and 
src happen to be aligned as required for uint64_t, and undefined 
behavior at the pointer conversion otherwise. gcc 4.9.1 on my x86_64 
GNU/Linux shows _Alignof(uint32_t) == 4 and _Alignof(uint64_t) == 8. 
IOW: not ok in general, since a uint32_t * need not be 8-byte aligned.

GCC: never ok, because there is no object of type any_t at the location 
dst or src points to.
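
For comparison, a minimal sketch of the same copy done with memcpy 
instead of a cast (the name copy2_memcpy is hypothetical, not from this 
thread). memcpy copies the bytes as character data, so as far as I can 
tell neither the alignment of uint64_t nor the aliasing rules come into 
play, and compilers commonly turn such a fixed-size memcpy into a single 
load and store:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copies two adjacent uint32_t values (8 bytes on usual platforms)
 * without any pointer cast. dst and src must each point to at least
 * two uint32_t objects and must not overlap. */
static void copy2_memcpy(uint32_t *dst, const uint32_t *src)
{
	assert(dst != NULL && src != NULL);
	memcpy(dst, src, 2 * sizeof *src);
}
```

The caller keeps accessing the data through uint32_t * as before; no 
object of type any_t is needed at all.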

> Or with the opposite uses of the two integer types:
>
> void add32x2(uint64_t *dst, uint64_t *src)
> {
> 	((any_t *)dst)->a.a1 += ((any_t *)src)->a.a1;
> 	((any_t *)dst)->a.a2 += ((any_t *)src)->a.a2;
> }
>
> where the caller is expected to access through uint64_t * pointers.

C standard: ok (assuming _Alignof(any_t) == _Alignof(uint64_t) >= 
_Alignof(uint32_t)).

GCC: never ok, because there is no object of type any_t at the location 
dst or src points to.
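
A minimal sketch of a variant that should satisfy both sets of rules, 
assuming the caller can keep its data in objects actually declared as 
any_t (the name add32x2_union is hypothetical). Every access then names 
a member of a real union object, which is the case the GCC documentation 
describes as allowed:

```c
#include <assert.h>
#include <stdint.h>

typedef union {
	struct { uint32_t a1, a2; } a;
	uint64_t b;
} any_t;

/* Assumption of this sketch: the struct has no padding. */
static_assert(sizeof(any_t) == sizeof(uint64_t),
              "any_t is expected to be exactly 64 bits wide");

/* Adds the two 32-bit members independently. The arguments must point
 * to real any_t objects, not to uint64_t objects cast to any_t *. */
static void add32x2_union(any_t *dst, const any_t *src)
{
	dst->a.a1 += src->a.a1;
	dst->a.a2 += src->a.a2;
}
```

The caller would read and write the 64-bit view as x.b on the same 
any_t object rather than through a separate uint64_t * pointer.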

> (Of course, this example is sensitive to byte order - or rather, to the
> order of 32-bit halves in a 64-bit word.)

The order of 32-bit halves in a 64-bit word is probably not important in 
your example. What matters is that the halves in terms of logical bits 
coincide with the halves in terms of storage. AFAIU the location of 
specific bits of a uint64_t within its 8 bytes is not specified.
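
To illustrate the distinction, here is a minimal sketch (both function 
names are hypothetical). The first function works on the value only, 
using shifts, so it does not depend on how the bits are laid out in 
storage. The second one checks, for one sample value, whether the two 
uint32_t-sized storage halves hold the logical low and high 32 bits in 
either order; I'd expect it to return 1 on common little- and big-endian 
platforms, but the standard does not seem to promise that:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(uint64_t) == 2 * sizeof(uint32_t),
              "a uint64_t is expected to occupy two uint32_t-sized slots");

/* Adds the low and the high 32 bits independently (carries out of each
 * half are dropped), using the value only, never the representation. */
static uint64_t add32x2_value(uint64_t dst, uint64_t src)
{
	uint32_t lo = (uint32_t)((uint32_t)dst + (uint32_t)src);
	uint32_t hi = (uint32_t)((uint32_t)(dst >> 32) + (uint32_t)(src >> 32));
	return ((uint64_t)hi << 32) | lo;
}

/* Returns 1 if, for one sample value, the storage halves of a uint64_t
 * equal its logical 32-bit halves (in either order), 0 otherwise. */
static int halves_match_storage(void)
{
	const uint64_t v = 0x0123456789abcdefULL;
	uint32_t w[2];
	memcpy(w, &v, sizeof v);
	uint32_t lo = (uint32_t)v, hi = (uint32_t)(v >> 32);
	return (w[0] == lo && w[1] == hi) || (w[0] == hi && w[1] == lo);
}
```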

>> Side note: not much has changed between C89 and C99 in this question.
>> Accessing a wrong member in a union is an implementation-defined
>> behavior in C89 but a footnote in 3.3.2.3 implies that the reason for
>> this is indeterminate byte order. OTOH this behavior is defined in C99
>> but the byte order is still not specified. Hence a strictly conforming
>> program shouldn't use it anyway.
>
> There are many use cases where byte order does not matter, such as when
> implementing a maybe-faster memset() or memcpy() alike that would use a
> wider data type (as long as alignment and total size permit).

GCC has the "may_alias" type attribute [1] for such cases.

[1] https://gcc.gnu.org/onlinedocs/gcc/Type-Attributes.html#index-g_t_0040code_007bmay_005falias_007d-type-attribute-3372
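
A minimal sketch of how this could look for the copy2() case (the names 
u64_alias_t and copy2_may_alias are hypothetical). This is a GCC 
extension, not standard C, and as far as I understand it lifts only the 
aliasing restriction: the pointers still have to be suitably aligned for 
uint64_t, which the sketch checks with assert():

```c
#include <assert.h>
#include <stdint.h>

/* GCC-specific: accesses through this type are treated like char
 * accesses for alias analysis, i.e. they may alias any other type. */
typedef uint64_t __attribute__((__may_alias__)) u64_alias_t;

/* Copies two adjacent uint32_t values with one 64-bit load and store.
 * Both pointers must be aligned as required for uint64_t. */
static void copy2_may_alias(uint32_t *dst, const uint32_t *src)
{
	assert((uintptr_t)dst % _Alignof(uint64_t) == 0);
	assert((uintptr_t)src % _Alignof(uint64_t) == 0);
	*(u64_alias_t *)dst = *(const u64_alias_t *)src;
}
```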

-- 
Alexander Cherepanov
