Message-ID: <20150407124925.GA16363@bolet.org>
Date: Tue, 7 Apr 2015 14:49:25 +0200
From: Thomas Pornin <pornin@...et.org>
To: discussions@...sword-hashing.net
Subject: Re: [PHC] On type aliasing and similar issues
On Tue, Mar 31, 2015 at 04:30:34PM +0300, Solar Designer wrote:
> I have not yet seen it happen across object file boundary, but since
> link-time optimization is a thing I expect that it may (if not now,
> then later).
I witnessed cross-file optimizations relying on strict aliasing rules 15
years ago, with a C compiler that came with HP-UX (on a PA-RISC system).
That compiler was very aggressive in its optimizations; notably, it was
good at detecting "useless" code and removing it. In particular, when
benchmarking a crypto function, the compiler/linker could notice that
that whole crypto algorithm was a "pure function" (no side effect) and
that the calling code (the benchmarking loop) was not actually using the
output, and then remove it altogether. For a shining but very brief
moment I thought I had produced a very optimized 3DES implementation,
and then I realized that there was no way that my code, however
streamlined, could compute a complete 3DES in 10 clock cycles...
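One common way to keep a benchmark honest is to make the output
observable, so that the compiler is not allowed to discard the work.
Here is a minimal sketch, not a definitive harness: toy_cipher() is a
made-up stand-in for the function under test (it is not 3DES), and the
names bench_sink and bench_toy_cipher() are invented for this
illustration.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the function being benchmarked.
   It is a "pure function": no side effect, output depends only on x. */
static uint32_t toy_cipher(uint32_t x)
{
    int i;

    for (i = 0; i < 16; i++) {
        x = ((x << 5) | (x >> 27)) ^ 0x9E3779B9u;
        x += (uint32_t)i;
    }
    return x;
}

/* A store to a volatile object is observable behaviour, so the
   computation that produces the stored value must be kept. */
static volatile uint32_t bench_sink;

/* Runs 'iterations' chained calls and returns the elapsed CPU time
   in seconds. Each call consumes the previous output, and the final
   value is written to bench_sink. */
static double bench_toy_cipher(uint32_t seed, long iterations)
{
    uint32_t acc = seed;
    clock_t begin, end;
    long i;

    begin = clock();
    for (i = 0; i < iterations; i++)
        acc = toy_cipher(acc);
    bench_sink = acc;
    end = clock();
    return (double)(end - begin) / CLOCKS_PER_SEC;
}
```

This only guarantees that the final value is computed somehow. If the
seed is a compile-time constant, a sufficiently clever compiler could
still precompute everything, so it is safer to derive the seed from
something known only at run time (a command-line argument, the clock).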
As a side-note, that kind of issue can happen quite naturally with
JIT-based languages (CLR/.NET, Java, Python...), making benchmarking
an increasingly thorny issue.
As another side-side-note, pay attention to alignment too. The x86
architecture is very forgiving of unaligned accesses (on most modern
x86 CPUs, an unaligned access triggers a mere 1-cycle penalty when
crossing an L1 cache line boundary), except for some specific
SSE2-related opcodes. PowerPC in big-endian mode also tolerates
unaligned accesses. ARM, Sparc, Mips and a few other architectures do
not (in some cases, the OS traps the unaligned exception and emulates
the access transparently, but with a massive penalty).
When code decodes 32-bit and 64-bit words with generic byte-by-byte
functions, such problems do not occur. However, code that does things
which break aliasing rules also tends to break alignment rules. Test
code usually allocates inputs on a stack buffer or through a malloc()
call, so the input password for a password-hashing function will tend to
be in an already aligned buffer, while the same is not necessarily true
of production code. For instance, consider HTTP basic authentication
(RFC 2617): the user password is transmitted as the Base64-encoding of
'userid:password'. On the server side, Base64 decoding will be applied,
leading to a character string where the 'password' component appears at
some point, not at the start of the buffer, and therefore with an
alignment that depends on the number of characters in the user ID.
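To make the point concrete, here is a small sketch that works on an
already Base64-decoded 'userid:password' string. The helper names
find_password() and password_offset() are made up for this
illustration; they are not part of any HTTP library.

```c
#include <stddef.h>
#include <string.h>

/* Locate the password inside an already Base64-decoded
   "userid:password" string. Returns NULL if there is no colon. */
static const char *find_password(const char *decoded)
{
    const char *colon = strchr(decoded, ':');

    return colon == NULL ? NULL : colon + 1;
}

/* Offset of the password from the start of the decoded buffer, or 0
   if there is no colon (a real password never sits at offset 0,
   since at least the colon precedes it). Even when the buffer itself
   is well aligned, the password is suitably aligned for 32-bit
   accesses only if this offset is a multiple of 4. */
static size_t password_offset(const char *decoded)
{
    const char *pw = find_password(decoded);

    return pw == NULL ? 0 : (size_t)(pw - decoded);
}
```

With "bob:hunter2" the password starts at offset 4; with
"alice:hunter2" it starts at offset 6. Code that casts the password
pointer to a uint32_t pointer would thus work for one user and
perform an unaligned access for the other.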
> A solution here is to use a union type containing all of the integer
> and SIMD types. The called function may cast the pointer to a pointer
> to such union type, then access the fields of its desired type. The
> compiler will then know that they might alias those other listed
> types.
In fact, even with a union, the aliasing rules (as per the C standard)
forbid such trans-type access. If you have this:

    union {
        uint32_t x[2];
        uint64_t y;
    } *v;
    (...)
    v->x[0] = 17;
    v->y = 42;
    printf("%u\n", (unsigned)v->x[0]);

then the compiler may still "legally" print out '17', i.e. considering
that the access to 'y' did not impact the values that can be accessed
through 'x', because of the distinct types (there again, there are
special rules for 'char' and 'unsigned char'). While the union guarantees
identity of storage ((void *)&v->x == (void *)&v->y), it still does not
allow seamless transtyping.
Most (if not all) C compilers will still be wary of unions and interpret
them the way you indicate, because doing otherwise would break a lot of
code. However, this mostly complicates the issue: not only are aliasing
rules relatively complex and arcane, but the _interpretation_ of these
rules by the compiler writer may differ in some subtle points from what
you may have come up with. In any case, it would be inordinately
optimistic to believe that existing C compilers get it right all the
time.
Bottom line: use byte-by-byte decoding and encoding functions. Don't
assume anything about architecture native endianness and alignment
requirements. This will make your code a lot more robust, and, probably,
quite a bit clearer for human readers as well.
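For illustration, such functions could look like the sketch below. The
names (dec32le, enc32le, dec64be, enc64be) are made up for this
example; any consistent naming will do. They access memory only
through 'unsigned char', which the aliasing rules explicitly allow,
and they work on any byte address.

```c
#include <stdint.h>

/* Decode a 32-bit word, little-endian, from 4 bytes at 'src'. */
static uint32_t dec32le(const void *src)
{
    const unsigned char *p = src;

    return (uint32_t)p[0]
        | ((uint32_t)p[1] << 8)
        | ((uint32_t)p[2] << 16)
        | ((uint32_t)p[3] << 24);
}

/* Encode a 32-bit word, little-endian, into 4 bytes at 'dst'. */
static void enc32le(void *dst, uint32_t x)
{
    unsigned char *p = dst;
    int i;

    for (i = 0; i < 4; i++)
        p[i] = (unsigned char)((x >> (8 * i)) & 0xFF);
}

/* Decode a 64-bit word, big-endian, from 8 bytes at 'src'. */
static uint64_t dec64be(const void *src)
{
    const unsigned char *p = src;
    uint64_t x = 0;
    int i;

    for (i = 0; i < 8; i++)
        x = (x << 8) | (uint64_t)p[i];
    return x;
}

/* Encode a 64-bit word, big-endian, into 8 bytes at 'dst'. */
static void enc64be(void *dst, uint64_t x)
{
    unsigned char *p = dst;
    int i;

    for (i = 0; i < 8; i++)
        p[i] = (unsigned char)((x >> (56 - 8 * i)) & 0xFF);
}
```

The result is the same on little-endian and big-endian machines, and
the source or destination pointer may have any alignment (for
instance, buf + 1 in an otherwise aligned buffer).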
--Thomas Pornin