[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAAUqJDsHzgGZTpYCAxAy=b3P0bq7dGrw6St68SkwjqwHpusN+A@mail.gmail.com>
Date: Mon, 9 Jul 2018 11:47:42 +0200
From: Ondrej Mosnáček
<omosnacek+linux-crypto@...il.com>
To: torvalds@...ux-foundation.org
Cc: Herbert Xu <herbert@...dor.apana.org.au>,
"David S. Miller" <davem@...emloft.net>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
linux-crypto@...r.kernel.org,
Milan Brož <gmazyland@...il.com>
Subject: Re: Crypto Fixes for 4.18
Hi Linus,
ne 8. 7. 2018 o 20:32 Linus Torvalds <torvalds@...ux-foundation.org> napísal(a):
>
> On Sun, Jul 8, 2018 at 9:20 AM Herbert Xu <herbert@...dor.apana.org.au> wrote:
> >
> > - Add missing RETs in x86 aegis/morus.
>
> Side note - I queried earlier during the discussion about this: how
> was this code taken despite having clearly never tested on _anything_?
>
> That's a serious question. Code that simply has never had any testing
> AT ALL should not have gotten in.
I did test the code using the included test vectors (and I found and
resolved lots of issues before submitting the patches thanks to that).
A good deal of the test vectors actually do trigger the code path that
calls the buggy function, so somehow it must have been working despite
the bug (see below).
> The use of 'int3' in padding showed the issue, but I don't believe the
> code could possibly have worked with the nops and fallthroughs.
I just looked at the disassembly of the function and its surroundings
(as compiled by my testing environment) and it seems that by a curious
but logical coincidence, the code actually *did* work and without any
side effects (other than executing a few useless instructions before
returning).
This is what the C signatures of the relevant functions look like (for
aegis128, the other cases are analogical):
asmlinkage void crypto_aegis128_aesni_enc_tail(
void *state, unsigned int length, const void *src, void *dst);
asmlinkage void crypto_aegis128_aesni_dec(
void *state, unsigned int length, const void *src, void *dst);
Notice that these two functions have identical signatures, this will
be important later. Now, the disassembly for
crypto_aegis128_aesni_enc_tail looks roughly like this:
0000000000000950 <crypto_aegis128_aesni_enc_tail>:
[some code...]
9c3: 0f 1f 00 nopl (%rax)
9c6: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
9cd: 00 00 00
00000000000009d0 <crypto_aegis128_aesni_dec>:
9d0: 48 83 fe 10 cmp $0x10,%rsi
9d4: 0f 82 c3 03 00 00 jb d9d <crypto_aegis128_aesni_dec+0x3cd>
[some code...]
d9d: c3 retq # <---
<crypto_aegis128_aesni_dec+0x3cd> is here
d9e: 66 90 xchg %ax,%ax
So... thanks to the NOP padding, the control after the end of the
_enc_tail function walks right into the _dec function. This looks
scary at first glance, but here we are "saved" by the combination of
the following:
1. The second argument of the _enc_tail function (length; passed via
%rsi) is implictly always less than the block size (16 or 32 bytes).
2. The second argument of the _dec function (length; also passed via
%rsi) is checked to be greater than or equal to the block size (16 or
32 bytes); if it is less, then the function does nothing and just
returns.
3. _enc_tail does not modify the value in %rsi.
In conclusion, the bug remained undiscovered not because of lack of
testing, but because by sheer luck it was "working" anyway...
Sorry for introducing this (and other) bugs that had to be fixed
post-merging (I am the one who wrote the code). It is a lot of new
code that is hard to review, as it contains a lot of repetitive
boilerplate and assembly code.
Cheers,
Ondrej
Powered by blists - more mailing lists