linux-kernel - Re: [PATCH v2] crypto: Fix divide error in do_xor_speed()

Message-ID: <CAD=FV=Uq5TWpObFLhHBp7T4esuT_qaaMuYGaEz7xy1_MD5w_Gw@mail.gmail.com>
Date:   Tue, 5 Jan 2021 13:24:18 -0800
From:   Doug Anderson <dianders@...omium.org>
To:     Kirill Tkhai <ktkhai@...tuozzo.com>
Cc:     Ard Biesheuvel <ardb@...nel.org>, Arnd Bergmann <arnd@...db.de>,
        Herbert Xu <herbert@...dor.apana.org.au>,
        "David S. Miller" <davem@...emloft.net>,
        Linux Crypto Mailing List <linux-crypto@...r.kernel.org>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] crypto: Fix divide error in do_xor_speed()

Hi,

On Wed, Dec 30, 2020 at 1:34 PM Kirill Tkhai <ktkhai@...tuozzo.com> wrote:
>
> crypto: Fix divide error in do_xor_speed()
>
> From: Kirill Tkhai <ktkhai@...tuozzo.com>
>
> Latest (but not only latest) linux-next panics with divide
> error on my QEMU setup.
>
> The patch at the bottom of this message fixes the problem.
>
> xor: measuring software checksum speed
> divide error: 0000 [#1] PREEMPT SMP KASAN
> PREEMPT SMP KASAN
> CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.10.0-next-20201223+ #2177
> RIP: 0010:do_xor_speed+0xbb/0xf3
> Code: 41 ff cc 75 b5 bf 01 00 00 00 e8 3d 23 8b fe 65 8b 05 f6 49 83 7d 85 c0 75 05 e8
>  84 70 81 fe b8 00 00 50 c3 31 d2 48 8d 7b 10 <f7> f5 41 89 c4 e8 58 07 a2 fe 44 89 63 10 48 8d 7b 08
>  e8 cb 07 a2
> RSP: 0000:ffff888100137dc8 EFLAGS: 00010246
> RAX: 00000000c3500000 RBX: ffffffff823f0160 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000808 RDI: ffffffff823f0170
> RBP: 0000000000000000 R08: ffffffff8109c50f R09: ffffffff824bb6f7
> R10: fffffbfff04976de R11: 0000000000000001 R12: 0000000000000000
> R13: ffff888101997000 R14: ffff888101994000 R15: ffffffff823f0178
> FS:  0000000000000000(0000) GS:ffff8881f7780000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000000 CR3: 000000000220e000 CR4: 00000000000006a0
> Call Trace:
>  calibrate_xor_blocks+0x13c/0x1c4
>  ? do_xor_speed+0xf3/0xf3
>  do_one_initcall+0xc1/0x1b7
>  ? start_kernel+0x373/0x373
>  ? unpoison_range+0x3a/0x60
>  kernel_init_freeable+0x1dd/0x238
>  ? rest_init+0xc6/0xc6
>  kernel_init+0x8/0x10a
>  ret_from_fork+0x1f/0x30
> ---[ end trace 5bd3c1d0b77772da ]---
>
> Fixes: c055e3eae0f1 ("crypto: xor - use ktime for template benchmarking")
> Signed-off-by: Kirill Tkhai <ktkhai@...tuozzo.com>
> Acked-by: Ard Biesheuvel <ardb@...nel.org>
> ---
>
> v2: New Year resend :) Added fixes tag.
>  crypto/xor.c |    2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/crypto/xor.c b/crypto/xor.c
> index eacbf4f93990..8f899f898ec9 100644
> --- a/crypto/xor.c
> +++ b/crypto/xor.c
> @@ -107,6 +107,8 @@ do_xor_speed(struct xor_block_template *tmpl, void *b1, void *b2)
>         preempt_enable();
>
>         // bytes/ns == GB/s, multiply by 1000 to get MB/s [not MiB/s]
> +       if (!min)
> +               min = 1;

I guess this is if you just have a ktime backend that is not granular
enough for this measurement?  So if ktime is backed by a 32kHz clock
then ktime might increment in ~30us increments and maybe we ran in
less time than that?
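
A minimal userspace sketch of the granularity point, under stated assumptions: a clock ticking at 32768 Hz advances in steps of about 30517 ns, so an interval shorter than one tick can read as zero. The helper name `quantize_ns` is hypothetical, and the sketch assumes the interval starts exactly on a tick boundary.

```c
#include <stdint.h>

/*
 * Hypothetical helper: the elapsed time a clock ticking at `hz` would
 * report for a true elapsed time of `elapsed_ns`, assuming the interval
 * starts exactly on a tick boundary.
 */
static uint64_t quantize_ns(uint64_t elapsed_ns, uint64_t hz)
{
	uint64_t tick_ns = 1000000000ULL / hz;	/* 32768 Hz -> 30517 ns */

	/* Round down to a whole number of ticks. */
	return (elapsed_ns / tick_ns) * tick_ns;
}
```

With `hz = 32768`, a true 20 us run is reported as 0 ns, which is the value that then ends up as the divisor.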

...so while I think your fix will avoid the crash and could land as a
stopgap, it's a sign that we need to run more repetitions on your
particular setup to get accurate timings.  Your patch will probably
cause it to just randomly pick one of the implementations.
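
A small sketch of why clamping hides the differences between implementations. It assumes `REPS` is 800 and `BENCH_SIZE` is 4096; those values are not shown in the quoted diff, but they are consistent with RAX = 0xc3500000 in the oops, since 1000 * 800 * 4096 = 0xc3500000. The function name `speed_mb_s` is hypothetical.

```c
/* Assumed values, not taken from the quoted diff. */
#define REPS		800U
#define BENCH_SIZE	4096U

/* Hypothetical helper mirroring the patched computation. */
static unsigned int speed_mb_s(unsigned int min_ns)
{
	if (!min_ns)
		min_ns = 1;	/* the clamp added by the patch */

	/* bytes/ns == GB/s, multiply by 1000 to get MB/s */
	return (1000 * REPS * BENCH_SIZE) / min_ns;
}
```

Every template whose best run reads as 0 ns gets the same result, so the comparison between templates no longer reflects how fast each one actually ran.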

Presumably the right thing to do would be to look at
ktime_get_resolution_ns().  If "diff" is ever less than
"ktime_get_resolution_ns() * 10" then we should ramp up the number of
repetitions and try again.  The extra "* 10" is to make sure that we'd
be able to tell the difference between faster and slower algorithms.
Perhaps it should actually be more like * 50 or * 100.
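
A rough userspace sketch of the ramp-up idea, not kernel code and not a proposed patch. The clock and the workload are passed in as function pointers so the loop can be exercised against a simulated coarse clock; in the kernel the resolution would come from ktime_get_resolution_ns(). The names `time_with_ramp`, `sim_now`, `sim_work` and the doubling strategy are all assumptions made for illustration.

```c
#include <stdint.h>

typedef uint64_t (*clock_fn)(void);	/* returns nanoseconds */
typedef void (*work_fn)(void);

/*
 * Time `reps` calls of `work`. If the elapsed time is below
 * `factor` clock ticks, double `reps` and try again, up to `max_reps`.
 * Returns the elapsed ns of the last attempt and stores the repetition
 * count used in *reps_out. The result can still be 0 if the cap is
 * hit first, so the caller must still guard the division.
 */
static uint64_t time_with_ramp(work_fn work, clock_fn now, uint64_t res_ns,
			       unsigned int factor, unsigned int reps,
			       unsigned int max_reps, unsigned int *reps_out)
{
	for (;;) {
		uint64_t start = now();
		uint64_t diff;
		unsigned int i;

		for (i = 0; i < reps; i++)
			work();
		diff = now() - start;

		if (diff >= res_ns * factor || reps > max_reps / 2) {
			*reps_out = reps;
			return diff;
		}
		reps *= 2;
	}
}

/* Simulated ~32 kHz clock: true time advances only when work runs. */
static uint64_t sim_true_ns;
static const uint64_t SIM_TICK_NS = 30517;

static uint64_t sim_now(void)
{
	return sim_true_ns / SIM_TICK_NS * SIM_TICK_NS;
}

static void sim_work(void)
{
	sim_true_ns += 1000;	/* each repetition "takes" 1 us */
}
```

Calling `time_with_ramp(sim_work, sim_now, SIM_TICK_NS, 10, 8, 1u << 20, &reps)` starts at 8 repetitions, where the simulated clock reads a 0 ns difference, and keeps doubling until the measured interval spans at least 10 ticks. Because the repetition count then varies, the speed calculation would have to use the count actually run rather than a fixed constant.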

-Doug