[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1CAE4AF4-D557-4A18-891D-BAC1B2156B66@kernel.org>
Date: Wed, 15 Feb 2023 17:51:39 -0800
From: Kees Cook <kees@...nel.org>
To: Yury Norov <yury.norov@...il.com>,
Bruno Goncalves <bgoncalv@...hat.com>
CC: Networking <netdev@...r.kernel.org>, alan.maguire@...cle.com,
Jakub Kicinski <kuba@...nel.org>,
LKML <linux-kernel@...r.kernel.org>,
CKI Project <cki-project@...hat.com>,
Kees Cook <keescook@...omium.org>,
Miguel Ojeda <ojeda@...nel.org>,
Nick Desaulniers <ndesaulniers@...gle.com>
Subject: Re: [6.2.0-rc7] BUG: KASAN: slab-out-of-bounds in hop_cmp+0x26/0x110
On February 15, 2023 4:31:32 PM PST, Yury Norov <yury.norov@...il.com> wrote:
>+ Kees Cook <keescook@...omium.org>
>+ Miguel Ojeda <ojeda@...nel.org>
>+ Nick Desaulniers <ndesaulniers@...gle.com>
>
>On Wed, Feb 15, 2023 at 09:24:52AM +0100, Bruno Goncalves wrote:
>> On Tue, 14 Feb 2023 at 15:32, Yury Norov <yury.norov@...il.com> wrote:
>> >
>> > On Tue, Feb 14, 2023 at 02:23:06PM +0100, Bruno Goncalves wrote:
>> > > Hello,
>> > >
>> > > recently when testing kernel with debug options set from net-next [1]
>> > > and bpf-next [2] the following call trace happens:
>> > >
>> > Hi Bruno,
>> >
>> > Thanks for report.
>> >
>> > This looks weird, because the hop_cmp() spent for 3 month in -next till
>> > now. Anyways, can you please share your NUMA configuration so I'll try
>> > to reproduce the bug locally? What 'numactl -H' outputs?
>> >
>>
>> Here is the output:
>>
>> numactl -H
>> available: 4 nodes (0-3)
>> node 0 cpus: 0 1 2 3 4 5 6 7 32 33 34 35 36 37 38 39
>> node 0 size: 32063 MB
>> node 0 free: 31610 MB
>> node 1 cpus: 8 9 10 11 12 13 14 15 40 41 42 43 44 45 46 47
>> node 1 size: 32248 MB
>> node 1 free: 31909 MB
>> node 2 cpus: 16 17 18 19 20 21 22 23 48 49 50 51 52 53 54 55
>> node 2 size: 32248 MB
>> node 2 free: 31551 MB
>> node 3 cpus: 24 25 26 27 28 29 30 31 56 57 58 59 60 61 62 63
>> node 3 size: 32239 MB
>> node 3 free: 31468 MB
>> node distances:
>> node 0 1 2 3
>> 0: 10 21 31 21
>> 1: 21 10 21 31
>> 2: 31 21 10 21
>> 3: 21 31 21 10
>>
>> Bruno
>
>So, I was able to reproduce it, and it seems like a compiler issue.
>
>The problem is that hop_cmp() calculates pointer to a previous hop
>object unconditionally at the beginning of the function:
>
> struct cpumask **prev_hop = *((struct cpumask ***)b - 1);
>
>Obviously, for the first hop, there's no such thing like a previous
>one, and later in the code 'prev_hop' is used conditionally on that:
>
> k->w = (b == k->masks) ? 0 : cpumask_weight_and(k->cpus, prev_hop[k->node]);
>
>To me the code above looks like it instructs the compiler to dereference
>'b - 1' only if b != k->masks, i.e. when b is not the first hop. But GCC
>does that unconditionally, which looks wrong.
>
>If I defer dereferencing manually like in the snippet below, the kasan
>warning goes away.
>
>diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
>index 48838a05c008..5f297f81c574 100644
>--- a/kernel/sched/topology.c
>+++ b/kernel/sched/topology.c
>@@ -2081,14 +2081,14 @@ struct __cmp_key {
>
> static int hop_cmp(const void *a, const void *b)
> {
>- struct cpumask **prev_hop = *((struct cpumask ***)b - 1);
> struct cpumask **cur_hop = *(struct cpumask ***)b;
> struct __cmp_key *k = (struct __cmp_key *)a;
>
> if (cpumask_weight_and(k->cpus, cur_hop[k->node]) <= k->cpu)
> return 1;
>
>- k->w = (b == k->masks) ? 0 : cpumask_weight_and(k->cpus, prev_hop[k->node]);
>+ k->w = (b == k->masks) ? 0 :
>+ cpumask_weight_and(k->cpus, (*((struct cpumask ***)b - 1))[k->node]);
> if (k->w <= k->cpu)
> return 0;
>
>I don't understand why GCC doesn't optimize out unneeded dereferencing.
>It does that even if I replace ternary operator with if-else construction.
>To me it looks like a compiler bug.
>
>However, I acknowledge that I'm not a great expert in C standard, so
>it's quite possible that there may be some rule that prevents from
>doing such optimizations, even for non-volatile variables.
>
>Adding compiler people. Guys, could you please clarify on that?
>If it's my fault, I'll submit fix shortly.
My understanding is that without getting inlined, the compiler cannot evaluate "b == k->masks" at compile time (if it can at all). So since the dereference is part of variable initialization, it's not executed later: it's executed at function entry.
Regardless, this whole function looks kind of hard to read. Why not fully expand it with the if/else logic and put any needed variables into the respective clauses? Then humans can read it and the compiler will optimize it down just as efficiently.
-Kees
--
Kees Cook
Powered by blists - more mailing lists