linux-kernel - Re: [syzbot] [kernel?] WARNING in c

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAHk-=wgMMYE+vs-ZhxnAC_7NMw601xMOZOBqL1NR4kQYDDcK_A@mail.gmail.com>
Date:   Sat, 11 Mar 2023 23:43:23 -0800
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
Cc:     syzbot <syzbot+96cae094d90877641f32@...kaller.appspotmail.com>,
        linux-kernel@...r.kernel.org, syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [kernel?] WARNING in c_start (2)

On Sat, Mar 11, 2023 at 6:57 PM Tetsuo Handa
<penguin-kernel@...ove.sakura.ne.jp> wrote:
>
> syzbot is unable to test kernels due to hitting WARN_ON_ONCE(cpu >= bits) upon
> "cat /proc/cpuinfo" request.
>
> Since commit 596ff4a09b898179 ("cpumask: re-introduce constant-sized cpumask optimizations")
> changed to pass "small_cpumask_bits" instead of "nr_cpumask_bits" to find_next_bit(),
> find_next_bit() returning small_cpumask_bits causes c_next() to go beyond nr_cpumask_bits.
> I think that we need to make sure that cpumask_next() and friends would not return cpu id
> beyond nr_cpumask_bits.

Ahh. yes.

It's the same old "cpumask scanning should be testing >= nr_cpu_ids"
thing, but c_start() does

        *pos = cpumask_next(*pos - 1, cpu_online_mask);

and basically assumes that it is "== nr_cpu_ids" for the end
condition, and uses the value next time around.

And if it is *exactly* nr_cpu_ids, then the next time it gets called,
the "*pos - 1" means that it's all ok.

But if it's > nr_cpu_ids, then next time the "-1" doesn't do anything
useful and the input is still larger than the number of CPU ids.

The core *works* correctly, but it triggers that warning because it is
not doing that test properly.

That c_start() function is ugly, but the simplest patch is probably
this one-liner (whitespace-damaged but hopefully really obvious):

  --- a/arch/x86/kernel/cpu/proc.c
  +++ b/arch/x86/kernel/cpu/proc.c
  @@ -156,6 +156,7 @@
        *pos = cpumask_next(*pos - 1, cpu_online_mask);
        if ((*pos) < nr_cpu_ids)
                return &cpu_data(*pos);
  +     *pos = nr_cpu_ids;
        return NULL;
   }

which just caps that ">= nr_cpu_ids" case down to nr_cpu_ids.

Does that fix your test-case for you?

I'm not entirely convinced we shouldn't clean stuff up with a slightly
bigger patch, though. Instead of capping the 'pos', just testing it
seems the kind of more obvious thing. This code had similar problems
before. So an alternative patch (still whitespace-damaged) would be
something like

  --- a/arch/x86/kernel/cpu/proc.c
  +++ b/arch/x86/kernel/cpu/proc.c
  @@ -153,8 +153,12 @@

   static void *c_start(struct seq_file *m, loff_t *pos)
   {
  -     *pos = cpumask_next(*pos - 1, cpu_online_mask);
  -     if ((*pos) < nr_cpu_ids)
  +     loff_t prev = *pos;
  +
  +     if (prev >= nr_cpu_ids)
  +             return NULL;
  +     *pos = cpumask_next(prev - 1, cpu_online_mask);
  +     if (*pos < nr_cpu_ids)
                return &cpu_data(*pos);
        return NULL;
   }

which is a few lines more of patch, but stops depending on that "pos
has to end up exactly at nr_cpu_ids" thing.

Either patch should result in the same thing and hopefully fix your
warning, so I guess it's just a matter of taste.

                Linus