lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Sat, 11 Mar 2023 23:43:23 -0800
From:   Linus Torvalds <torvalds@...ux-foundation.org>
To:     Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>
Cc:     syzbot <syzbot+96cae094d90877641f32@...kaller.appspotmail.com>,
        linux-kernel@...r.kernel.org, syzkaller-bugs@...glegroups.com
Subject: Re: [syzbot] [kernel?] WARNING in c_start (2)

On Sat, Mar 11, 2023 at 6:57 PM Tetsuo Handa
<penguin-kernel@...ove.sakura.ne.jp> wrote:
>
> syzbot is unable to test kernels due to hitting WARN_ON_ONCE(cpu >= bits) upon
> "cat /proc/cpuinfo" request.
>
> Since commit 596ff4a09b898179 ("cpumask: re-introduce constant-sized cpumask optimizations")
> changed to pass "small_cpumask_bits" instead of "nr_cpumask_bits" to find_next_bit(),
> find_next_bit() returning small_cpumask_bits causes c_next() to go beyond nr_cpumask_bits.
> I think that we need to make sure that cpumask_next() and friends would not return cpu id
> beyond nr_cpumask_bits.

Ahh. yes.

It's the same old "cpumask scanning should be testing >= nr_cpu_ids"
thing, but c_start() does

        *pos = cpumask_next(*pos - 1, cpu_online_mask);

and basically assumes that it is "== nr_cpu_ids" for the end
condition, and uses the value next time around.

And if it is *exactly* nr_cpu_ids, then the next time it gets called,
the "*pos - 1" means that it's all ok.

But if it's > nr_cpu_ids, then next time the "-1" doesn't do anything
useful and the input is still larger than the number of CPU ids.

The core *works* correctly, but it triggers that warning because it is
not doing that test properly.

That c_start() function is ugly, but the simplest patch is probably
this one-liner (whitespace-damaged but hopefully really obvious):

  --- a/arch/x86/kernel/cpu/proc.c
  +++ b/arch/x86/kernel/cpu/proc.c
  @@ -156,6 +156,7 @@
        *pos = cpumask_next(*pos - 1, cpu_online_mask);
        if ((*pos) < nr_cpu_ids)
                return &cpu_data(*pos);
  +     *pos = nr_cpu_ids;
        return NULL;
   }


which just caps that ">= nr_cpu_ids" case down to nr_cpu_ids.

Does that fix your test-case for you?

I'm not entirely convinced we shouldn't clean stuff up with a slightly
bigger patch, though. Instead of capping the 'pos', just testing it
seems the kind of more obvious thing. This code had similar problems
before. So an alternative patch (still whitespace-damaged) would be
something like

  --- a/arch/x86/kernel/cpu/proc.c
  +++ b/arch/x86/kernel/cpu/proc.c
  @@ -153,8 +153,12 @@

   static void *c_start(struct seq_file *m, loff_t *pos)
   {
  -     *pos = cpumask_next(*pos - 1, cpu_online_mask);
  -     if ((*pos) < nr_cpu_ids)
  +     loff_t prev = *pos;
  +
  +     if (prev >= nr_cpu_ids)
  +             return NULL;
  +     *pos = cpumask_next(prev - 1, cpu_online_mask);
  +     if (*pos < nr_cpu_ids)
                return &cpu_data(*pos);
        return NULL;
   }

which is a few lines more of patch, but stops depending on that "pos
has to end up exactly at nr_cpu_ids" thing.

Either patch should result in the same thing and hopefully fix your
warning, so I guess it's just a matter of taste.

                Linus

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ