lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJDe-O+7mqqrGvWimztYgEqDDqJGH=EA1NgJyiJjnurLASOQ8w@mail.gmail.com>
Date:   Wed, 3 Aug 2022 14:21:46 -0700
From:   Neel Natu <neelnatu@...gle.com>
To:     Yury Norov <yury.norov@...il.com>
Cc:     Andy Shevchenko <andriy.shevchenko@...ux.intel.com>,
        Rasmus Villemoes <linux@...musvillemoes.dk>,
        Peter Zijlstra <peterz@...radead.org>,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] sched, cpumask: don't leak impossible cpus via for_each_cpu_wrap().

On Wed, Aug 3, 2022 at 11:52 AM Yury Norov <yury.norov@...il.com> wrote:
>
> On Wed, Aug 03, 2022 at 10:49:57AM -0700, Neel Natu wrote:
> > On Tue, Aug 2, 2022 at 6:22 PM Yury Norov <yury.norov@...il.com> wrote:
> > >
> > > On Tue, Aug 2, 2022 at 2:41 PM Neel Natu <neelnatu@...gle.com> wrote:
> > > >
> > > > The value of 'nr_cpumask_bits' is dependent on CONFIG_CPUMASK_OFFSTACK.
> > > > This in turn can change the set of cpus visited by for_each_cpu_wrap()
> > > > with a mask that has bits set in the range [nr_cpu_ids, NR_CPUS).
> > > >
> > > > Specifically on !CONFIG_CPUMASK_OFFSTACK kernels the API can iterate
> > > > over cpus outside the 'cpu_possible_mask'.
> > > >
> > > > Fix this to make its behavior match for_each_cpu() which always limits
> > > > the iteration to the range [0, nr_cpu_ids).
> > > >
> > > > Signed-off-by: Neel Natu <neelnatu@...gle.com>
> > >
> > > The patch itself doesn't look correct because it randomly switches a piece
> > > of cpumask API from nr_cpumask_bits to nr_cpu_ids, and doesn't touch
> > > others.
> > >
> > > However...
> > >
> > > I don't know the story behind having 2 variables holding the max possible
> > > number of cpus, and it looks like it dates back to prehistoric times.  In
> > > modern kernel, there are 2 cases where nr_cpumask_bits == nr_cpu_ids
> > > for sure: it's CONFIG_CPUMASK_OFFSTACK=y and
> > > CONFIG_HOTPLUG_CPU=y. At least one of those is enabled in defconfig
> > > of every popular architecture.
> > >
> >
> > Hmm, in a kernel with CONFIG_HOTPLUG_CPU=y but not CONFIG_CPUMASK_OFFSTACK
> > I see "nr_cpu_ids = 20, nr_cpumask_bits = 512". FYI since it doesn't
> > match the observation
> > above that nr_cpumask_bits == nr_cpu_ids when CONFIG_HOTPLUG_CPU=y.
>
> I said this because the comment in include/linux/cpumaks.h says:
>  *  If HOTPLUG is enabled, then cpu_possible_mask is forced to have
>  *  all NR_CPUS bits set, otherwise it is just the set of CPUs that
>  *  ACPI reports present at boot.
>
> And because of the code that sets nr_cpu_ids:
>
> void __init setup_nr_cpu_ids(void)
> {
>         nr_cpu_ids = find_last_bit(cpumask_bits(cpu_possible_mask),NR_CPUS) + 1;
> }
>
> Some arches override it, so it may be an issue. Are you running x86,
> or maybe you have "nr_cpus" boot parameter?
>
> But anyways, I feel like this should be investigated for more... Can you
> please share the config of your system and boot params?
>

Yeah, this is a stock defconfig compiled on an x86_64 host and booted
inside a 20 vcpu virtual machine (virtme). There are no command line
params to modify the number of cpus.

I think everything is working as expected based on some debug prints I
added during boot:
[    0.641798] smp: setup_nr_cpu_ids: nr_cpu_ids 20, cpu_possible_mask 0-19
[    0.648424] setup_percpu: NR_CPUS:64 nr_cpumask_bits:64
nr_cpu_ids:20 nr_node_ids:2

The first one is from setup_nr_cpu_ids() in kernel/smp.c. The second
one is from setup_per_cpu_areas() from arch/x86/setup_percpu.c.

> > > In case of HOTPLUG is off, I don't understand why we should have nr_cpu_ids
> > > and nr_cpumask_bits different - what case should it cover?... Interestingly, in
> > > comments to cpumask functions and in the code those two are referred
> > > interchangeably.
> > >
> > > Even more interestingly, we have a function bitmap_setall() that sets all bits
> > > up to nr_cpumask_bits, and it could trigger the problem that you described,
> >
> > I think you mean cpumask_setall() that in turn calls
> > bitmap_fill(nr_cpumask_bits)?
>
> Yes, sorry.
>
> > > so that someone would complain. (Are there any other valid reasons to set
> > > bits behind nr_cpu_ids intentionally?)
> > >
> >
> > I don't know of any although this wasn't the case that trigger in my case.
> >
> > > Can you share more details about how you triggered that? If you observe
> > > those bits set, something else is probably already wrong...
> >
> > The non-intuitive behavior of for_each_cpu_wrap() was triggered when iterating
> > over a cpumask passed by userspace that set a bit in the [nr_cpu_ids,
> > nr_cpumask_bits)
> > range.
>
> If you receive bitmap from userspace, you need to copy it with
> bitmap_copy_clear_tail(), or bitmap_from_arr{32,64}, as appropriate.
> That will clear unneeded bits.
>
> For user-to-kernel communications, I'd recommend passing bitmaps in a
> human-readable formats - hex string or bitmap list. Check the functions
> cpumask_parse_user() and cpumask_parselist_user(). This would help to
> avoid all possible issues related to endianness and 32/64 word length
> mismatch.
>
> > The kernel code is out of tree but open source so happy to provide a
> > pointer if needed.
>
> Yes please
>

Here is where we copy the user supplied cpumask using the
get_user_cpu_mask() helper:
https://github.com/google/ghost-kernel/blob/c21b36f87663efa2189876b2caa701fe74e72adf/kernel/sched/ghost.c#L5729

For performance reasons we cannot use human readable cpu masks in this
code path.

Note that the helper copies up to 'nr_cpumask_bits' which in some
kernels (!CONFIG_CPUMASK_OFFSTACK) can copy bits beyond 'nr_cpu_ids'.
A possible option could be to fix this helper but I do feel that
for_each_cpu() and for_each_cpu_wrap() should visit the same set of
cpus given the same cpumask (ordering can be different but the set of
cpus should remain the same).

What do you think?

best
Neel

> > I considered ANDing the user supplied mask with 'cpu_possible_mask'
> > but that felt
> > like working around an inconsistency in the kernel API (hence the proposed fix).
> >
> > > So, if there is no real case in modern kernel to have nr_cpumask_bits and
> > > nr_cpu_ids different, the proper fix would be to just drop the first.
> > >
> >
> > I'll let other people more knowledgable than me in this area chime in.
> > I'll be happy either
> > way if that fixes the problem at hand.
> >
> > best
> > Neel
> >
> > > If there is such a case, this is probably your case, and we'd know more
> > > details to understand how to deal with that.
> > >
> > > Thanks,
> > > Yury

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ