lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAPTQFZSgNHEE0Ub17=kfF-W64bbfRc4wYijTkG==+XxfgcocOQ@mail.gmail.com>
Date: Tue, 23 Jul 2024 09:33:44 -0700
From: Jerome Glisse <jglisse@...gle.com>
To: David Hildenbrand <david@...hat.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org, 
	linux-kernel@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [PATCH] mm: fix maxnode for mbind(), set_mempolicy() and migrate_pages()

On Mon, 22 Jul 2024 at 06:09, David Hildenbrand <david@...hat.com> wrote:
>
> On 20.07.24 19:35, Jerome Glisse wrote:
> > Because maxnode bug there is no way to bind or migrate_pages to the
> > last node in multi-node NUMA system unless you lie about maxnodes
> > when making the mbind, set_mempolicy or migrate_pages syscall.
> >
> > Manpage for those syscall describe maxnodes as the number of bits in
> > the node bitmap ("bit mask of nodes containing up to maxnode bits").
> > Thus if maxnode is n then we expect to have a n bit(s) bitmap which
> > means that the mask of valid bits is ((1 << n) - 1). The get_nodes()
> > decrement lead to the mask being ((1 << (n - 1)) - 1).
> >
> > The three syscalls use a common helper get_nodes() and first things
> > this helper do is decrement maxnode by 1 which leads to using n-1 bits
> > in the provided mask of nodes (see get_bitmap() an helper function to
> > get_nodes()).
> >
> > The lead to two bugs, either the last node in the bitmap provided will
> > not be use in either of the three syscalls, or the syscalls will error
> > out and return EINVAL if the only bit set in the bitmap was the last
> > bit in the mask of nodes (which is ignored because of the bug and an
> > empty mask of nodes is an invalid argument).
> >
> > I am surprised this bug was never caught ... it has been in the kernel
> > since forever.
>
> Let's look at QEMU: backends/hostmem.c
>
>      /*
>       * We can have up to MAX_NODES nodes, but we need to pass maxnode+1
>       * as argument to mbind() due to an old Linux bug (feature?) which
>       * cuts off the last specified node. This means backend->host_nodes
>       * must have MAX_NODES+1 bits available.
>       */
>
> Which means that it's been known for a long time, and the workaround
> seems to be pretty easy.
>
> So I wonder if we rather want to update the documentation to match reality.

[Sorry resending as text ... gmail insanity]

I think it is kind of weird if we ask to supply maxnodes+1 to work
around the bug. If we apply this patch qemu would continue to work as
is while fixing users that were not aware of that bug. So I would say
applying this patch does more good. Long term qemu can drop its
workaround or keep it for backward compatibility with old kernel.

Thank you,
Jérôme

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ