Message-ID: <0c390494-e6ba-4cde-aace-cd726f2409a1@redhat.com>
Date: Mon, 22 Jul 2024 15:09:34 +0200
From: David Hildenbrand <david@...hat.com>
To: Jerome Glisse <jglisse@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>, linux-mm@...ck.org
Cc: linux-kernel@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [PATCH] mm: fix maxnode for mbind(), set_mempolicy() and
migrate_pages()
On 20.07.24 19:35, Jerome Glisse wrote:
> Because of a maxnode bug there is no way to bind or migrate pages to
> the last node in a multi-node NUMA system unless you lie about maxnode
> when making the mbind(), set_mempolicy() or migrate_pages() syscall.
>
> The manpage for those syscalls describes maxnode as the number of bits
> in the node bitmap ("bit mask of nodes containing up to maxnode bits").
> Thus if maxnode is n we expect an n-bit bitmap, which means the mask
> of valid bits is ((1 << n) - 1). The decrement in get_nodes() leads to
> the mask being ((1 << (n - 1)) - 1) instead.
>
> The three syscalls use a common helper, get_nodes(), and the first
> thing this helper does is decrement maxnode by 1, which leads to only
> n-1 bits of the provided node mask being used (see get_bitmap(), a
> helper function of get_nodes()).
>
> This leads to two bugs: either the last node in the provided bitmap is
> not used by any of the three syscalls, or the syscalls error out with
> EINVAL when the only bit set in the bitmap is the last one (that bit
> is ignored because of the bug, and an empty mask of nodes is an
> invalid argument).
>
> I am surprised this bug was never caught ... it has been in the kernel
> since forever.
Let's look at QEMU: backends/hostmem.c
/*
* We can have up to MAX_NODES nodes, but we need to pass maxnode+1
* as argument to mbind() due to an old Linux bug (feature?) which
* cuts off the last specified node. This means backend->host_nodes
* must have MAX_NODES+1 bits available.
*/
Which means that it's been known for a long time, and the workaround
seems to be pretty easy.
So I wonder if we rather want to update the documentation to match reality.
--
Cheers,
David / dhildenb