lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <85f852f9-8577-4230-adc7-c52e7f479454@redhat.com>
Date: Tue, 30 Sep 2025 17:32:25 +0200
From: David Hildenbrand <david@...hat.com>
To: Jakub Acs <acsjakub@...zon.de>, linux-mm@...ck.org
Cc: Andrew Morton <akpm@...ux-foundation.org>, Xu Xin <xu.xin16@....com.cn>,
 Chengming Zhou <chengming.zhou@...ux.dev>, Peter Xu <peterx@...hat.com>,
 Axel Rasmussen <axelrasmussen@...gle.com>, linux-kernel@...r.kernel.org,
 stable@...r.kernel.org
Subject: Re: [PATCH v2] mm/ksm: fix flag-dropping behavior in ksm_madvise

On 30.09.25 15:00, Jakub Acs wrote:
> syzkaller discovered the following crash: (kernel BUG)
> 
> [   44.607039] ------------[ cut here ]------------
> [   44.607422] kernel BUG at mm/userfaultfd.c:2067!
> [   44.608148] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
> [   44.608814] CPU: 1 UID: 0 PID: 2475 Comm: reproducer Not tainted 6.16.0-rc6 #1 PREEMPT(none)
> [   44.609635] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> [   44.610695] RIP: 0010:userfaultfd_release_all+0x3a8/0x460
> 
> <snip other registers, drop unreliable trace>
> 
> [   44.617726] Call Trace:
> [   44.617926]  <TASK>
> [   44.619284]  userfaultfd_release+0xef/0x1b0
> [   44.620976]  __fput+0x3f9/0xb60
> [   44.621240]  fput_close_sync+0x110/0x210
> [   44.622222]  __x64_sys_close+0x8f/0x120
> [   44.622530]  do_syscall_64+0x5b/0x2f0
> [   44.622840]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [   44.623244] RIP: 0033:0x7f365bb3f227
> 
> Kernel panics because it detects UFFD inconsistency during
> userfaultfd_release_all(). Specifically, a VMA which has a valid pointer
> to vma->vm_userfaultfd_ctx, but no UFFD flags in vma->vm_flags.
> 
> The inconsistency is caused in ksm_madvise(): when user calls madvise()
> with MADV_UNMEARGEABLE on a VMA that is registered for UFFD in MINOR
> mode, it accidentally clears all flags stored in the upper 32 bits of
> vma->vm_flags.
> 
> Assuming x86_64 kernel build, unsigned long is 64-bit and unsigned int
> and int are 32-bit wide. This setup causes the following mishap during
> the &= ~VM_MERGEABLE assignment.
> 
> VM_MERGEABLE is a 32-bit constant of type unsigned int, 0x8000'0000.
> After ~ is applied, it becomes 0x7fff'ffff unsigned int, which is then
> promoted to unsigned long before the & operation. This promotion fills
> upper 32 bits with leading 0s, as we're doing unsigned conversion (and
> even for a signed conversion, this wouldn't help as the leading bit is
> 0). & operation thus ends up AND-ing vm_flags with 0x0000'0000'7fff'ffff
> instead of intended 0xffff'ffff'7fff'ffff and hence accidentally clears
> the upper 32-bits of its value.
> 
> Fix it by changing `VM_MERGEABLE` constant to unsigned long. Modify all
> other VM_* flags constants for consistency.
> 
> Note: other VM_* flags are not affected:
> This only happens to the VM_MERGEABLE flag, as the other VM_* flags are
> all constants of type int and after ~ operation, they end up with
> leading 1 and are thus converted to unsigned long with leading 1s.
> 
> Note 2:
> After commit 31defc3b01d9 ("userfaultfd: remove (VM_)BUG_ON()s"), this is
> no longer a kernel BUG, but a WARNING at the same place:
> 
> [   45.595973] WARNING: CPU: 1 PID: 2474 at mm/userfaultfd.c:2067
> 
> but the root-cause (flag-drop) remains the same.
> 
> Fixes: 7677f7fd8be76 ("userfaultfd: add minor fault registration mode")
> Signed-off-by: Jakub Acs <acsjakub@...zon.de>
> Cc: Andrew Morton <akpm@...ux-foundation.org>
> Cc: David Hildenbrand <david@...hat.com>
> Cc: Xu Xin <xu.xin16@....com.cn>
> Cc: Chengming Zhou <chengming.zhou@...ux.dev>
> Cc: Peter Xu <peterx@...hat.com>
> Cc: Axel Rasmussen <axelrasmussen@...gle.com>
> Cc: linux-mm@...ck.org
> Cc: linux-kernel@...r.kernel.org
> Cc: stable@...r.kernel.org
> ---

If we want a smaller patch for easier backporting, we could split off 
the VM_MERGEABLE change into a separate patch and do all the other ones 
for consistency in another

Reading what we do VM_HIGH_ARCH_BIT_* , we use BIT(), which does

	#define BIT(nr)		(UL(1) << (nr))

So likely we should just clean it all up an use e.g.,

#define VM_NONE		0
#define VM_READ		BIT(0)
#define VM_WRITE	BIT(1)

etc.

So likely it's best to do in a first fix
	#define VM_MERGEABLE	BIT(31)

And in a follow-up cleanup patch convert all the other ones.

Sorry for not thinking about BIT() earlier

-- 
Cheers

David / dhildenb


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ