Message-ID: <20250930063921.62354-1-acsjakub@amazon.de>
Date: Tue, 30 Sep 2025 06:39:21 +0000
From: Jakub Acs <acsjakub@...zon.de>
To: <linux-mm@...ck.org>
CC: <acsjakub@...zon.de>, Andrew Morton <akpm@...ux-foundation.org>, "David
Hildenbrand" <david@...hat.com>, Xu Xin <xu.xin16@....com.cn>, Chengming Zhou
<chengming.zhou@...ux.dev>, Peter Xu <peterx@...hat.com>, Axel Rasmussen
<axelrasmussen@...gle.com>, Mike Kravetz <mike.kravetz@...cle.com>,
<linux-kernel@...r.kernel.org>, <stable@...r.kernel.org>
Subject: [PATCH] mm/ksm: fix flag-dropping behavior in ksm_madvise
syzkaller discovered the following crash: (kernel BUG)
[ 44.607039] ------------[ cut here ]------------
[ 44.607422] kernel BUG at mm/userfaultfd.c:2067!
[ 44.608148] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
[ 44.608814] CPU: 1 UID: 0 PID: 2475 Comm: reproducer Not tainted 6.16.0-rc6 #1 PREEMPT(none)
[ 44.609635] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 44.610695] RIP: 0010:userfaultfd_release_all+0x3a8/0x460
<snip other registers, drop unreliable trace>
[ 44.617726] Call Trace:
[ 44.617926] <TASK>
[ 44.619284] userfaultfd_release+0xef/0x1b0
[ 44.620976] __fput+0x3f9/0xb60
[ 44.621240] fput_close_sync+0x110/0x210
[ 44.622222] __x64_sys_close+0x8f/0x120
[ 44.622530] do_syscall_64+0x5b/0x2f0
[ 44.622840] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 44.623244] RIP: 0033:0x7f365bb3f227
The kernel panics because it detects a UFFD inconsistency during
userfaultfd_release_all(). Specifically, it finds a VMA that has a valid
pointer in vma->vm_userfaultfd_ctx but no UFFD flags in vma->vm_flags.
The inconsistency is introduced in ksm_madvise(): when a user calls
madvise() with MADV_UNMERGEABLE on a VMA that is registered for UFFD in
MINOR mode, it accidentally clears all flags stored in the upper 32 bits
of vma->vm_flags.
Assuming an x86_64 kernel build, unsigned long is 64 bits wide, while
unsigned int and int are 32 bits wide. This causes the following mishap
in the &= ~VM_MERGEABLE assignment.
VM_MERGEABLE is a 32-bit constant of type unsigned int, 0x8000'0000.
After ~ is applied, it becomes 0x7fff'ffff, still an unsigned int, which
is then converted to unsigned long before the & operation. The conversion
fills the upper 32 bits with 0s, because the source type is unsigned (and
even a signed conversion would not help, as the leading bit is 0). The &
operation thus ends up AND-ing vm_flags with 0x0000'0000'7fff'ffff
instead of the intended 0xffff'ffff'7fff'ffff and hence accidentally
clears the upper 32 bits of its value.
Fix it by casting the VM_MERGEABLE constant to unsigned long before
inverting it, so that the upper 32 bits of the mask stay set and the
upper 32 bits of vm_flags are preserved.
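
The difference between the two statements can be reproduced in userspace.
The sketch below is illustrative only and is not kernel code:
DEMO_VM_MERGEABLE, clear_flag_buggy() and clear_flag_fixed() are
hypothetical names, and uint64_t stands in for the 64-bit unsigned long
of an x86_64 build.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumption: same value as VM_MERGEABLE. An unsuffixed hex literal that
 * does not fit in a 32-bit int has type unsigned int.
 */
#define DEMO_VM_MERGEABLE 0x80000000

/* Mirrors the old statement: ~ runs at 32 bits, the result is zero-extended. */
static uint64_t clear_flag_buggy(uint64_t flags)
{
	flags &= ~DEMO_VM_MERGEABLE;	/* mask is 0x000000007fffffff */
	return flags;
}

/* Mirrors the fix: widen first, then invert, so bits 32..63 stay set. */
static uint64_t clear_flag_fixed(uint64_t flags)
{
	flags &= ~((uint64_t)DEMO_VM_MERGEABLE);	/* mask is 0xffffffff7fffffff */
	return flags;
}
```

With all 64 bits set on input, clear_flag_buggy() keeps only bits 0..30,
while clear_flag_fixed() clears bit 31 alone.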
Note: other VM_* flags are not affected:
This only happens to the VM_MERGEABLE flag, as the other VM_* flags are
all constants of type int. After the ~ operation, they have a leading 1
bit and are thus sign-extended to unsigned long with leading 1s.
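
For contrast, a minimal userspace sketch of the int case. DEMO_INT_FLAG
and clear_int_flag() are hypothetical names chosen for illustration, the
flag value is arbitrary, and uint64_t again stands in for a 64-bit
unsigned long.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag whose literal fits in int, so its type is int. */
#define DEMO_INT_FLAG 0x00000100

static uint64_t clear_int_flag(uint64_t flags)
{
	/*
	 * ~0x100 is the negative int 0xfffffeff. Converting a negative int
	 * to a 64-bit unsigned type yields 0xfffffffffffffeff, so the upper
	 * 32 bits of flags survive the AND.
	 */
	flags &= ~DEMO_INT_FLAG;
	return flags;
}
```

Here only bit 8 is cleared, without any cast.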
Note 2:
After commit 31defc3b01d9 ("userfaultfd: remove (VM_)BUG_ON()s"), this is
no longer a kernel BUG, but a WARNING at the same place:
[ 45.595973] WARNING: CPU: 1 PID: 2474 at mm/userfaultfd.c:2067
but the root-cause (flag-drop) remains the same.
Fixes: 7677f7fd8be76 ("userfaultfd: add minor fault registration mode")
Signed-off-by: Jakub Acs <acsjakub@...zon.de>
Cc: Andrew Morton <akpm@...ux-foundation.org>
Cc: David Hildenbrand <david@...hat.com>
Cc: Xu Xin <xu.xin16@....com.cn>
Cc: Chengming Zhou <chengming.zhou@...ux.dev>
Cc: Peter Xu <peterx@...hat.com>
Cc: Axel Rasmussen <axelrasmussen@...gle.com>
Cc: Mike Kravetz <mike.kravetz@...cle.com>
Cc: linux-mm@...ck.org
Cc: linux-kernel@...r.kernel.org
Cc: stable@...r.kernel.org
---
I looked around the kernel and found one more flag that might cause
similar issues: "IORESOURCE_BUSY", as its inverted value is bitwise
AND-ed into unsigned long fields. However, it seems those fields don't
actually use any of the upper 32 bits as flags (yet?).
I also considered changing the constant definition by adding ULL, but am
not sure where else that could blow up, plus it would likely call for
defining all the related constants as ULL for consistency. If you'd
prefer that fix, let me know.
mm/ksm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/ksm.c b/mm/ksm.c
index 160787bb121c..c24137a1eeb7 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -2871,7 +2871,7 @@ int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
return err;
}
- *vm_flags &= ~VM_MERGEABLE;
+ *vm_flags &= ~((unsigned long) VM_MERGEABLE);
break;
}
--
2.47.3
Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christian Schlaeger
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597