Message-ID: <461d3a9e-f3e2-4c2f-ac9b-2b842ce115fd@suse.cz>
Date: Mon, 10 Nov 2025 11:00:22 +0100
From: Vlastimil Babka <vbabka@...e.cz>
To: Jakub Acs <acsjakub@...zon.de>, linux-mm@...ck.org,
 Hugh Dickins <hughd@...gle.com>, Jann Horn <jannh@...gle.com>,
 Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
 Dave Hansen <dave.hansen@...ux.intel.com>
Cc: akpm@...ux-foundation.org, david@...hat.com, xu.xin16@....com.cn,
 chengming.zhou@...ux.dev, peterx@...hat.com, axelrasmussen@...gle.com,
 linux-kernel@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [PATCH v3 1/2] mm/ksm: fix flag-dropping behavior in ksm_madvise

On 11/6/25 11:39, Vlastimil Babka wrote:
> On 10/1/25 11:03, Jakub Acs wrote:
>> syzkaller discovered the following crash: (kernel BUG)
>> 
>> [   44.607039] ------------[ cut here ]------------
>> [   44.607422] kernel BUG at mm/userfaultfd.c:2067!
>> [   44.608148] Oops: invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
>> [   44.608814] CPU: 1 UID: 0 PID: 2475 Comm: reproducer Not tainted 6.16.0-rc6 #1 PREEMPT(none)
>> [   44.609635] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
>> [   44.610695] RIP: 0010:userfaultfd_release_all+0x3a8/0x460
>> 
>> <snip other registers, drop unreliable trace>
>> 
>> [   44.617726] Call Trace:
>> [   44.617926]  <TASK>
>> [   44.619284]  userfaultfd_release+0xef/0x1b0
>> [   44.620976]  __fput+0x3f9/0xb60
>> [   44.621240]  fput_close_sync+0x110/0x210
>> [   44.622222]  __x64_sys_close+0x8f/0x120
>> [   44.622530]  do_syscall_64+0x5b/0x2f0
>> [   44.622840]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
>> [   44.623244] RIP: 0033:0x7f365bb3f227
>> 
>> Kernel panics because it detects UFFD inconsistency during
>> userfaultfd_release_all(). Specifically, a VMA which has a valid pointer
>> to vma->vm_userfaultfd_ctx, but no UFFD flags in vma->vm_flags.
>> 
>> The inconsistency is caused in ksm_madvise(): when user calls madvise()
> >> with MADV_UNMERGEABLE on a VMA that is registered for UFFD in MINOR
>> mode, it accidentally clears all flags stored in the upper 32 bits of
>> vma->vm_flags.
>> 
>> Assuming x86_64 kernel build, unsigned long is 64-bit and unsigned int
>> and int are 32-bit wide. This setup causes the following mishap during
>> the &= ~VM_MERGEABLE assignment.
>> 
>> VM_MERGEABLE is a 32-bit constant of type unsigned int, 0x8000'0000.
>> After ~ is applied, it becomes 0x7fff'ffff unsigned int, which is then
>> promoted to unsigned long before the & operation. This promotion fills
>> upper 32 bits with leading 0s, as we're doing unsigned conversion (and
>> even for a signed conversion, this wouldn't help as the leading bit is
>> 0). & operation thus ends up AND-ing vm_flags with 0x0000'0000'7fff'ffff
>> instead of intended 0xffff'ffff'7fff'ffff and hence accidentally clears
> >> the upper 32 bits of its value.
>> 
>> Fix it by changing `VM_MERGEABLE` constant to unsigned long, using the
>> BIT() macro.
>> 
>> Note: other VM_* flags are not affected:
>> This only happens to the VM_MERGEABLE flag, as the other VM_* flags are
>> all constants of type int and after ~ operation, they end up with
>> leading 1 and are thus converted to unsigned long with leading 1s.
>> 
>> Note 2:
>> After commit 31defc3b01d9 ("userfaultfd: remove (VM_)BUG_ON()s"), this is
>> no longer a kernel BUG, but a WARNING at the same place:
>> 
>> [   45.595973] WARNING: CPU: 1 PID: 2474 at mm/userfaultfd.c:2067
>> 
>> but the root-cause (flag-drop) remains the same.
>> 
>> Fixes: 7677f7fd8be76 ("userfaultfd: add minor fault registration mode")
> 
> Late to the party, but it seems to me the correct Fixes: should be
> f8af4da3b4c1 ("ksm: the mm interface to ksm")
> which introduced the flag and the buggy clearing code, no?

Clarification: flags with bits >31 did not exist at the time of f8af4da3b4c1
as they were only introduced later with 63c17fb8e5a4 ("mm/core,
x86/mm/pkeys: Store protection bits in high VMA flags") (v4.6) so that would
have been the most precise Fixes: commit. Sorry, Hugh :)

But that doesn't affect the stable backport efforts, where the oldest LTS is
5.4 anyway.

> Commit 7677f7fd8be76 is just the one that exposes it, right? But there are
> other flags in the >=32 bit area, including pkeys etc. Sounds rather
> dangerous if they can be cleared using a madvise.
> 
> So we can't amend the Fixes: now but maybe could advise stable to backport
> for even older versions than based on 7677f7fd8be76 ?
> 
>> Signed-off-by: Jakub Acs <acsjakub@...zon.de>
>> Cc: Andrew Morton <akpm@...ux-foundation.org>
>> Cc: David Hildenbrand <david@...hat.com>
>> Cc: Xu Xin <xu.xin16@....com.cn>
>> Cc: Chengming Zhou <chengming.zhou@...ux.dev>
>> Cc: Peter Xu <peterx@...hat.com>
>> Cc: Axel Rasmussen <axelrasmussen@...gle.com>
>> Cc: linux-mm@...ck.org
>> Cc: linux-kernel@...r.kernel.org
>> Cc: stable@...r.kernel.org
>> ---
>>  include/linux/mm.h | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 1ae97a0b8ec7..c6794d0e24eb 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -296,7 +296,7 @@ extern unsigned int kobjsize(const void *objp);
>>  #define VM_MIXEDMAP	0x10000000	/* Can contain "struct page" and pure PFN pages */
>>  #define VM_HUGEPAGE	0x20000000	/* MADV_HUGEPAGE marked this vma */
>>  #define VM_NOHUGEPAGE	0x40000000	/* MADV_NOHUGEPAGE marked this vma */
>> -#define VM_MERGEABLE	0x80000000	/* KSM may merge identical pages */
>> +#define VM_MERGEABLE	BIT(31)		/* KSM may merge identical pages */
>>  
>>  #ifdef CONFIG_ARCH_USES_HIGH_VMA_FLAGS
>>  #define VM_HIGH_ARCH_BIT_0	32	/* bit only usable on 64-bit architectures */
> 

