linux-kernel - Re: [PATCH v1] mm/mremap: fix WARN with uffd that has remap events disabled

Message-ID: <4d00f16d-1e8f-48bf-8fcd-83f827c9928e@lucifer.local>
Date: Wed, 20 Aug 2025 10:14:11 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: David Hildenbrand <david@...hat.com>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
        syzbot+4d9a13f0797c46a29e42@...kaller.appspotmail.com,
        stable@...nel.org, Andrew Morton <akpm@...ux-foundation.org>,
        "Liam R. Howlett" <Liam.Howlett@...cle.com>,
        Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
        Pedro Falcato <pfalcato@...e.de>, Harry Yoo <harry.yoo@...cle.com>
Subject: Re: [PATCH v1] mm/mremap: fix WARN with uffd that has remap events
 disabled

On Mon, Aug 18, 2025 at 07:53:58PM +0200, David Hildenbrand wrote:
> Registering userfaultfd on a VMA that spans at least one PMD and then
> mremap()'ing that VMA can trigger a WARN when recovering from a failed
> page table move due to a page table allocation error.
>

Am guessing this is a fault injection thing?

I mean it's good that _in practice_ it's almost impossible to observe this,
though it doesn't mean we don't need to fix it.

> The code ends up doing the right thing (recurse, avoiding moving actual
> page tables), but triggering that WARN is unpleasant:
>
> WARNING: CPU: 2 PID: 6133 at mm/mremap.c:357 move_normal_pmd mm/mremap.c:357 [inline]
> WARNING: CPU: 2 PID: 6133 at mm/mremap.c:357 move_pgt_entry mm/mremap.c:595 [inline]
> WARNING: CPU: 2 PID: 6133 at mm/mremap.c:357 move_page_tables+0x3832/0x44a0 mm/mremap.c:852
> Modules linked in:
> CPU: 2 UID: 0 PID: 6133 Comm: syz.0.19 Not tainted 6.17.0-rc1-syzkaller-00004-g53e760d89498 #0 PREEMPT(full)
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
> RIP: 0010:move_normal_pmd mm/mremap.c:357 [inline]
> RIP: 0010:move_pgt_entry mm/mremap.c:595 [inline]
> RIP: 0010:move_page_tables+0x3832/0x44a0 mm/mremap.c:852
> Code: ...
> RSP: 0018:ffffc900037a76d8 EFLAGS: 00010293
> RAX: 0000000000000000 RBX: 0000000032930007 RCX: ffffffff820c6645
> RDX: ffff88802e56a440 RSI: ffffffff820c7201 RDI: 0000000000000007
> RBP: ffff888037728fc0 R08: 0000000000000007 R09: 0000000000000000
> R10: 0000000032930007 R11: 0000000000000000 R12: 0000000000000000
> R13: ffffc900037a79a8 R14: 0000000000000001 R15: dffffc0000000000
> FS:  000055556316a500(0000) GS:ffff8880d68bc000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000001b30863fff CR3: 0000000050171000 CR4: 0000000000352ef0
> Call Trace:
>  <TASK>
>  copy_vma_and_data+0x468/0x790 mm/mremap.c:1215
>  move_vma+0x548/0x1780 mm/mremap.c:1282
>  mremap_to+0x1b7/0x450 mm/mremap.c:1406
>  do_mremap+0xfad/0x1f80 mm/mremap.c:1921
>  __do_sys_mremap+0x119/0x170 mm/mremap.c:1977
>  do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
>  do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
>  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7f00d0b8ebe9
> Code: ...
> RSP: 002b:00007ffe5ea5ee98 EFLAGS: 00000246 ORIG_RAX: 0000000000000019
> RAX: ffffffffffffffda RBX: 00007f00d0db5fa0 RCX: 00007f00d0b8ebe9
> RDX: 0000000000400000 RSI: 0000000000c00000 RDI: 0000200000000000
> RBP: 00007ffe5ea5eef0 R08: 0000200000c00000 R09: 0000000000000000
> R10: 0000000000000003 R11: 0000000000000246 R12: 0000000000000002
> R13: 00007f00d0db5fa0 R14: 00007f00d0db5fa0 R15: 0000000000000005
>  </TASK>
>
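The saved user-space registers at the end of the trace can be decoded as the mremap() arguments. This assumes the x86-64 syscall ABI (ORIG_RAX 0x19 is syscall 25, __NR_mremap; arguments in RDI, RSI, RDX, R10, R8) and 4 KiB base pages, so one PMD maps 2 MiB. The C sketch below is a simplified, hypothetical helper (every model_* name is invented for this note, not kernel code). Under those assumptions it suggests that the reproducer moves 4 MiB between PMD-aligned addresses with MREMAP_MAYMOVE | MREMAP_FIXED, i.e. two whole PMD-sized extents that are candidates for the whole-table path in move_normal_pmd():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: x86-64 with 4 KiB pages, so one PMD entry maps 2 MiB. */
#define MODEL_PMD_SIZE (UINT64_C(1) << 21)

/* Flag values as defined in the Linux UAPI mman headers. */
#define MODEL_MREMAP_MAYMOVE 1u
#define MODEL_MREMAP_FIXED   2u

/* mremap() arguments, in x86-64 syscall argument register order. */
struct model_mremap_args {
	uint64_t old_addr; /* RDI */
	uint64_t old_len;  /* RSI */
	uint64_t new_len;  /* RDX */
	uint64_t flags;    /* R10 */
	uint64_t new_addr; /* R8 */
};

static bool model_pmd_aligned(uint64_t addr)
{
	return (addr & (MODEL_PMD_SIZE - 1)) == 0;
}

/*
 * Simplified: count whole PMD-sized extents that could be moved as page
 * tables. Both addresses must be PMD-aligned, and only
 * min(old_len, new_len) bytes are actually moved.
 */
static uint64_t model_whole_pmds_moved(const struct model_mremap_args *a)
{
	uint64_t moved = a->old_len < a->new_len ? a->old_len : a->new_len;

	if (!model_pmd_aligned(a->old_addr) || !model_pmd_aligned(a->new_addr))
		return 0;
	return moved / MODEL_PMD_SIZE;
}

/* Values copied from RDI, RSI, RDX, R10 and R8 in the syzbot report. */
static const struct model_mremap_args syzbot_args = {
	.old_addr = 0x0000200000000000,
	.old_len  = 0x0000000000c00000,
	.new_len  = 0x0000000000400000,
	.flags    = 0x0000000000000003,
	.new_addr = 0x0000200000c00000,
};
```

Here model_whole_pmds_moved(&syzbot_args) evaluates to 2: the 12 MiB mapping is shrunk to 4 MiB and moved to a new 2 MiB-aligned address.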
> The underlying issue is that we recurse during the original page table
> move, but not during the recovery move.
>
> Fix it by checking for both VMAs and performing the check before the
> pmd_none() sanity check.
>
> Add a new helper where we perform+document that check for the PMD and
> PUD level.
>
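A toy C model can make the asymmetry concrete. It tracks one PMD-sized extent that is moved and then moved back during recovery. This is a sketch under heavy assumptions, not kernel code. Every model_* type and function is hypothetical, a "slot" stands in for a single PMD entry, and the fallback path stands in for recursing into PTE-level moves in move_page_tables(). The pre-fix variant runs the WARN_ON_ONCE(!pmd_none()) sanity check first and then consults only the source VMA. The fixed variant consults both VMAs before the sanity check, as uffd_supports_page_table_move() does:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical VMA stand-in: models vma_has_uffd_without_event_remap(). */
struct model_vma {
	bool uffd_without_event_remap;
};

/* Hypothetical stand-in for struct pagetable_move_control. */
struct model_pmc {
	const struct model_vma *old;
	const struct model_vma *new;
};

/* One PMD entry: empty, or pointing at a PTE table holding nr_ptes entries. */
struct model_pmd_slot {
	bool has_table;
	int nr_ptes;
};

/* Pre-fix ordering: sanity check first, then only the source VMA. */
static bool prefix_can_move_table(const struct model_pmc *pmc,
				  const struct model_pmd_slot *dst, int *warns)
{
	if (dst->has_table) {
		(*warns)++; /* models the WARN_ON_ONCE(!pmd_none(*new_pmd)) hit */
		return false;
	}
	return !pmc->old->uffd_without_event_remap;
}

/* Post-fix ordering: check both VMAs before the sanity check. */
static bool fixed_can_move_table(const struct model_pmc *pmc,
				 const struct model_pmd_slot *dst, int *warns)
{
	if (pmc->old->uffd_without_event_remap ||
	    pmc->new->uffd_without_event_remap)
		return false;
	if (dst->has_table) {
		(*warns)++;
		return false;
	}
	return true;
}

static void model_move_extent(const struct model_pmc *pmc,
			      struct model_pmd_slot *src,
			      struct model_pmd_slot *dst, bool fixed, int *warns)
{
	bool whole = fixed ? fixed_can_move_table(pmc, dst, warns)
			   : prefix_can_move_table(pmc, dst, warns);

	if (whole) {
		/* Hand the whole PTE table over; the source entry becomes empty. */
		*dst = *src;
		src->has_table = false;
		src->nr_ptes = 0;
		return;
	}
	/* Fallback: ensure a destination table exists, then move the PTEs. */
	dst->has_table = true;
	dst->nr_ptes += src->nr_ptes;
	src->nr_ptes = 0; /* the source PTE table stays allocated, now empty */
}

/* Move a uffd VMA (remap events off), then recover; returns the WARN count. */
static int model_run(bool fixed, int *ptes_back_home)
{
	const struct model_vma uffd_vma = { .uffd_without_event_remap = true };
	const struct model_vma new_vma = { .uffd_without_event_remap = false };
	const struct model_pmc fwd = { .old = &uffd_vma, .new = &new_vma };
	const struct model_pmc back = { .old = &new_vma, .new = &uffd_vma };
	struct model_pmd_slot orig = { .has_table = true, .nr_ptes = 512 };
	struct model_pmd_slot dest = { .has_table = false, .nr_ptes = 0 };
	int warns = 0;

	model_move_extent(&fwd, &orig, &dest, fixed, &warns);
	/* Pretend a later page table allocation failed: move back, reversed. */
	model_move_extent(&back, &dest, &orig, fixed, &warns);
	*ptes_back_home = orig.nr_ptes;
	return warns;
}
```

In both variants all 512 modelled PTEs end up back in the original table, matching "the code ends up doing the right thing". Only the pre-fix variant trips the sanity check, because the original PTE table is still allocated when recovery tries a whole-table move into it. In this model, checking both VMAs while leaving that check after the sanity check would still count one warning, so the reordering matters as much as the second VMA.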
> Thanks to Harry for bisecting.
>
> Reported-by: syzbot+4d9a13f0797c46a29e42@...kaller.appspotmail.com
> Closes: https://lkml.kernel.org/r/689bb893.050a0220.7f033.013a.GAE@google.com
> Fixes: 0cef0bb836e ("mm: clear uffd-wp PTE/PMD state on mremap()")
> Cc: <stable@...nel.org>
> Cc: Andrew Morton <akpm@...ux-foundation.org>
> Cc: "Liam R. Howlett" <Liam.Howlett@...cle.com>
> Cc: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
> Cc: Vlastimil Babka <vbabka@...e.cz>
> Cc: Jann Horn <jannh@...gle.com>
> Cc: Pedro Falcato <pfalcato@...e.de>
> Cc: Harry Yoo <harry.yoo@...cle.com>
> Signed-off-by: David Hildenbrand <david@...hat.com>

This is great work!

And major thanks to Harry for working hard on and pushing this. We chatted
off-list first, I asked him to look deeper into it, and he was able to
correctly identify that the recently fixed optimise-mremap bug was not the
issue (though syzbot didn't help with a flaky 'fixed' repro).

Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>

> ---
>
> Once this is queued, I'll send a patch to perform a
> userfaultfd_wp() check, to skip any uffd VMAs that don't mess with uffd-wp.

OK this makes sense.
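A rough sketch of how the follow-up might narrow the condition, assuming it simply adds a userfaultfd_wp() test (VM_UFFD_WP set on the VMA) on top of the existing check. This is not the promised patch. The struct and functions below are hypothetical models, and each boolean stands in for the kernel state named in its comment:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the userfaultfd-related state of one VMA. */
struct model_uffd_vma {
	bool has_uffd_ctx; /* vma->vm_userfaultfd_ctx.ctx != NULL */
	bool event_remap;  /* ctx has UFFD_FEATURE_EVENT_REMAP */
	bool uffd_wp;      /* userfaultfd_wp(vma), i.e. VM_UFFD_WP set */
};

/* Condition used by this patch: any uffd VMA without remap events. */
static bool model_needs_pte_walk_now(const struct model_uffd_vma *v)
{
	return v->has_uffd_ctx && !v->event_remap;
}

/* Hypothetical follow-up: only VMAs registered for uffd-wp need the walk. */
static bool model_needs_pte_walk_refined(const struct model_uffd_vma *v)
{
	return model_needs_pte_walk_now(v) && v->uffd_wp;
}

/* Both VMAs must still be checked, as in uffd_supports_page_table_move(). */
static bool model_can_move_tables(const struct model_uffd_vma *old_vma,
				  const struct model_uffd_vma *new_vma)
{
	return !model_needs_pte_walk_refined(old_vma) &&
	       !model_needs_pte_walk_refined(new_vma);
}
```

With this refinement, a VMA registered only for missing faults would no longer force the PTE-level walk, while a uffd-wp VMA on either side of the move still would.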

>
> ---
>  mm/mremap.c | 41 +++++++++++++++++++++++------------------
>  1 file changed, 23 insertions(+), 18 deletions(-)
>
> diff --git a/mm/mremap.c b/mm/mremap.c
> index 9afa8cd524f5f..87849af6682a7 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -323,6 +323,25 @@ static inline bool arch_supports_page_table_move(void)
>  }
>  #endif
>
> +static inline bool uffd_supports_page_table_move(struct pagetable_move_control *pmc)
> +{
> +	/*
> +	 * If we are moving a VMA that has uffd-wp registered but with
> +	 * remap events disabled (new VMA will not be registered with uffd), we
> +	 * need to ensure that the uffd-wp state is cleared from all pgtables.
> +	 * This means recursing into lower page tables in move_page_tables().
> +	 *
> +	 * We might get called with VMAs reversed when recovering from a
> +	 * failed page table move. In that case, the
> +	 * "old"-but-actually-"originally new" VMA during recovery will not have
> +	 * a uffd context. Recursing into lower page tables during the original
> +	 * move but not during the recovery move will cause trouble, because we
> +	 * run into already-existing page tables. So check both VMAs.
> +	 */
> +	return !vma_has_uffd_without_event_remap(pmc->old) &&
> +	       !vma_has_uffd_without_event_remap(pmc->new);
> +}
> +
>  #ifdef CONFIG_HAVE_MOVE_PMD
>  static bool move_normal_pmd(struct pagetable_move_control *pmc,
>  			pmd_t *old_pmd, pmd_t *new_pmd)
> @@ -335,6 +354,8 @@ static bool move_normal_pmd(struct pagetable_move_control *pmc,
>
>  	if (!arch_supports_page_table_move())
>  		return false;
> +	if (!uffd_supports_page_table_move(pmc))
> +		return false;

OK, I guess this is inapplicable to move_huge_pmd() because it doesn't need
to recurse into PTE page tables.


>  	/*
>  	 * The destination pmd shouldn't be established, free_pgtables()
>  	 * should have released it.
> @@ -361,15 +382,6 @@ static bool move_normal_pmd(struct pagetable_move_control *pmc,
>  	if (WARN_ON_ONCE(!pmd_none(*new_pmd)))
>  		return false;
>
> -	/* If this pmd belongs to a uffd vma with remap events disabled, we need
> -	 * to ensure that the uffd-wp state is cleared from all pgtables. This
> -	 * means recursing into lower page tables in move_page_tables(), and we
> -	 * can reuse the existing code if we simply treat the entry as "not
> -	 * moved".
> -	 */
> -	if (vma_has_uffd_without_event_remap(vma))
> -		return false;
> -
>  	/*
>  	 * We don't have to worry about the ordering of src and dst
>  	 * ptlocks because exclusive mmap_lock prevents deadlock.
> @@ -418,6 +430,8 @@ static bool move_normal_pud(struct pagetable_move_control *pmc,
>
>  	if (!arch_supports_page_table_move())
>  		return false;
> +	if (!uffd_supports_page_table_move(pmc))
> +		return false;
>  	/*
>  	 * The destination pud shouldn't be established, free_pgtables()
>  	 * should have released it.
> @@ -425,15 +439,6 @@ static bool move_normal_pud(struct pagetable_move_control *pmc,
>  	if (WARN_ON_ONCE(!pud_none(*new_pud)))
>  		return false;
>
> -	/* If this pud belongs to a uffd vma with remap events disabled, we need
> -	 * to ensure that the uffd-wp state is cleared from all pgtables. This
> -	 * means recursing into lower page tables in move_page_tables(), and we
> -	 * can reuse the existing code if we simply treat the entry as "not
> -	 * moved".
> -	 */
> -	if (vma_has_uffd_without_event_remap(vma))
> -		return false;
> -
>  	/*
>  	 * We don't have to worry about the ordering of src and dst
>  	 * ptlocks because exclusive mmap_lock prevents deadlock.
>
> base-commit: dfc0f6373094dd88e1eaf76c44f2ff01b65db851
> --
> 2.50.1
>