[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <4d00f16d-1e8f-48bf-8fcd-83f827c9928e@lucifer.local>
Date: Wed, 20 Aug 2025 10:14:11 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: David Hildenbrand <david@...hat.com>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
syzbot+4d9a13f0797c46a29e42@...kaller.appspotmail.com,
stable@...nel.org, Andrew Morton <akpm@...ux-foundation.org>,
"Liam R. Howlett" <Liam.Howlett@...cle.com>,
Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
Pedro Falcato <pfalcato@...e.de>, Harry Yoo <harry.yoo@...cle.com>
Subject: Re: [PATCH v1] mm/mremap: fix WARN with uffd that has remap events
disabled
On Mon, Aug 18, 2025 at 07:53:58PM +0200, David Hildenbrand wrote:
> Registering userfaultd on a VMA that spans at least one PMD and then
> mremap()'ing that VMA can trigger a WARN when recovering from a failed
> page table move due to a page table allocation error.
>
Am guessing this is a fault injection thing?
I mean it's good that _in practice_ it's almost impossible to observe this,
though it doesn't mean we don't need to fix it.
> The code ends up doing the right thing (recurse, avoiding moving actual
> page tables), but triggering that WARN is unpleasant:
>
> WARNING: CPU: 2 PID: 6133 at mm/mremap.c:357 move_normal_pmd mm/mremap.c:357 [inline]
> WARNING: CPU: 2 PID: 6133 at mm/mremap.c:357 move_pgt_entry mm/mremap.c:595 [inline]
> WARNING: CPU: 2 PID: 6133 at mm/mremap.c:357 move_page_tables+0x3832/0x44a0 mm/mremap.c:852
> Modules linked in:
> CPU: 2 UID: 0 PID: 6133 Comm: syz.0.19 Not tainted 6.17.0-rc1-syzkaller-00004-g53e760d89498 #0 PREEMPT(full)
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
> RIP: 0010:move_normal_pmd mm/mremap.c:357 [inline]
> RIP: 0010:move_pgt_entry mm/mremap.c:595 [inline]
> RIP: 0010:move_page_tables+0x3832/0x44a0 mm/mremap.c:852
> Code: ...
> RSP: 0018:ffffc900037a76d8 EFLAGS: 00010293
> RAX: 0000000000000000 RBX: 0000000032930007 RCX: ffffffff820c6645
> RDX: ffff88802e56a440 RSI: ffffffff820c7201 RDI: 0000000000000007
> RBP: ffff888037728fc0 R08: 0000000000000007 R09: 0000000000000000
> R10: 0000000032930007 R11: 0000000000000000 R12: 0000000000000000
> R13: ffffc900037a79a8 R14: 0000000000000001 R15: dffffc0000000000
> FS: 000055556316a500(0000) GS:ffff8880d68bc000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000001b30863fff CR3: 0000000050171000 CR4: 0000000000352ef0
> Call Trace:
> <TASK>
> copy_vma_and_data+0x468/0x790 mm/mremap.c:1215
> move_vma+0x548/0x1780 mm/mremap.c:1282
> mremap_to+0x1b7/0x450 mm/mremap.c:1406
> do_mremap+0xfad/0x1f80 mm/mremap.c:1921
> __do_sys_mremap+0x119/0x170 mm/mremap.c:1977
> do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
> do_syscall_64+0xcd/0x4c0 arch/x86/entry/syscall_64.c:94
> entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7f00d0b8ebe9
> Code: ...
> RSP: 002b:00007ffe5ea5ee98 EFLAGS: 00000246 ORIG_RAX: 0000000000000019
> RAX: ffffffffffffffda RBX: 00007f00d0db5fa0 RCX: 00007f00d0b8ebe9
> RDX: 0000000000400000 RSI: 0000000000c00000 RDI: 0000200000000000
> RBP: 00007ffe5ea5eef0 R08: 0000200000c00000 R09: 0000000000000000
> R10: 0000000000000003 R11: 0000000000000246 R12: 0000000000000002
> R13: 00007f00d0db5fa0 R14: 00007f00d0db5fa0 R15: 0000000000000005
> </TASK>
>
> The underlying issue is that we recurse during the original page table
> move, but not during the recovery move.
>
> Fix it by checking for both VMAs and performing the check before the
> pmd_none() sanity check.
>
> Add a new helper where we perform+document that check for the PMD and
> PUD level.
>
> Thanks to Harry for bisecting.
>
> Reported-by: syzbot+4d9a13f0797c46a29e42@...kaller.appspotmail.com
> Closes: https://lkml.kernel.org/r/689bb893.050a0220.7f033.013a.GAE@google.com
> Fixes: 0cef0bb836e ("mm: clear uffd-wp PTE/PMD state on mremap()")
> Cc: <stable@...nel.org>
> Cc: Andrew Morton <akpm@...ux-foundation.org>
> Cc: "Liam R. Howlett" <Liam.Howlett@...cle.com>
> Cc: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
> Cc: Vlastimil Babka <vbabka@...e.cz>
> Cc: Jann Horn <jannh@...gle.com>
> Cc: Pedro Falcato <pfalcato@...e.de>
> Cc: Harry Yoo <harry.yoo@...cle.com>
> Signed-off-by: David Hildenbrand <david@...hat.com>
This is great work!
And major thanks to Harry for working hard on + pushing this, we chatted
off-list first and I asked him to look deeper into it and he was able to
correctly identify that the optimise-mremap bug recently fixed was not the
issue (though syzbot didn't help with a flakey 'fixed' repro).
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
> ---
>
> Once this is queued, I'll send a patch to perform a
> userfaultfd_wp() check, to skip any uffd VMAs that don't mess with uffd-wp.
OK this makes sense.
>
> ---
> mm/mremap.c | 41 +++++++++++++++++++++++------------------
> 1 file changed, 23 insertions(+), 18 deletions(-)
>
> diff --git a/mm/mremap.c b/mm/mremap.c
> index 9afa8cd524f5f..87849af6682a7 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -323,6 +323,25 @@ static inline bool arch_supports_page_table_move(void)
> }
> #endif
>
> +static inline bool uffd_supports_page_table_move(struct pagetable_move_control *pmc)
> +{
> + /*
> + * If we are moving a VMA that has uffd-wp registered but with
> + * remap events disabled (new VMA will not be registered with uffd), we
> + * need to ensure that the uffd-wp state is cleared from all pgtables.
> + * This means recursing into lower page tables in move_page_tables().
> + *
> + * We might get called with VMAs reversed when recovering from a
> + * failed page table move. In that case, the
> + * "old"-but-actually-"originally new" VMA during recovery will not have
> + * a uffd context. Recursing into lower page tables during the original
> + * move but not during the recovery move will cause trouble, because we
> + * run into already-existing page tables. So check both VMAs.
> + */
> + return !vma_has_uffd_without_event_remap(pmc->old) &&
> + !vma_has_uffd_without_event_remap(pmc->new);
> +}
> +
> #ifdef CONFIG_HAVE_MOVE_PMD
> static bool move_normal_pmd(struct pagetable_move_control *pmc,
> pmd_t *old_pmd, pmd_t *new_pmd)
> @@ -335,6 +354,8 @@ static bool move_normal_pmd(struct pagetable_move_control *pmc,
>
> if (!arch_supports_page_table_move())
> return false;
> + if (!uffd_supports_page_table_move(pmc))
> + return false;
OK I guess this is inapplicable to move_huge_pmd() because it doesn't need
to recurse PTE page tables.
> /*
> * The destination pmd shouldn't be established, free_pgtables()
> * should have released it.
> @@ -361,15 +382,6 @@ static bool move_normal_pmd(struct pagetable_move_control *pmc,
> if (WARN_ON_ONCE(!pmd_none(*new_pmd)))
> return false;
>
> - /* If this pmd belongs to a uffd vma with remap events disabled, we need
> - * to ensure that the uffd-wp state is cleared from all pgtables. This
> - * means recursing into lower page tables in move_page_tables(), and we
> - * can reuse the existing code if we simply treat the entry as "not
> - * moved".
> - */
> - if (vma_has_uffd_without_event_remap(vma))
> - return false;
> -
> /*
> * We don't have to worry about the ordering of src and dst
> * ptlocks because exclusive mmap_lock prevents deadlock.
> @@ -418,6 +430,8 @@ static bool move_normal_pud(struct pagetable_move_control *pmc,
>
> if (!arch_supports_page_table_move())
> return false;
> + if (!uffd_supports_page_table_move(pmc))
> + return false;
> /*
> * The destination pud shouldn't be established, free_pgtables()
> * should have released it.
> @@ -425,15 +439,6 @@ static bool move_normal_pud(struct pagetable_move_control *pmc,
> if (WARN_ON_ONCE(!pud_none(*new_pud)))
> return false;
>
> - /* If this pud belongs to a uffd vma with remap events disabled, we need
> - * to ensure that the uffd-wp state is cleared from all pgtables. This
> - * means recursing into lower page tables in move_page_tables(), and we
> - * can reuse the existing code if we simply treat the entry as "not
> - * moved".
> - */
> - if (vma_has_uffd_without_event_remap(vma))
> - return false;
> -
> /*
> * We don't have to worry about the ordering of src and dst
> * ptlocks because exclusive mmap_lock prevents deadlock.
>
> base-commit: dfc0f6373094dd88e1eaf76c44f2ff01b65db851
> --
> 2.50.1
>
Powered by blists - more mailing lists