[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <18b18a90-9850-4015-8038-35e2e083ece5@lucifer.local>
Date: Wed, 2 Apr 2025 13:32:55 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: David Hildenbrand <david@...hat.com>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org, x86@...nel.org,
xingwei lee <xrivendell7@...il.com>,
yuxin wang <wang1315768607@....com>,
Marius Fleischer <fleischermarius@...il.com>,
Ingo Molnar <mingo@...nel.org>, Borislav Petkov <bp@...en8.de>,
Dan Carpenter <dan.carpenter@...aro.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Andy Lutomirski <luto@...nel.org>,
Peter Zijlstra <peterz@...radead.org>, Rik van Riel <riel@...riel.com>,
"H. Peter Anvin" <hpa@...or.com>, Peter Xu <peterx@...hat.com>
Subject: Re: [PATCH v3] x86/mm/pat: Fix VM_PAT handling when fork() fails in
copy_page_range()
For the whole thing with this fix-patch:
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
On Wed, Apr 02, 2025 at 01:36:32PM +0200, David Hildenbrand wrote:
> On 25.03.25 20:19, David Hildenbrand wrote:
> > If track_pfn_copy() fails, we already added the dst VMA to the maple
> > tree. As fork() fails, we'll cleanup the maple tree, and stumble over
> > the dst VMA for which we neither performed any reservation nor copied
> > any page tables.
> >
> > Consequently untrack_pfn() will see VM_PAT and try obtaining the
> > PAT information from the page table -- which fails because the page
> > table was not copied.
> >
> > The easiest fix would be to simply clear the VM_PAT flag of the dst VMA
> > if track_pfn_copy() fails. However, the whole thing is about "simply"
> > clearing the VM_PAT flag is shaky as well: if we passed track_pfn_copy()
> > and performed a reservation, but copying the page tables fails, we'll
> > simply clear the VM_PAT flag, not properly undoing the reservation ...
> > which is also wrong.
> >
> > So let's fix it properly: set the VM_PAT flag only if the reservation
> > succeeded (leaving it clear initially), and undo the reservation if
> > anything goes wrong while copying the page tables: clearing the VM_PAT
> > flag after undoing the reservation.
> >
> > Note that any copied page table entries will get zapped when the VMA will
> > get removed later, after copy_page_range() succeeded; as VM_PAT is not set
> > then, we won't try cleaning VM_PAT up once more and untrack_pfn() will be
> > happy. Note that leaving these page tables in place without a reservation
> > is not a problem, as we are aborting fork(); this process will never run.
> >
> > A reproducer can trigger this usually at the first try:
> >
> > https://gitlab.com/davidhildenbrand/scratchspace/-/raw/main/reproducers/pat_fork.c
> >
> > [ 45.239440] WARNING: CPU: 26 PID: 11650 at arch/x86/mm/pat/memtype.c:983 get_pat_info+0xf6/0x110
> > [ 45.241082] Modules linked in: ...
> > [ 45.249119] CPU: 26 UID: 0 PID: 11650 Comm: repro3 Not tainted 6.12.0-rc5+ #92
> > [ 45.250598] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
> > [ 45.252181] RIP: 0010:get_pat_info+0xf6/0x110
> > ...
> > [ 45.268513] Call Trace:
> > [ 45.269003] <TASK>
> > [ 45.269425] ? __warn.cold+0xb7/0x14d
> > [ 45.270131] ? get_pat_info+0xf6/0x110
> > [ 45.270846] ? report_bug+0xff/0x140
> > [ 45.271519] ? handle_bug+0x58/0x90
> > [ 45.272192] ? exc_invalid_op+0x17/0x70
> > [ 45.272935] ? asm_exc_invalid_op+0x1a/0x20
> > [ 45.273717] ? get_pat_info+0xf6/0x110
> > [ 45.274438] ? get_pat_info+0x71/0x110
> > [ 45.275165] untrack_pfn+0x52/0x110
> > [ 45.275835] unmap_single_vma+0xa6/0xe0
> > [ 45.276549] unmap_vmas+0x105/0x1f0
> > [ 45.277256] exit_mmap+0xf6/0x460
> > [ 45.277913] __mmput+0x4b/0x120
> > [ 45.278512] copy_process+0x1bf6/0x2aa0
> > [ 45.279264] kernel_clone+0xab/0x440
> > [ 45.279959] __do_sys_clone+0x66/0x90
> > [ 45.280650] do_syscall_64+0x95/0x180
> >
> > Likely this case was missed in commit d155df53f310 ("x86/mm/pat: clear
> > VM_PAT if copy_p4d_range failed")
> >
> > ... and instead of undoing the reservation we simply cleared the VM_PAT flag.
> >
> > Keep the documentation of these functions in include/linux/pgtable.h,
> > one place is more than sufficient -- we should clean that up for the other
> > functions like track_pfn_remap/untrack_pfn separately.
> >
> > Reported-by: xingwei lee <xrivendell7@...il.com>
> > Reported-by: yuxin wang <wang1315768607@....com>
> > Closes: https://lore.kernel.org/lkml/CABOYnLx_dnqzpCW99G81DmOr+2UzdmZMk=T3uxwNxwz+R1RAwg@mail.gmail.com/
> > Reported-by: Marius Fleischer <fleischermarius@...il.com>
> > Closes: https://lore.kernel.org/lkml/CAJg=8jwijTP5fre8woS4JVJQ8iUA6v+iNcsOgtj9Zfpc3obDOQ@mail.gmail.com/
> > Fixes: d155df53f310 ("x86/mm/pat: clear VM_PAT if copy_p4d_range failed")
> > Fixes: 2ab640379a0a ("x86: PAT: hooks in generic vm code to help archs to track pfnmap regions - v3")
> > Cc: Ingo Molnar <mingo@...nel.org>
> > Cc: Borislav Petkov <bp@...en8.de>
> > Cc: Dan Carpenter <dan.carpenter@...aro.org>
> > Cc: Andrew Morton <akpm@...ux-foundation.org>
> > Cc: Linus Torvalds <torvalds@...ux-foundation.org>
> > Cc: Dave Hansen <dave.hansen@...ux.intel.com>
> > Cc: Andy Lutomirski <luto@...nel.org>
> > Cc: Peter Zijlstra <peterz@...radead.org>
> > Cc: Rik van Riel <riel@...riel.com>
> > Cc: "H. Peter Anvin" <hpa@...or.com>
> > Cc: Peter Xu <peterx@...hat.com>
> > Signed-off-by: David Hildenbrand <david@...hat.com>
> > ---
>
> Apparently smatch is not happy about some scenarios. The following might
> make it happy, and make track_pfn_copy() obey the documentation "pfn set if
> rc == 0".
>
> diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
> index d721cc19addbd..a51d21d2e5198 100644
> --- a/arch/x86/mm/pat/memtype.c
> +++ b/arch/x86/mm/pat/memtype.c
> @@ -992,8 +992,10 @@ int track_pfn_copy(struct vm_area_struct *dst_vma,
> pgprot_t pgprot;
> int rc;
>
> - if (!(src_vma->vm_flags & VM_PAT))
> + if (!(src_vma->vm_flags & VM_PAT)) {
> + *pfn = 0;
> return 0;
> + }
>
> /*
> * Duplicate the PAT information for the dst VMA based on the src
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index 4c107e17c547e..d4b564aacab8f 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -1515,6 +1515,7 @@ static inline void track_pfn_insert(struct
> vm_area_struct *vma, pgprot_t *prot,
> static inline int track_pfn_copy(struct vm_area_struct *dst_vma,
> struct vm_area_struct *src_vma, unsigned long *pfn)
> {
> + *pfn = 0;
> return 0;
> }
OK interesting, I would have thought it'd be setting the pfn in the local var,
but this is probably actually better + clearer so we consistently set the value
in track_pfn_copy() (in non-error case).
>
>
>
> --
> Cheers,
>
> David / dhildenb
>
Powered by blists - more mailing lists