[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZxGOzE50OpzI3GaY@x1n>
Date: Thu, 17 Oct 2024 18:25:16 -0400
From: Peter Xu <peterx@...hat.com>
To: David Hildenbrand <david@...hat.com>
Cc: lee bruce <xrivendell7@...il.com>, dave.hansen@...ux.intel.com,
linux-kernel@...r.kernel.org, luto@...nel.org, peterz@...radead.org,
bp@...en8.de, hpa@...or.com, mingo@...hat.com,
Thomas Gleixner <tglx@...utronix.de>, x86@...nel.org,
wang1315768607@....com, syzkaller@...glegroups.com
Subject: Re: WARNING in get_pat_info
On Thu, Oct 17, 2024 at 09:27:35PM +0200, David Hildenbrand wrote:
> On 16.08.24 11:44, lee bruce wrote:
> > Hello, I found a bug titled "WARNING in get_pat_info" with modified
> > syzkaller in the lasted upstream and lasted mm branches.
>
> Below report is from 6.10.0, which is not precisely "latest upstream", but I
> assume you have similar reports on upstream?
>
> commit 04c35ab3bdae7fefbd7c7a7355f29fa03a035221
> Author: David Hildenbrand <david@...hat.com>
> Date: Wed Apr 3 23:21:30 2024 +0200
>
> x86/mm/pat: fix VM_PAT handling in COW mappings
>
> Was part of v6.9, but this is a different issue.
>
> >
> > If you fix this issue, please add the following tag to the commit:
> > Reported-by: xingwei lee <xrivendell7@...il.com>
> > Reported-by: yuxin wang <wang1315768607@....com>
> >
> > TITLE: WARNING in get_pat_info
> > ------------[ cut here ]------------
> > WARNING: CPU: 2 PID: 12458 at arch/x86/mm/pat/memtype.c:1002
> > get_pat_info+0x4b6/0x5c0 arch/x86/mm/pat/memtype.c:1002
>
> This is the WARN_ON_ONCE(1) in get_pat_info(). We don't find any page in the
> mapping, so it vanished already.
>
> I thought we discovered that already recently and discussed it here:
>
> https://lore.kernel.org/all/20240712144244.3090089-1-peterx@redhat.com/T/#u
>
> Which was supposed to fix this problem IIRC.
>
> That patch result in other issues, and my analysis about them is here:
>
> https://lore.kernel.org/all/8da2b3bf-b9bf-44e3-88ff-750dc91c2388@redhat.com/
>
> We didn't have a report from an in-tree driver, so we decided to "not care"
> about these reports:
>
> https://lore.kernel.org/all/116ca902-103d-47cb-baf0-905983baf9bb@redhat.com/
>
>
> But I don't see Peter's patch upstream.
>
> Peter, do you recall what the conclusion on that was?
I don't.. and yes I think that patch isn't merged and should still be valid
on its own.
Said that, this seems to be a different issue, even if still relevant for
PAT. The important part of trace dump is:
...
__mmput+0x122/0x4b0 kernel/fork.c:1126
mmput+0x58/0x60 kernel/fork.c:1147
dup_mm kernel/fork.c:1481 [inline]
copy_mm kernel/fork.c:1517 [inline]
...
So I think Dave's analysis is spot on here, that we're trying to fork but
failed:
https://lore.kernel.org/all/f02a96a2-9f3a-4bed-90a5-b3309eb91d94@intel.com/
The PFNMAP vma is going to be destroyed even if, I believe, nothing is
mapped. I said that because we do pte copy in sequence, and the only way
get_pat_info() can fail in this case is the lookup of the 1st pfnmap
failed.
Instead of questioning why the dup_mm() failed, if we can find a similar
way that pfnmap vma can remember the start PFN somehow (just like for COW
memories there.. in vm_pgoff, but again that currently might be used by
drivers), then it'll also work, so that get_pat_info() shouldn't need to
rely on pgtables for VM_SHARED.
That issue was on my todo for a long time, but there's always things
preempt me from that, not sure whether anyone would like to tackle that
even earlier.. or I can find another time to try, by boosting my todo's
priority. In short, IMO we should allow dup_mm() to fail in this case no
matter why that happened, while maintaining PFN mapping info in pgtable is
always tricky to me (e.g., it's against VM_SHARED PFNMAP to support fault()
and page zappings when needed).
Thanks,
--
Peter Xu
Powered by blists - more mailing lists