Message-ID: <CAG48ez0i527ibBCvZ_TF_PVt4OxfVTpS=_TYUKrk0cRQ10Bpxg@mail.gmail.com>
Date: Mon, 2 Jun 2025 17:26:32 +0200
From: Jann Horn <jannh@...gle.com>
To: Anthony Yznaga <anthony.yznaga@...cle.com>
Cc: akpm@...ux-foundation.org, willy@...radead.org, markhemm@...glemail.com,
viro@...iv.linux.org.uk, david@...hat.com, khalid@...nel.org,
andreyknvl@...il.com, dave.hansen@...el.com, luto@...nel.org,
brauner@...nel.org, arnd@...db.de, ebiederm@...ssion.com,
catalin.marinas@....com, linux-arch@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org, mhiramat@...nel.org,
rostedt@...dmis.org, vasily.averin@...ux.dev, xhao@...ux.alibaba.com,
pcc@...gle.com, neilb@...e.de, maz@...nel.org,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, Liam Howlett <liam.howlett@...cle.com>
Subject: Re: [PATCH v2 12/20] mm/mshare: prepare for page table sharing support

On Fri, May 30, 2025 at 6:42 PM Anthony Yznaga
<anthony.yznaga@...cle.com> wrote:
> On 5/30/25 7:56 AM, Jann Horn wrote:
> > On Fri, Apr 4, 2025 at 4:18 AM Anthony Yznaga <anthony.yznaga@...cle.com> wrote:
> >> In preparation for enabling the handling of page faults in an mshare
> >> region provide a way to link an mshare shared page table to a process
> >> page table and otherwise find the actual vma in order to handle a page
> >> fault. Modify the unmap path to ensure that page tables in mshare regions
> >> are unlinked and kept intact when a process exits or an mshare region
> >> is explicitly unmapped.
> >>
> >> Signed-off-by: Khalid Aziz <khalid@...nel.org>
> >> Signed-off-by: Matthew Wilcox (Oracle) <willy@...radead.org>
> >> Signed-off-by: Anthony Yznaga <anthony.yznaga@...cle.com>
> > [...]
> >> diff --git a/mm/memory.c b/mm/memory.c
> >> index db558fe43088..68422b606819 100644
> >> --- a/mm/memory.c
> >> +++ b/mm/memory.c
> > [...]
> >> @@ -259,7 +260,10 @@ static inline void free_p4d_range(struct mmu_gather *tlb, pgd_t *pgd,
> >> next = p4d_addr_end(addr, end);
> >> if (p4d_none_or_clear_bad(p4d))
> >> continue;
> >> - free_pud_range(tlb, p4d, addr, next, floor, ceiling);
> >> + if (unlikely(shared_pud))
> >> + p4d_clear(p4d);
> >> + else
> >> + free_pud_range(tlb, p4d, addr, next, floor, ceiling);
> >> } while (p4d++, addr = next, addr != end);
> >>
> >> start &= PGDIR_MASK;
> > [...]
> >> +static void mshare_vm_op_unmap_page_range(struct mmu_gather *tlb,
> >> + struct vm_area_struct *vma,
> >> + unsigned long addr, unsigned long end,
> >> + struct zap_details *details)
> >> +{
> >> + /*
> >> + * The msharefs vma is being unmapped. Do not unmap pages in the
> >> + * mshare region itself.
> >> + */
> >> +}
> >
> > Unmapping a VMA has three major phases:
> >
> > 1. unlinking the VMA from the VMA tree
> > 2. removing the VMA contents
> > 3. removing unneeded page tables
> >
> > The MM subsystem broadly assumes that after phase 2, no stuff is
> > mapped in the region anymore and therefore changes to the backing file
> > don't need to TLB-flush this VMA anymore, and unlinks the mapping from
> > rmaps and such. If munmap() of an mshare region only removes the
> > mapping of shared page tables in step 3, as implemented here, that
> > means things like TLB flushes won't be able to discover all
> > currently-existing mshare mappings of a host MM through rmap walks.
> >
> > I think it would make more sense to remove the links to shared page
> > tables in step 2 (meaning in mshare_vm_op_unmap_page_range), just like
> > hugetlb does, and not modify free_pgtables().
>
> That makes sense. I'll make this change.

Related: I think there needs to be a strategy for preventing walking
of mshare host page tables through an mshare VMA by codepaths relying
on MM/VMA locks, because those locks won't have an effect on the
underlying host MM. For example, I think the only reason fork() is
safe with your proposal is that copy_page_range() skips shared VMAs,
and I think non-fast get_user_pages() could maybe hit use-after-free
of page tables or such?

I guess the only clean strategy for that is to ensure that all
locking-based page table walking code does a check for "is this an
mshare VMA?" and, if yes, either bails immediately or takes extra
locks on the host MM (which could get messy).
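
To make the two options concrete, here is a rough userspace model in C. It is
only a sketch and not kernel code: every name in it (MODEL_VM_MSHARE,
struct model_mm, struct model_vma, model_walk_bail(), model_walk_lock_host())
is hypothetical, and a plain counter stands in for the real MM lock. It shows
a walker that refuses mshare VMAs, and one that takes the host MM's lock
around the walk.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag marking an mshare VMA; not a real kernel flag name. */
#define MODEL_VM_MSHARE 0x1UL

/* Stand-in for an MM; lock_depth models "how many holders of the MM lock". */
struct model_mm {
	int lock_depth;
};

struct model_vma {
	unsigned long flags;
	struct model_mm *mm;      /* MM that owns this VMA */
	struct model_mm *host_mm; /* mshare host MM, NULL for ordinary VMAs */
};

static bool model_vma_is_mshare(const struct model_vma *vma)
{
	return (vma->flags & MODEL_VM_MSHARE) != 0;
}

/*
 * Option 1: bail immediately. The caller holds only the process MM lock,
 * which says nothing about the host MM's page tables.
 */
static int model_walk_bail(const struct model_vma *vma)
{
	assert(vma != NULL && vma->mm->lock_depth > 0);

	if (model_vma_is_mshare(vma))
		return -EOPNOTSUPP;

	/* An ordinary VMA would be walked here under the process MM lock. */
	return 0;
}

/*
 * Option 2: take an extra lock on the host MM for the duration of the walk.
 * host_depth_seen reports the host lock depth observed during the "walk".
 */
static int model_walk_lock_host(struct model_vma *vma, int *host_depth_seen)
{
	bool locked_host = false;

	assert(vma != NULL && vma->mm->lock_depth > 0);

	if (model_vma_is_mshare(vma)) {
		vma->host_mm->lock_depth++; /* models locking the host MM */
		locked_host = true;
	}

	/* The page table walk would happen here. */
	*host_depth_seen = locked_host ? vma->host_mm->lock_depth : 0;

	if (locked_host)
		vma->host_mm->lock_depth--; /* models unlocking the host MM */

	return 0;
}
```

The model ignores lock ordering between the process MM and the host MM, which
is the part that could get messy in practice.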