lists.openwall.net

Message-ID: <CAG48ez0i527ibBCvZ_TF_PVt4OxfVTpS=_TYUKrk0cRQ10Bpxg@mail.gmail.com>
Date: Mon, 2 Jun 2025 17:26:32 +0200
From: Jann Horn <jannh@...gle.com>
To: Anthony Yznaga <anthony.yznaga@...cle.com>
Cc: akpm@...ux-foundation.org, willy@...radead.org, markhemm@...glemail.com, 
	viro@...iv.linux.org.uk, david@...hat.com, khalid@...nel.org, 
	andreyknvl@...il.com, dave.hansen@...el.com, luto@...nel.org, 
	brauner@...nel.org, arnd@...db.de, ebiederm@...ssion.com, 
	catalin.marinas@....com, linux-arch@...r.kernel.org, 
	linux-kernel@...r.kernel.org, linux-mm@...ck.org, mhiramat@...nel.org, 
	rostedt@...dmis.org, vasily.averin@...ux.dev, xhao@...ux.alibaba.com, 
	pcc@...gle.com, neilb@...e.de, maz@...nel.org, 
	Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, Liam Howlett <liam.howlett@...cle.com>
Subject: Re: [PATCH v2 12/20] mm/mshare: prepare for page table sharing support

On Fri, May 30, 2025 at 6:42 PM Anthony Yznaga
<anthony.yznaga@...cle.com> wrote:
> On 5/30/25 7:56 AM, Jann Horn wrote:
> > On Fri, Apr 4, 2025 at 4:18 AM Anthony Yznaga <anthony.yznaga@...cle.com> wrote:
> >> In preparation for enabling the handling of page faults in an mshare
> >> region, provide a way to link an mshare shared page table to a process
> >> page table and otherwise find the actual vma in order to handle a page
> >> fault. Modify the unmap path to ensure that page tables in mshare regions
> >> are unlinked and kept intact when a process exits or an mshare region
> >> is explicitly unmapped.
> >>
> >> Signed-off-by: Khalid Aziz <khalid@...nel.org>
> >> Signed-off-by: Matthew Wilcox (Oracle) <willy@...radead.org>
> >> Signed-off-by: Anthony Yznaga <anthony.yznaga@...cle.com>
> > [...]
> >> diff --git a/mm/memory.c b/mm/memory.c
> >> index db558fe43088..68422b606819 100644
> >> --- a/mm/memory.c
> >> +++ b/mm/memory.c
> > [...]
> >> @@ -259,7 +260,10 @@ static inline void free_p4d_range(struct mmu_gather *tlb, pgd_t *pgd,
> >>                  next = p4d_addr_end(addr, end);
> >>                  if (p4d_none_or_clear_bad(p4d))
> >>                          continue;
> >> -               free_pud_range(tlb, p4d, addr, next, floor, ceiling);
> >> +               if (unlikely(shared_pud))
> >> +                       p4d_clear(p4d);
> >> +               else
> >> +                       free_pud_range(tlb, p4d, addr, next, floor, ceiling);
> >>          } while (p4d++, addr = next, addr != end);
> >>
> >>          start &= PGDIR_MASK;
> > [...]
> >> +static void mshare_vm_op_unmap_page_range(struct mmu_gather *tlb,
> >> +                               struct vm_area_struct *vma,
> >> +                               unsigned long addr, unsigned long end,
> >> +                               struct zap_details *details)
> >> +{
> >> +       /*
> >> +        * The msharefs vma is being unmapped. Do not unmap pages in the
> >> +        * mshare region itself.
> >> +        */
> >> +}
> >
> > Unmapping a VMA has three major phases:
> >
> > 1. unlinking the VMA from the VMA tree
> > 2. removing the VMA contents
> > 3. removing unneeded page tables
> >
> > The MM subsystem broadly assumes that after phase 2, nothing is
> > mapped in the region anymore and therefore changes to the backing file
> > don't need to TLB-flush this VMA anymore, and unlinks the mapping from
> > rmaps and such. If munmap() of an mshare region only removes the
> > mapping of shared page tables in step 3, as implemented here, that
> > means things like TLB flushes won't be able to discover all
> > currently-existing mshare mappings of a host MM through rmap walks.
> >
> > I think it would make more sense to remove the links to shared page
> > tables in step 2 (meaning in mshare_vm_op_unmap_page_range), just like
> > hugetlb does, and not modify free_pgtables().
>
> That makes sense. I'll make this change.
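To make the ordering argument concrete, here is a minimal user-space sketch in C. It is a toy model, not kernel code: `struct host_table`, `struct vma`, the `unmap_phase*()` helpers and `nothing_mapped()` are hypothetical names, and a plain counter stands in for the rmap links that TLB flushes rely on. It shows the suggested ordering, where an mshare VMA drops its link to the shared table in phase 2, so the "nothing mapped after phase 2" assumption holds and phase 3 never touches the host's tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model, not kernel code. A host page table shared
 * by several process VMAs; linked_vmas stands in for the rmap links that a
 * TLB flush would walk to find every mapping of the host MM. */
struct host_table {
    int linked_vmas;
};

struct vma {
    bool is_mshare;             /* true for an mshare region */
    bool in_tree;               /* still reachable through the VMA tree */
    struct host_table *shared;  /* link to the shared host table (mshare only) */
    int private_ptes;           /* entries in process-private page tables */
    int private_tables;         /* process-private page-table pages */
};

/* Phase 1: unlink the VMA from the VMA tree. */
void unmap_phase1(struct vma *v)
{
    v->in_tree = false;
}

/* Phase 2: remove the VMA contents. Per the review, an mshare VMA drops its
 * link to the shared host table here, as hugetlb does, instead of deferring
 * that to page-table freeing. */
void unmap_phase2(struct vma *v)
{
    assert(!v->in_tree);
    if (v->is_mshare && v->shared) {
        v->shared->linked_vmas--;   /* no longer visible to rmap-style walks */
        v->shared = NULL;
    }
    v->private_ptes = 0;
}

/* Phase 3: free page tables that are no longer needed. Only process-private
 * tables are freed; the host table belongs to the host MM and stays intact. */
void unmap_phase3(struct vma *v)
{
    assert(v->shared == NULL);      /* the link must already be gone */
    v->private_tables = 0;
}

/* The invariant the MM relies on once phase 2 has run. */
bool nothing_mapped(const struct vma *v)
{
    return v->shared == NULL && v->private_ptes == 0;
}
```

In this model, if phase 2 left `shared` set (as in the v2 patch, which only clears the p4d entry from free_p4d_range()), `nothing_mapped()` would be false after phase 2 and the host's counter would still include the departing VMA.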

Related: I think there needs to be a strategy for preventing walking
of mshare host page tables through an mshare VMA by codepaths relying
on MM/VMA locks, because those locks won't have an effect on the
underlying host MM. For example, I think the only reason fork() is
safe with your proposal is that copy_page_range() skips shared VMAs,
and I think non-fast get_user_pages() might hit a use-after-free
of page tables or such?

I guess the only clean strategy for that is to ensure that all
locking-based page table walking code does a check for "is this an
mshare VMA?" and, if yes, either bails immediately or takes extra
locks on the host MM (which could get messy).
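The two options in the last paragraph can be sketched as a toy C model. Everything here is hypothetical (`struct mm`, `struct vma`, `walk_vma()`, the `WALK_*` policies); locks are plain flags rather than the real mmap_lock, and the walk itself is reduced to a counter. It only illustrates the shape of the check: a walker that relies on the process MM lock first asks whether the VMA is an mshare VMA, then either bails or also takes the host MM's lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model, not kernel API. Each MM has its own lock,
 * modeled as a flag; holding a process MM's lock does not stabilize the
 * page tables of the host MM behind an mshare VMA. */
struct mm {
    bool locked;
    int walks_under_lock;   /* host-table walks done while this lock was held */
};

struct vma {
    bool is_mshare;
    struct mm *host_mm;     /* non-NULL only for mshare VMAs */
};

enum walk_policy { WALK_BAIL, WALK_LOCK_HOST };
enum walk_result { WALKED, SKIPPED_MSHARE };

/* A locking-based page-table walker. The caller already holds the process
 * MM lock; for an mshare VMA the walker either bails out or also takes the
 * host MM's lock around the walk. */
enum walk_result walk_vma(struct mm *proc_mm, struct vma *v,
                          enum walk_policy policy)
{
    assert(proc_mm->locked);

    if (!v->is_mshare)
        return WALKED;              /* the process lock is sufficient */

    if (policy == WALK_BAIL)
        return SKIPPED_MSHARE;      /* cf. copy_page_range() skipping shared VMAs */

    /* Lock order assumed here: process MM first, then host MM. */
    assert(v->host_mm && !v->host_mm->locked);
    v->host_mm->locked = true;
    v->host_mm->walks_under_lock++; /* host tables would be walked here */
    v->host_mm->locked = false;
    return WALKED;
}
```

The second policy is where it could get messy: taking the host lock while holding the process lock fixes a lock order (process MM, then host MM) that every other path would have to respect, and this model does not check that.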
