Message-ID: <CAG48ez0gzjbumcECmE39dxsF9TTtR0ED3L7WdKdW4RupZDRyWA@mail.gmail.com>
Date: Fri, 30 May 2025 16:56:37 +0200
From: Jann Horn <jannh@...gle.com>
To: Anthony Yznaga <anthony.yznaga@...cle.com>
Cc: akpm@...ux-foundation.org, willy@...radead.org, markhemm@...glemail.com,
viro@...iv.linux.org.uk, david@...hat.com, khalid@...nel.org,
andreyknvl@...il.com, dave.hansen@...el.com, luto@...nel.org,
brauner@...nel.org, arnd@...db.de, ebiederm@...ssion.com,
catalin.marinas@....com, linux-arch@...r.kernel.org,
linux-kernel@...r.kernel.org, linux-mm@...ck.org, mhiramat@...nel.org,
rostedt@...dmis.org, vasily.averin@...ux.dev, xhao@...ux.alibaba.com,
pcc@...gle.com, neilb@...e.de, maz@...nel.org,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>, Liam Howlett <liam.howlett@...cle.com>
Subject: Re: [PATCH v2 12/20] mm/mshare: prepare for page table sharing support
On Fri, Apr 4, 2025 at 4:18 AM Anthony Yznaga <anthony.yznaga@...cle.com> wrote:
> In preparation for enabling the handling of page faults in an mshare
> region provide a way to link an mshare shared page table to a process
> page table and otherwise find the actual vma in order to handle a page
> fault. Modify the unmap path to ensure that page tables in mshare regions
> are unlinked and kept intact when a process exits or an mshare region
> is explicitly unmapped.
>
> Signed-off-by: Khalid Aziz <khalid@...nel.org>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@...radead.org>
> Signed-off-by: Anthony Yznaga <anthony.yznaga@...cle.com>
[...]
> diff --git a/mm/memory.c b/mm/memory.c
> index db558fe43088..68422b606819 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
[...]
> @@ -259,7 +260,10 @@ static inline void free_p4d_range(struct mmu_gather *tlb, pgd_t *pgd,
> next = p4d_addr_end(addr, end);
> if (p4d_none_or_clear_bad(p4d))
> continue;
> - free_pud_range(tlb, p4d, addr, next, floor, ceiling);
> + if (unlikely(shared_pud))
> + p4d_clear(p4d);
> + else
> + free_pud_range(tlb, p4d, addr, next, floor, ceiling);
> } while (p4d++, addr = next, addr != end);
>
> start &= PGDIR_MASK;
[...]
> +static void mshare_vm_op_unmap_page_range(struct mmu_gather *tlb,
> + struct vm_area_struct *vma,
> + unsigned long addr, unsigned long end,
> + struct zap_details *details)
> +{
> + /*
> + * The msharefs vma is being unmapped. Do not unmap pages in the
> + * mshare region itself.
> + */
> +}
Unmapping a VMA has three major phases:
1. unlinking the VMA from the VMA tree
2. removing the VMA contents
3. removing unneeded page tables
The MM subsystem broadly assumes that after phase 2, nothing is mapped
in the region anymore, so changes to the backing file no longer need to
TLB-flush this VMA, and it unlinks the mapping from rmaps and such. If
munmap() of an mshare region only removes the links to the shared page
tables in phase 3, as implemented here, that means things like TLB
flushes won't be able to discover all currently-existing mshare
mappings of a host MM through rmap walks.

I think it would make more sense to remove the links to the shared page
tables in phase 2 (meaning in mshare_vm_op_unmap_page_range()), just
like hugetlb does, and not modify free_pgtables().
> static const struct vm_operations_struct msharefs_vm_ops = {
> .may_split = mshare_vm_op_split,
> .mprotect = mshare_vm_op_mprotect,
> + .unmap_page_range = mshare_vm_op_unmap_page_range,
> };
>
> /*
> --
> 2.43.5
>