Message-ID: <aJneGJSJcltEIT41@hyeyoo>
Date: Mon, 11 Aug 2025 21:12:08 +0900
From: Harry Yoo <harry.yoo@...cle.com>
To: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
Cc: Dennis Zhou <dennis@...nel.org>, Andrew Morton <akpm@...ux-foundation.org>,
Andrey Ryabinin <ryabinin.a.a@...il.com>, x86@...nel.org,
Borislav Petkov <bp@...en8.de>, Peter Zijlstra <peterz@...radead.org>,
Andy Lutomirski <luto@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
Tejun Heo <tj@...nel.org>, Uladzislau Rezki <urezki@...il.com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Christoph Lameter <cl@...two.org>,
David Hildenbrand <david@...hat.com>,
Andrey Konovalov <andreyknvl@...il.com>,
Vincenzo Frascino <vincenzo.frascino@....com>,
"H. Peter Anvin" <hpa@...or.com>, kasan-dev@...glegroups.com,
Mike Rapoport <rppt@...nel.org>, Ard Biesheuvel <ardb@...nel.org>,
linux-kernel@...r.kernel.org, Dmitry Vyukov <dvyukov@...gle.com>,
Alexander Potapenko <glider@...gle.com>,
Vlastimil Babka <vbabka@...e.cz>,
Suren Baghdasaryan <surenb@...gle.com>, Thomas Huth <thuth@...hat.com>,
John Hubbard <jhubbard@...dia.com>, Michal Hocko <mhocko@...e.com>,
"Liam R. Howlett" <Liam.Howlett@...cle.com>, linux-mm@...ck.org,
"Kirill A. Shutemov" <kas@...nel.org>,
Oscar Salvador <osalvador@...e.de>, Jane Chu <jane.chu@...cle.com>,
Gwan-gyeong Mun <gwan-gyeong.mun@...el.com>,
"Aneesh Kumar K . V" <aneesh.kumar@...ux.ibm.com>,
Joerg Roedel <joro@...tes.org>, Alistair Popple <apopple@...dia.com>,
Joao Martins <joao.m.martins@...cle.com>, linux-arch@...r.kernel.org,
stable@...r.kernel.org
Subject: Re: [PATCH V4 mm-hotfixes 2/3] mm: introduce and use {pgd,p4d}_populate_kernel()

On Mon, Aug 11, 2025 at 12:38:37PM +0100, Lorenzo Stoakes wrote:
> On Mon, Aug 11, 2025 at 02:34:19PM +0900, Harry Yoo wrote:
> > Introduce and use {pgd,p4d}_populate_kernel() in core MM code when
> > populating PGD and P4D entries for the kernel address space.
> > These helpers ensure proper synchronization of page tables when
> > updating the kernel portion of top-level page tables.
> >
> > Until now, the kernel has relied on each architecture to handle
> > synchronization of top-level page tables in an ad-hoc manner.
> > For example, see commit 9b861528a801 ("x86-64, mem: Update all PGDs for
> > direct mapping and vmemmap mapping changes").
> >
> > However, this approach has proven fragile for the following reasons:
> >
> > 1) It is easy to forget to perform the necessary page table
> > synchronization when introducing new changes.
> > For instance, commit 4917f55b4ef9 ("mm/sparse-vmemmap: improve memory
> > savings for compound devmaps") overlooked the need to synchronize
> > page tables for the vmemmap area.
> >
> > 2) It is also easy to overlook that the vmemmap and direct mapping areas
> > must not be accessed before explicit page table synchronization.
> > For example, commit 8d400913c231 ("x86/vmemmap: handle unpopulated
> > sub-pmd ranges") caused crashes by accessing the vmemmap area
> > before calling sync_global_pgds().
> >
> > To address this, as suggested by Dave Hansen, introduce _kernel() variants
> > of the page table population helpers, which invoke architecture-specific
> > hooks to properly synchronize page tables. These are introduced in a new
> > header file, include/linux/pgalloc.h, so they can be called from common code.
> >
> > They reuse existing infrastructure for vmalloc and ioremap.
> > Synchronization requirements are determined by ARCH_PAGE_TABLE_SYNC_MASK,
> > and the actual synchronization is performed by arch_sync_kernel_mappings().
> >
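For illustration only, the synchronization problem described above can be modelled in userspace C. This is a minimal sketch, assuming a toy layout in which every task carries its own copy of the top-level table (as on x86_64) and table 0 stands in for the reference table of init_mm. All toy_* names and sizes are hypothetical; none of this is kernel code.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PTRS_PER_PGD 512  /* slots per top-level table (x86_64 value) */
#define TOY_NR_TABLES    3    /* table 0 = reference, 1..2 = per-task copies */

typedef uint64_t toy_pgd_t;

static toy_pgd_t toy_tables[TOY_NR_TABLES][TOY_PTRS_PER_PGD];

/* Populate one kernel slot in the reference table only. */
static void toy_populate(size_t idx, toy_pgd_t val)
{
	toy_tables[0][idx] = val;
}

/* Copy slots [first, last] from the reference table into every other table. */
static void toy_sync(size_t first, size_t last)
{
	for (size_t t = 1; t < TOY_NR_TABLES; t++)
		for (size_t i = first; i <= last; i++)
			toy_tables[t][i] = toy_tables[0][i];
}

/* Count the tables whose slot idx still differs from the reference table. */
static size_t toy_stale_tables(size_t idx)
{
	size_t stale = 0;

	for (size_t t = 1; t < TOY_NR_TABLES; t++)
		if (toy_tables[t][idx] != toy_tables[0][idx])
			stale++;
	return stale;
}
```

After toy_populate() alone, the other tables are stale for that slot until toy_sync() runs; a task running on a stale table would fault on the new mapping, which is the failure mode the commit message describes.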
> > This change currently targets only x86_64, so only PGD and P4D level

Hi Lorenzo, thanks for looking at this!

> Well, arm defines ARCH_PAGE_TABLE_SYNC_MASK in arch/arm/include/asm/page.h. But
> it aliases this to PGTBL_PMD_MODIFIED so will remain unaffected :)

Oh, here I just intended to explain why I didn't implement
{pud,pmd}_populate_kernel().

> > helpers are introduced. In theory, PUD and PMD level helpers can be added
> > later if needed by other architectures.
> >
> > Currently this is a no-op, since no architecture sets
> > PGTBL_{PGD,P4D}_MODIFIED in ARCH_PAGE_TABLE_SYNC_MASK.
> >
> > Cc: <stable@...r.kernel.org>
> > Fixes: 8d400913c231 ("x86/vmemmap: handle unpopulated sub-pmd ranges")
> > Suggested-by: Dave Hansen <dave.hansen@...ux.intel.com>
> > Signed-off-by: Harry Yoo <harry.yoo@...cle.com>
> > ---
> > include/linux/pgalloc.h | 24 ++++++++++++++++++++++++
> > include/linux/pgtable.h | 4 ++--
> > mm/kasan/init.c | 12 ++++++------
> > mm/percpu.c | 6 +++---
> > mm/sparse-vmemmap.c | 6 +++---
> > 5 files changed, 38 insertions(+), 14 deletions(-)
> > create mode 100644 include/linux/pgalloc.h
> >
> > diff --git a/include/linux/pgalloc.h b/include/linux/pgalloc.h
> > new file mode 100644
> > index 000000000000..290ab864320f
> > --- /dev/null
> > +++ b/include/linux/pgalloc.h
> > @@ -0,0 +1,24 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +#ifndef _LINUX_PGALLOC_H
> > +#define _LINUX_PGALLOC_H
> > +
> > +#include <linux/pgtable.h>
> > +#include <asm/pgalloc.h>
> > +
> > +static inline void pgd_populate_kernel(unsigned long addr, pgd_t *pgd,
> > + p4d_t *p4d)
> > +{
> > + pgd_populate(&init_mm, pgd, p4d);
> > + if (ARCH_PAGE_TABLE_SYNC_MASK & PGTBL_PGD_MODIFIED)
>
> Hm, ARCH_PAGE_TABLE_SYNC_MASK is only defined for x86 with 2- and 3-level
> page tables, and for arm. I see:
>
> #ifndef ARCH_PAGE_TABLE_SYNC_MASK
> #define ARCH_PAGE_TABLE_SYNC_MASK 0
> #endif
>
> In linux/vmalloc.h, but you're not importing that?

Patch 1 moves it from linux/vmalloc.h to linux/pgtable.h,
and linux/pgalloc.h includes linux/pgtable.h.

> It sucks that it is defined there, but maybe you need to #include
> <linux/vmalloc.h> for this otherwise this could be broken on other arches?
>
> You may be getting lucky with nested header includes that cause this to be
> picked up somewhere for you, or having it only declared for arches that define
> it, but we should probably make this explicit.

...so I don't think I'm missing any necessary header includes, even on
other architectures?

> Also arch_sync_kernel_mappings() is defined in linux/vmalloc.h so seems
> sensible.

That was also moved to linux/pgtable.h.

> > + arch_sync_kernel_mappings(addr, addr);
> > +}
> > +
> > +static inline void p4d_populate_kernel(unsigned long addr, p4d_t *p4d,
> > + pud_t *pud)
> > +{
> > + p4d_populate(&init_mm, p4d, pud);
> > + if (ARCH_PAGE_TABLE_SYNC_MASK & PGTBL_P4D_MODIFIED)
> > + arch_sync_kernel_mappings(addr, addr);
>
> It's kind of weird we don't have this defined as a function for many arches,

That's really a mystery :)

I have no idea why other architectures don't handle this.
In theory (at least on 64-bit arches), I think only a few architectures
(like arm64, where a kernel page table is shared between tasks) don't have
to implement this.

Probably because it's a fairly niche bug to hit? It needs both:
(vmemmap, direct mapping, or vmalloc/vmap area spanning multiple PGD ranges)
AND (some PGD entries being populated after boot, e.g. by memory
hot-plug or vmalloc()).

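To make the "multiple PGD ranges" condition concrete, here is a small userspace sketch of the index arithmetic. It assumes x86_64 with 4-level paging, where each top-level entry covers 512 GiB (shift of 39, 512 entries per table). The sketch_* names are hypothetical, and the address used in the usage note is the documented default direct-map base without KASLR, chosen only as an example.

```c
#include <stdint.h>

#define SKETCH_PGDIR_SHIFT  39   /* x86_64, 4-level paging: 512 GiB per entry */
#define SKETCH_PTRS_PER_PGD 512

/* Index of the top-level entry that maps addr. */
static unsigned int sketch_pgd_index(uint64_t addr)
{
	return (unsigned int)((addr >> SKETCH_PGDIR_SHIFT) &
			      (SKETCH_PTRS_PER_PGD - 1));
}

/*
 * Number of top-level entries touched by the inclusive range [start, last].
 * Assumes start <= last and that the range stays within the kernel half,
 * so the index does not wrap.
 */
static unsigned int sketch_pgd_entries_spanned(uint64_t start, uint64_t last)
{
	return sketch_pgd_index(last) - sketch_pgd_index(start) + 1;
}
```

For example, a 1 TiB region starting at 0xffff888000000000 touches two top-level entries, so growing it after boot can require populating a new PGD entry that the other page tables have not seen yet.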
> (weird as well that we declare it in... vmalloc.h, but I guess that's one
> for follow-up cleanups).
>
> But I see from the comment:
>
> /*
> * There is no default implementation for arch_sync_kernel_mappings(). It is
> * relied upon the compiler to optimize calls out if ARCH_PAGE_TABLE_SYNC_MASK
> * is 0.
> */
>
> So this seems intended... :)
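The pattern that comment describes can be sketched in plain C. This is a simplified userspace model, not the kernel code: the SKETCH_* macros and sketch_* functions are hypothetical stand-ins for ARCH_PAGE_TABLE_SYNC_MASK, PGTBL_PGD_MODIFIED, arch_sync_kernel_mappings() and pgd_populate_kernel(). Unlike the kernel, the sketch does define the hook, so that it also links when built without optimization.

```c
/* Hypothetical stand-ins for the PGTBL_*_MODIFIED bits. */
#define SKETCH_PGD_MODIFIED (1u << 0)
#define SKETCH_P4D_MODIFIED (1u << 1)

/* An "architecture" opts in by defining the mask first; the default is 0. */
#ifndef SKETCH_SYNC_MASK
#define SKETCH_SYNC_MASK 0u
#endif

static unsigned int sketch_sync_calls;

/* Stand-in for the arch hook; it only counts how often it is invoked. */
static void sketch_sync_kernel_mappings(unsigned long start, unsigned long end)
{
	(void)start;
	(void)end;
	sketch_sync_calls++;
}

/* Populate a top-level slot, then sync only if the mask asks for it. */
static inline void sketch_pgd_populate_kernel(unsigned long addr,
					      unsigned long *slot,
					      unsigned long val)
{
	*slot = val;
	/* Constant condition: with a mask of 0 this branch is dead code. */
	if (SKETCH_SYNC_MASK & SKETCH_PGD_MODIFIED)
		sketch_sync_kernel_mappings(addr, addr);
}
```

With the default mask of 0 the condition is a compile-time constant, so an optimizing compiler can drop the call entirely. Building with -DSKETCH_SYNC_MASK=1u would make the helper call the hook on every populate.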
> The rest of this seems sensible, nice cleanup!

Thanks for looking at it!

--
Cheers,
Harry / Hyeonggon