Message-ID: <aJmlj3bG6qb60Me0@kernel.org>
Date: Mon, 11 Aug 2025 11:10:55 +0300
From: Mike Rapoport <rppt@...nel.org>
To: Harry Yoo <harry.yoo@...cle.com>
Cc: Dennis Zhou <dennis@...nel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Andrey Ryabinin <ryabinin.a.a@...il.com>, x86@...nel.org,
Borislav Petkov <bp@...en8.de>,
Peter Zijlstra <peterz@...radead.org>,
Andy Lutomirski <luto@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Tejun Heo <tj@...nel.org>,
Uladzislau Rezki <urezki@...il.com>,
Dave Hansen <dave.hansen@...ux.intel.com>,
Christoph Lameter <cl@...two.org>,
David Hildenbrand <david@...hat.com>,
Andrey Konovalov <andreyknvl@...il.com>,
Vincenzo Frascino <vincenzo.frascino@....com>,
"H. Peter Anvin" <hpa@...or.com>, kasan-dev@...glegroups.com,
Ard Biesheuvel <ardb@...nel.org>, linux-kernel@...r.kernel.org,
Dmitry Vyukov <dvyukov@...gle.com>,
Alexander Potapenko <glider@...gle.com>,
Vlastimil Babka <vbabka@...e.cz>,
Suren Baghdasaryan <surenb@...gle.com>,
Thomas Huth <thuth@...hat.com>, John Hubbard <jhubbard@...dia.com>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Michal Hocko <mhocko@...e.com>,
"Liam R. Howlett" <Liam.Howlett@...cle.com>, linux-mm@...ck.org,
"Kirill A. Shutemov" <kas@...nel.org>,
Oscar Salvador <osalvador@...e.de>, Jane Chu <jane.chu@...cle.com>,
Gwan-gyeong Mun <gwan-gyeong.mun@...el.com>,
"Aneesh Kumar K . V" <aneesh.kumar@...ux.ibm.com>,
Joerg Roedel <joro@...tes.org>,
Alistair Popple <apopple@...dia.com>,
Joao Martins <joao.m.martins@...cle.com>,
linux-arch@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [PATCH V4 mm-hotfixes 2/3] mm: introduce and use {pgd,p4d}_populate_kernel()
On Mon, Aug 11, 2025 at 02:34:19PM +0900, Harry Yoo wrote:
> Introduce and use {pgd,p4d}_populate_kernel() in core MM code when
> populating PGD and P4D entries for the kernel address space.
> These helpers ensure proper synchronization of page tables when
> updating the kernel portion of top-level page tables.
>
> Until now, the kernel has relied on each architecture to handle
> synchronization of top-level page tables in an ad-hoc manner.
> For example, see commit 9b861528a801 ("x86-64, mem: Update all PGDs for
> direct mapping and vmemmap mapping changes").
>
> However, this approach has proven fragile for the following reasons:
>
> 1) It is easy to forget to perform the necessary page table
> synchronization when introducing new changes.
> For instance, commit 4917f55b4ef9 ("mm/sparse-vmemmap: improve memory
> savings for compound devmaps") overlooked the need to synchronize
> page tables for the vmemmap area.
>
> 2) It is also easy to overlook that the vmemmap and direct mapping areas
> must not be accessed before explicit page table synchronization.
> For example, commit 8d400913c231 ("x86/vmemmap: handle unpopulated
> sub-pmd ranges") caused crashes by accessing the vmemmap area
> before calling sync_global_pgds().
>
> To address this, as suggested by Dave Hansen, introduce _kernel() variants
> of the page table population helpers, which invoke architecture-specific
> hooks to properly synchronize page tables. These are introduced in a new
> header file, include/linux/pgalloc.h, so they can be called from common code.
>
> They reuse existing infrastructure for vmalloc and ioremap.
> Synchronization requirements are determined by ARCH_PAGE_TABLE_SYNC_MASK,
> and the actual synchronization is performed by arch_sync_kernel_mappings().
>
> This change currently targets only x86_64, so only PGD and P4D level
> helpers are introduced. In theory, PUD and PMD level helpers can be added
> later if needed by other architectures.
>
> Currently this is a no-op, since no architecture sets
> PGTBL_{PGD,P4D}_MODIFIED in ARCH_PAGE_TABLE_SYNC_MASK.
>
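For readers following the thread, the mask gating described above can be modeled in user space. This is a minimal sketch, not kernel code: the bit values, the model_ prefixed function names and the call counter are assumptions made for illustration. Only ARCH_PAGE_TABLE_SYNC_MASK, PGTBL_PGD_MODIFIED, PGTBL_P4D_MODIFIED and arch_sync_kernel_mappings() are names taken from the patch. The sketch pretends that a hypothetical architecture opts in at the PGD level only.

```c
/* Illustrative bit values; the kernel defines its own in <linux/pgtable.h>. */
#define PGTBL_PGD_MODIFIED (1u << 0)
#define PGTBL_P4D_MODIFIED (1u << 1)

/* Hypothetical architecture header: opt in at the PGD level only. */
#define ARCH_PAGE_TABLE_SYNC_MASK PGTBL_PGD_MODIFIED

/* Generic fallback, mirroring the "#ifndef ... 0" default in pgtable.h. */
#ifndef ARCH_PAGE_TABLE_SYNC_MASK
#define ARCH_PAGE_TABLE_SYNC_MASK 0
#endif

/* Counts how often the stand-in sync hook was invoked. */
unsigned int model_sync_calls;

/* Stand-in for arch_sync_kernel_mappings(): it only counts invocations. */
void model_arch_sync_kernel_mappings(unsigned long start, unsigned long end)
{
	(void)start;
	(void)end;
	model_sync_calls++;
}

/* Models pgd_populate_kernel(): the real helper populates the entry first. */
void model_pgd_populate_kernel(unsigned long addr)
{
	if (ARCH_PAGE_TABLE_SYNC_MASK & PGTBL_PGD_MODIFIED)
		model_arch_sync_kernel_mappings(addr, addr);
}

/* Models p4d_populate_kernel(): not gated in, so it never syncs here. */
void model_p4d_populate_kernel(unsigned long addr)
{
	if (ARCH_PAGE_TABLE_SYNC_MASK & PGTBL_P4D_MODIFIED)
		model_arch_sync_kernel_mappings(addr, addr);
}
```

With the mask left at its default of 0, both conditions are compile-time false and neither model helper calls the hook, which matches the "currently a no-op" statement above.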
> Cc: <stable@...r.kernel.org>
> Fixes: 8d400913c231 ("x86/vmemmap: handle unpopulated sub-pmd ranges")
> Suggested-by: Dave Hansen <dave.hansen@...ux.intel.com>
> Signed-off-by: Harry Yoo <harry.yoo@...cle.com>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@...nel.org>
> ---
> include/linux/pgalloc.h | 24 ++++++++++++++++++++++++
> include/linux/pgtable.h | 4 ++--
> mm/kasan/init.c | 12 ++++++------
> mm/percpu.c | 6 +++---
> mm/sparse-vmemmap.c | 6 +++---
> 5 files changed, 38 insertions(+), 14 deletions(-)
> create mode 100644 include/linux/pgalloc.h
>
> diff --git a/include/linux/pgalloc.h b/include/linux/pgalloc.h
> new file mode 100644
> index 000000000000..290ab864320f
> --- /dev/null
> +++ b/include/linux/pgalloc.h
> @@ -0,0 +1,24 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_PGALLOC_H
> +#define _LINUX_PGALLOC_H
> +
> +#include <linux/pgtable.h>
> +#include <asm/pgalloc.h>
> +
> +static inline void pgd_populate_kernel(unsigned long addr, pgd_t *pgd,
> +				       p4d_t *p4d)
> +{
> +	pgd_populate(&init_mm, pgd, p4d);
> +	if (ARCH_PAGE_TABLE_SYNC_MASK & PGTBL_PGD_MODIFIED)
> +		arch_sync_kernel_mappings(addr, addr);
> +}
> +
> +static inline void p4d_populate_kernel(unsigned long addr, p4d_t *p4d,
> +				       pud_t *pud)
> +{
> +	p4d_populate(&init_mm, p4d, pud);
> +	if (ARCH_PAGE_TABLE_SYNC_MASK & PGTBL_P4D_MODIFIED)
> +		arch_sync_kernel_mappings(addr, addr);
> +}
> +
> +#endif /* _LINUX_PGALLOC_H */
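Both helpers pass addr as the start and the end of the range. The user-space sketch below shows why that can still cover the modified entry, assuming the architecture hook treats end as inclusive and walks the range one top-level entry at a time. That behaviour is an assumption here, since the hook implementation is not part of this patch. MODEL_PGDIR_SHIFT, model_align_up() and model_entries_synced() are hypothetical names used only for illustration.

```c
/* Assumed size of the region covered by one top-level entry (512 GiB). */
#define MODEL_PGDIR_SHIFT 39
#define MODEL_PGDIR_SIZE (1ULL << MODEL_PGDIR_SHIFT)

/* Round x up to the next MODEL_PGDIR_SIZE boundary. */
unsigned long long model_align_up(unsigned long long x)
{
	return (x + MODEL_PGDIR_SIZE - 1) & ~(MODEL_PGDIR_SIZE - 1);
}

/*
 * Count how many top-level entries an inclusive [start, end] walk visits.
 * A real hook would copy the entry into every other page table here.
 */
unsigned int model_entries_synced(unsigned long long start,
				  unsigned long long end)
{
	unsigned long long addr = start;
	unsigned int visited = 0;

	while (addr <= end) {
		unsigned long long next = model_align_up(addr + 1);

		visited++;
		if (next <= addr)	/* wrapped past the top of the address space */
			break;
		addr = next;
	}
	return visited;
}
```

Under that assumption a call with start equal to end visits exactly one entry, namely the one just populated. A hook that treated end as exclusive would visit none, so the inclusive convention matters for these helpers.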
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index ba699df6ef69..0cf5c6c3e483 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -1469,8 +1469,8 @@ static inline void modify_prot_commit_ptes(struct vm_area_struct *vma, unsigned
>
> /*
> * Architectures can set this mask to a combination of PGTBL_P?D_MODIFIED values
> - * and let generic vmalloc and ioremap code know when arch_sync_kernel_mappings()
> - * needs to be called.
> + * and let generic vmalloc, ioremap and page table update code know when
> + * arch_sync_kernel_mappings() needs to be called.
> */
> #ifndef ARCH_PAGE_TABLE_SYNC_MASK
> #define ARCH_PAGE_TABLE_SYNC_MASK 0
> diff --git a/mm/kasan/init.c b/mm/kasan/init.c
> index ced6b29fcf76..8fce3370c84e 100644
> --- a/mm/kasan/init.c
> +++ b/mm/kasan/init.c
> @@ -13,9 +13,9 @@
> #include <linux/mm.h>
> #include <linux/pfn.h>
> #include <linux/slab.h>
> +#include <linux/pgalloc.h>
>
> #include <asm/page.h>
> -#include <asm/pgalloc.h>
>
> #include "kasan.h"
>
> @@ -191,7 +191,7 @@ static int __ref zero_p4d_populate(pgd_t *pgd, unsigned long addr,
> pud_t *pud;
> pmd_t *pmd;
>
> - p4d_populate(&init_mm, p4d,
> + p4d_populate_kernel(addr, p4d,
> lm_alias(kasan_early_shadow_pud));
> pud = pud_offset(p4d, addr);
> pud_populate(&init_mm, pud,
> @@ -212,7 +212,7 @@ static int __ref zero_p4d_populate(pgd_t *pgd, unsigned long addr,
> } else {
> p = early_alloc(PAGE_SIZE, NUMA_NO_NODE);
> pud_init(p);
> - p4d_populate(&init_mm, p4d, p);
> + p4d_populate_kernel(addr, p4d, p);
> }
> }
> zero_pud_populate(p4d, addr, next);
> @@ -251,10 +251,10 @@ int __ref kasan_populate_early_shadow(const void *shadow_start,
> * puds,pmds, so pgd_populate(), pud_populate()
> * is noops.
> */
> - pgd_populate(&init_mm, pgd,
> + pgd_populate_kernel(addr, pgd,
> lm_alias(kasan_early_shadow_p4d));
> p4d = p4d_offset(pgd, addr);
> - p4d_populate(&init_mm, p4d,
> + p4d_populate_kernel(addr, p4d,
> lm_alias(kasan_early_shadow_pud));
> pud = pud_offset(p4d, addr);
> pud_populate(&init_mm, pud,
> @@ -273,7 +273,7 @@ int __ref kasan_populate_early_shadow(const void *shadow_start,
> if (!p)
> return -ENOMEM;
> } else {
> - pgd_populate(&init_mm, pgd,
> + pgd_populate_kernel(addr, pgd,
> early_alloc(PAGE_SIZE, NUMA_NO_NODE));
> }
> }
> diff --git a/mm/percpu.c b/mm/percpu.c
> index d9cbaee92b60..a56f35dcc417 100644
> --- a/mm/percpu.c
> +++ b/mm/percpu.c
> @@ -3108,7 +3108,7 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
> #endif /* BUILD_EMBED_FIRST_CHUNK */
>
> #ifdef BUILD_PAGE_FIRST_CHUNK
> -#include <asm/pgalloc.h>
> +#include <linux/pgalloc.h>
>
> #ifndef P4D_TABLE_SIZE
> #define P4D_TABLE_SIZE PAGE_SIZE
> @@ -3134,13 +3134,13 @@ void __init __weak pcpu_populate_pte(unsigned long addr)
>
> if (pgd_none(*pgd)) {
> p4d = memblock_alloc_or_panic(P4D_TABLE_SIZE, P4D_TABLE_SIZE);
> - pgd_populate(&init_mm, pgd, p4d);
> + pgd_populate_kernel(addr, pgd, p4d);
> }
>
> p4d = p4d_offset(pgd, addr);
> if (p4d_none(*p4d)) {
> pud = memblock_alloc_or_panic(PUD_TABLE_SIZE, PUD_TABLE_SIZE);
> - p4d_populate(&init_mm, p4d, pud);
> + p4d_populate_kernel(addr, p4d, pud);
> }
>
> pud = pud_offset(p4d, addr);
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index 41aa0493eb03..dbd8daccade2 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -27,9 +27,9 @@
> #include <linux/spinlock.h>
> #include <linux/vmalloc.h>
> #include <linux/sched.h>
> +#include <linux/pgalloc.h>
>
> #include <asm/dma.h>
> -#include <asm/pgalloc.h>
> #include <asm/tlbflush.h>
>
> #include "hugetlb_vmemmap.h"
> @@ -229,7 +229,7 @@ p4d_t * __meminit vmemmap_p4d_populate(pgd_t *pgd, unsigned long addr, int node)
> if (!p)
> return NULL;
> pud_init(p);
> - p4d_populate(&init_mm, p4d, p);
> + p4d_populate_kernel(addr, p4d, p);
> }
> return p4d;
> }
> @@ -241,7 +241,7 @@ pgd_t * __meminit vmemmap_pgd_populate(unsigned long addr, int node)
> void *p = vmemmap_alloc_block_zero(PAGE_SIZE, node);
> if (!p)
> return NULL;
> - pgd_populate(&init_mm, pgd, p);
> + pgd_populate_kernel(addr, pgd, p);
> }
> return pgd;
> }
> --
> 2.43.0
>
--
Sincerely yours,
Mike.