linux-kernel - Re: [PATCH V4 mm-hotfixes 2/3] mm: introduce and use {pgd,p4d}_populate

Message-ID: <aJmlj3bG6qb60Me0@kernel.org>
Date: Mon, 11 Aug 2025 11:10:55 +0300
From: Mike Rapoport <rppt@...nel.org>
To: Harry Yoo <harry.yoo@...cle.com>
Cc: Dennis Zhou <dennis@...nel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Andrey Ryabinin <ryabinin.a.a@...il.com>, x86@...nel.org,
	Borislav Petkov <bp@...en8.de>,
	Peter Zijlstra <peterz@...radead.org>,
	Andy Lutomirski <luto@...nel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>, Tejun Heo <tj@...nel.org>,
	Uladzislau Rezki <urezki@...il.com>,
	Dave Hansen <dave.hansen@...ux.intel.com>,
	Christoph Lameter <cl@...two.org>,
	David Hildenbrand <david@...hat.com>,
	Andrey Konovalov <andreyknvl@...il.com>,
	Vincenzo Frascino <vincenzo.frascino@....com>,
	"H. Peter Anvin" <hpa@...or.com>, kasan-dev@...glegroups.com,
	Ard Biesheuvel <ardb@...nel.org>, linux-kernel@...r.kernel.org,
	Dmitry Vyukov <dvyukov@...gle.com>,
	Alexander Potapenko <glider@...gle.com>,
	Vlastimil Babka <vbabka@...e.cz>,
	Suren Baghdasaryan <surenb@...gle.com>,
	Thomas Huth <thuth@...hat.com>, John Hubbard <jhubbard@...dia.com>,
	Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
	Michal Hocko <mhocko@...e.com>,
	"Liam R. Howlett" <Liam.Howlett@...cle.com>, linux-mm@...ck.org,
	"Kirill A. Shutemov" <kas@...nel.org>,
	Oscar Salvador <osalvador@...e.de>, Jane Chu <jane.chu@...cle.com>,
	Gwan-gyeong Mun <gwan-gyeong.mun@...el.com>,
	"Aneesh Kumar K . V" <aneesh.kumar@...ux.ibm.com>,
	Joerg Roedel <joro@...tes.org>,
	Alistair Popple <apopple@...dia.com>,
	Joao Martins <joao.m.martins@...cle.com>,
	linux-arch@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [PATCH V4 mm-hotfixes 2/3] mm: introduce and use
 {pgd,p4d}_populate_kernel()

On Mon, Aug 11, 2025 at 02:34:19PM +0900, Harry Yoo wrote:
> Introduce and use {pgd,p4d}_populate_kernel() in core MM code when
> populating PGD and P4D entries for the kernel address space.
> These helpers ensure proper synchronization of page tables when
> updating the kernel portion of top-level page tables.
> 
> Until now, the kernel has relied on each architecture to handle
> synchronization of top-level page tables in an ad-hoc manner.
> For example, see commit 9b861528a801 ("x86-64, mem: Update all PGDs for
> direct mapping and vmemmap mapping changes").
> 
> However, this approach has proven fragile for the following reasons:
> 
>   1) It is easy to forget to perform the necessary page table
>      synchronization when introducing new changes.
>      For instance, commit 4917f55b4ef9 ("mm/sparse-vmemmap: improve memory
>      savings for compound devmaps") overlooked the need to synchronize
>      page tables for the vmemmap area.
> 
>   2) It is also easy to overlook that the vmemmap and direct mapping areas
>      must not be accessed before explicit page table synchronization.
>      For example, commit 8d400913c231 ("x86/vmemmap: handle unpopulated
>      sub-pmd ranges") caused crashes by accessing the vmemmap area
>      before calling sync_global_pgds().
> 
> To address this, as suggested by Dave Hansen, introduce _kernel() variants
> of the page table population helpers, which invoke architecture-specific
> hooks to properly synchronize page tables. These are introduced in a new
> header file, include/linux/pgalloc.h, so they can be called from common code.
> 
> They reuse existing infrastructure for vmalloc and ioremap.
> Synchronization requirements are determined by ARCH_PAGE_TABLE_SYNC_MASK,
> and the actual synchronization is performed by arch_sync_kernel_mappings().
> 
> This change currently targets only x86_64, so only PGD and P4D level
> helpers are introduced. In theory, PUD and PMD level helpers can be added
> later if needed by other architectures.
> 
> Currently this is a no-op, since no architecture sets
> PGTBL_{PGD,P4D}_MODIFIED in ARCH_PAGE_TABLE_SYNC_MASK.
> 
> Cc: <stable@...r.kernel.org>
> Fixes: 8d400913c231 ("x86/vmemmap: handle unpopulated sub-pmd ranges")
> Suggested-by: Dave Hansen <dave.hansen@...ux.intel.com>
> Signed-off-by: Harry Yoo <harry.yoo@...cle.com>

Reviewed-by: Mike Rapoport (Microsoft) <rppt@...nel.org>
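
For readers following the thread, the gating described in the commit
message can be sketched in plain userspace C. This is only an
illustration under stated assumptions, not the kernel code: every mock_*
name, the MOCK_* bit values, the table types and the counters are made up
for the example. The mock mask defaults to the PGD bit so that both
outcomes are visible; the generic default in the patch is 0, in which
case both checks are constant-false and no sync call is made.

```c
/* Userspace mock of the helpers in the patch. NOT kernel code. */
#include <assert.h>	/* only needed by callers that check the counters */
#include <stddef.h>

/* Illustrative bit values; the real ones are in include/linux/pgtable.h. */
#define MOCK_PGTBL_P4D_MODIFIED	(1u << 3)
#define MOCK_PGTBL_PGD_MODIFIED	(1u << 4)

/* Assumption: a hypothetical arch that only needs PGD-level syncing. */
#ifndef MOCK_ARCH_PAGE_TABLE_SYNC_MASK
#define MOCK_ARCH_PAGE_TABLE_SYNC_MASK	MOCK_PGTBL_PGD_MODIFIED
#endif

/* Hypothetical stand-ins for pgd_t, p4d_t and pud_t. */
typedef struct { void *next; } mock_pgd_t;
typedef struct { void *next; } mock_p4d_t;
typedef struct { int unused; } mock_pud_t;

/* Bookkeeping so a caller can observe whether the hook ran. */
static unsigned int mock_sync_calls;
static unsigned long mock_last_sync_addr;

/* Stand-in for arch_sync_kernel_mappings(): just records the call. */
static void mock_arch_sync_kernel_mappings(unsigned long start,
					   unsigned long end)
{
	(void)end;
	mock_sync_calls++;
	mock_last_sync_addr = start;
}

/* Populate the entry first, then sync only if the mask asks for it. */
static inline void mock_pgd_populate_kernel(unsigned long addr,
					    mock_pgd_t *pgd, mock_p4d_t *p4d)
{
	pgd->next = p4d;
	if (MOCK_ARCH_PAGE_TABLE_SYNC_MASK & MOCK_PGTBL_PGD_MODIFIED)
		mock_arch_sync_kernel_mappings(addr, addr);
}

static inline void mock_p4d_populate_kernel(unsigned long addr,
					    mock_p4d_t *p4d, mock_pud_t *pud)
{
	p4d->next = pud;
	if (MOCK_ARCH_PAGE_TABLE_SYNC_MASK & MOCK_PGTBL_P4D_MODIFIED)
		mock_arch_sync_kernel_mappings(addr, addr);
}
```

With this mock mask, mock_p4d_populate_kernel() fills the entry without
calling the hook, while mock_pgd_populate_kernel() fills the entry and
calls the hook once with the address it was given. Because the mask is a
compile-time constant, a build with the mask left at 0 behaves like the
plain populate calls, which matches the "currently a no-op" note in the
commit message.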

> ---
>  include/linux/pgalloc.h | 24 ++++++++++++++++++++++++
>  include/linux/pgtable.h |  4 ++--
>  mm/kasan/init.c         | 12 ++++++------
>  mm/percpu.c             |  6 +++---
>  mm/sparse-vmemmap.c     |  6 +++---
>  5 files changed, 38 insertions(+), 14 deletions(-)
>  create mode 100644 include/linux/pgalloc.h
> 
> diff --git a/include/linux/pgalloc.h b/include/linux/pgalloc.h
> new file mode 100644
> index 000000000000..290ab864320f
> --- /dev/null
> +++ b/include/linux/pgalloc.h
> @@ -0,0 +1,24 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef _LINUX_PGALLOC_H
> +#define _LINUX_PGALLOC_H
> +
> +#include <linux/pgtable.h>
> +#include <asm/pgalloc.h>
> +
> +static inline void pgd_populate_kernel(unsigned long addr, pgd_t *pgd,
> +				       p4d_t *p4d)
> +{
> +	pgd_populate(&init_mm, pgd, p4d);
> +	if (ARCH_PAGE_TABLE_SYNC_MASK & PGTBL_PGD_MODIFIED)
> +		arch_sync_kernel_mappings(addr, addr);
> +}
> +
> +static inline void p4d_populate_kernel(unsigned long addr, p4d_t *p4d,
> +				       pud_t *pud)
> +{
> +	p4d_populate(&init_mm, p4d, pud);
> +	if (ARCH_PAGE_TABLE_SYNC_MASK & PGTBL_P4D_MODIFIED)
> +		arch_sync_kernel_mappings(addr, addr);
> +}
> +
> +#endif /* _LINUX_PGALLOC_H */
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index ba699df6ef69..0cf5c6c3e483 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -1469,8 +1469,8 @@ static inline void modify_prot_commit_ptes(struct vm_area_struct *vma, unsigned
>  
>  /*
>   * Architectures can set this mask to a combination of PGTBL_P?D_MODIFIED values
> - * and let generic vmalloc and ioremap code know when arch_sync_kernel_mappings()
> - * needs to be called.
> + * and let generic vmalloc, ioremap and page table update code know when
> + * arch_sync_kernel_mappings() needs to be called.
>   */
>  #ifndef ARCH_PAGE_TABLE_SYNC_MASK
>  #define ARCH_PAGE_TABLE_SYNC_MASK 0
> diff --git a/mm/kasan/init.c b/mm/kasan/init.c
> index ced6b29fcf76..8fce3370c84e 100644
> --- a/mm/kasan/init.c
> +++ b/mm/kasan/init.c
> @@ -13,9 +13,9 @@
>  #include <linux/mm.h>
>  #include <linux/pfn.h>
>  #include <linux/slab.h>
> +#include <linux/pgalloc.h>
>  
>  #include <asm/page.h>
> -#include <asm/pgalloc.h>
>  
>  #include "kasan.h"
>  
> @@ -191,7 +191,7 @@ static int __ref zero_p4d_populate(pgd_t *pgd, unsigned long addr,
>  			pud_t *pud;
>  			pmd_t *pmd;
>  
> -			p4d_populate(&init_mm, p4d,
> +			p4d_populate_kernel(addr, p4d,
>  					lm_alias(kasan_early_shadow_pud));
>  			pud = pud_offset(p4d, addr);
>  			pud_populate(&init_mm, pud,
> @@ -212,7 +212,7 @@ static int __ref zero_p4d_populate(pgd_t *pgd, unsigned long addr,
>  			} else {
>  				p = early_alloc(PAGE_SIZE, NUMA_NO_NODE);
>  				pud_init(p);
> -				p4d_populate(&init_mm, p4d, p);
> +				p4d_populate_kernel(addr, p4d, p);
>  			}
>  		}
>  		zero_pud_populate(p4d, addr, next);
> @@ -251,10 +251,10 @@ int __ref kasan_populate_early_shadow(const void *shadow_start,
>  			 * puds,pmds, so pgd_populate(), pud_populate()
>  			 * is noops.
>  			 */
> -			pgd_populate(&init_mm, pgd,
> +			pgd_populate_kernel(addr, pgd,
>  					lm_alias(kasan_early_shadow_p4d));
>  			p4d = p4d_offset(pgd, addr);
> -			p4d_populate(&init_mm, p4d,
> +			p4d_populate_kernel(addr, p4d,
>  					lm_alias(kasan_early_shadow_pud));
>  			pud = pud_offset(p4d, addr);
>  			pud_populate(&init_mm, pud,
> @@ -273,7 +273,7 @@ int __ref kasan_populate_early_shadow(const void *shadow_start,
>  				if (!p)
>  					return -ENOMEM;
>  			} else {
> -				pgd_populate(&init_mm, pgd,
> +				pgd_populate_kernel(addr, pgd,
>  					early_alloc(PAGE_SIZE, NUMA_NO_NODE));
>  			}
>  		}
> diff --git a/mm/percpu.c b/mm/percpu.c
> index d9cbaee92b60..a56f35dcc417 100644
> --- a/mm/percpu.c
> +++ b/mm/percpu.c
> @@ -3108,7 +3108,7 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
>  #endif /* BUILD_EMBED_FIRST_CHUNK */
>  
>  #ifdef BUILD_PAGE_FIRST_CHUNK
> -#include <asm/pgalloc.h>
> +#include <linux/pgalloc.h>
>  
>  #ifndef P4D_TABLE_SIZE
>  #define P4D_TABLE_SIZE PAGE_SIZE
> @@ -3134,13 +3134,13 @@ void __init __weak pcpu_populate_pte(unsigned long addr)
>  
>  	if (pgd_none(*pgd)) {
>  		p4d = memblock_alloc_or_panic(P4D_TABLE_SIZE, P4D_TABLE_SIZE);
> -		pgd_populate(&init_mm, pgd, p4d);
> +		pgd_populate_kernel(addr, pgd, p4d);
>  	}
>  
>  	p4d = p4d_offset(pgd, addr);
>  	if (p4d_none(*p4d)) {
>  		pud = memblock_alloc_or_panic(PUD_TABLE_SIZE, PUD_TABLE_SIZE);
> -		p4d_populate(&init_mm, p4d, pud);
> +		p4d_populate_kernel(addr, p4d, pud);
>  	}
>  
>  	pud = pud_offset(p4d, addr);
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index 41aa0493eb03..dbd8daccade2 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -27,9 +27,9 @@
>  #include <linux/spinlock.h>
>  #include <linux/vmalloc.h>
>  #include <linux/sched.h>
> +#include <linux/pgalloc.h>
>  
>  #include <asm/dma.h>
> -#include <asm/pgalloc.h>
>  #include <asm/tlbflush.h>
>  
>  #include "hugetlb_vmemmap.h"
> @@ -229,7 +229,7 @@ p4d_t * __meminit vmemmap_p4d_populate(pgd_t *pgd, unsigned long addr, int node)
>  		if (!p)
>  			return NULL;
>  		pud_init(p);
> -		p4d_populate(&init_mm, p4d, p);
> +		p4d_populate_kernel(addr, p4d, p);
>  	}
>  	return p4d;
>  }
> @@ -241,7 +241,7 @@ pgd_t * __meminit vmemmap_pgd_populate(unsigned long addr, int node)
>  		void *p = vmemmap_alloc_block_zero(PAGE_SIZE, node);
>  		if (!p)
>  			return NULL;
> -		pgd_populate(&init_mm, pgd, p);
> +		pgd_populate_kernel(addr, pgd, p);
>  	}
>  	return pgd;
>  }
> -- 
> 2.43.0
> 

-- 
Sincerely yours,
Mike.