Message-ID: <aKyJCsXC5dL3Olpq@hyeyoo>
Date: Tue, 26 Aug 2025 01:02:18 +0900
From: Harry Yoo <harry.yoo@...cle.com>
To: Christophe Leroy <christophe.leroy@...roup.eu>
Cc: Dennis Zhou <dennis@...nel.org>, Andrew Morton <akpm@...ux-foundation.org>,
        Andrey Ryabinin <ryabinin.a.a@...il.com>, x86@...nel.org,
        Borislav Petkov <bp@...en8.de>, Peter Zijlstra <peterz@...radead.org>,
        Andy Lutomirski <luto@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>,
        Tejun Heo <tj@...nel.org>, Uladzislau Rezki <urezki@...il.com>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        Christoph Lameter <cl@...two.org>,
        David Hildenbrand <david@...hat.com>,
        Andrey Konovalov <andreyknvl@...il.com>,
        Vincenzo Frascino <vincenzo.frascino@....com>,
        "H. Peter Anvin" <hpa@...or.com>, kasan-dev@...glegroups.com,
        Mike Rapoport <rppt@...nel.org>, Ard Biesheuvel <ardb@...nel.org>,
        linux-kernel@...r.kernel.org, Dmitry Vyukov <dvyukov@...gle.com>,
        Alexander Potapenko <glider@...gle.com>,
        Vlastimil Babka <vbabka@...e.cz>,
        Suren Baghdasaryan <surenb@...gle.com>, Thomas Huth <thuth@...hat.com>,
        John Hubbard <jhubbard@...dia.com>,
        Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
        Michal Hocko <mhocko@...e.com>,
        "Liam R. Howlett" <Liam.Howlett@...cle.com>, linux-mm@...ck.org,
        "Kirill A. Shutemov" <kas@...nel.org>,
        Oscar Salvador <osalvador@...e.de>, Jane Chu <jane.chu@...cle.com>,
        Gwan-gyeong Mun <gwan-gyeong.mun@...el.com>,
        "Aneesh Kumar K . V" <aneesh.kumar@...ux.ibm.com>,
        Joerg Roedel <joro@...tes.org>, Alistair Popple <apopple@...dia.com>,
        Joao Martins <joao.m.martins@...cle.com>, linux-arch@...r.kernel.org,
        stable@...r.kernel.org
Subject: Re: [PATCH V4 mm-hotfixes 2/3] mm: introduce and use
 {pgd,p4d}_populate_kernel()

On Mon, Aug 25, 2025 at 01:27:20PM +0200, Christophe Leroy wrote:
> 
> 
> On 11/08/2025 at 07:34, Harry Yoo wrote:
> > Introduce and use {pgd,p4d}_populate_kernel() in core MM code when
> > populating PGD and P4D entries for the kernel address space.
> > These helpers ensure proper synchronization of page tables when
> > updating the kernel portion of top-level page tables.
> > 
> > Until now, the kernel has relied on each architecture to handle
> > synchronization of top-level page tables in an ad-hoc manner.
> > For example, see commit 9b861528a801 ("x86-64, mem: Update all PGDs for
> > direct mapping and vmemmap mapping changes").
> > 
> > However, this approach has proven fragile for following reasons:
> > 
> >    1) It is easy to forget to perform the necessary page table
> >       synchronization when introducing new changes.
> >       For instance, commit 4917f55b4ef9 ("mm/sparse-vmemmap: improve memory
> >       savings for compound devmaps") overlooked the need to synchronize
> >       page tables for the vmemmap area.
> > 
> >    2) It is also easy to overlook that the vmemmap and direct mapping areas
> >       must not be accessed before explicit page table synchronization.
> >       For example, commit 8d400913c231 ("x86/vmemmap: handle unpopulated
> >       sub-pmd ranges")) caused crashes by accessing the vmemmap area
> >       before calling sync_global_pgds().
> > 
> > To address this, as suggested by Dave Hansen, introduce _kernel() variants
> > of the page table population helpers, which invoke architecture-specific
> > hooks to properly synchronize page tables. These are introduced in a new
> > header file, include/linux/pgalloc.h, so they can be called from common code.
> > 
> > They reuse existing infrastructure for vmalloc and ioremap.
> > Synchronization requirements are determined by ARCH_PAGE_TABLE_SYNC_MASK,
> > and the actual synchronization is performed by arch_sync_kernel_mappings().
> > 
> > This change currently targets only x86_64, so only PGD and P4D level
> > helpers are introduced. In theory, PUD and PMD level helpers can be added
> > later if needed by other architectures.
> 
> AFAIK pmd_populate_kernel() already exist on all architectures, and I'm not
> sure it does what you expect. Or am I missing something ?

It does not do what I expect.

Yes, if someone is going to introduce a PMD level helper, the existing
pmd_populate_kernel() should be renamed or removed.
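
To make the difference concrete, here is a minimal userspace sketch of the
"populate, then sync" pattern the new {pgd,p4d}_populate_kernel() helpers
follow. It is a model, not the patch: the types, the simplified
pgd_populate() signature (no mm argument), the sync_calls counter, the value
of ARCH_PAGE_TABLE_SYNC_MASK and the range passed to the hook are all
assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins; none of these are the kernel's real definitions. */
typedef struct { uintptr_t val; } pgd_t;
typedef struct { uintptr_t val; } p4d_t;

#define PGTBL_PGD_MODIFIED (1u << 0)
/* Assumption: this model architecture wants a sync on PGD changes. */
#define ARCH_PAGE_TABLE_SYNC_MASK PGTBL_PGD_MODIFIED

static int sync_calls; /* hypothetical counter: calls to the model hook */

/* Model hook: a real one would propagate the new entry to other PGDs. */
static void arch_sync_kernel_mappings(unsigned long start, unsigned long end)
{
	assert(start <= end);
	sync_calls++;
}

/* Plain population: writes the entry and does nothing else. */
static void pgd_populate(pgd_t *pgd, p4d_t *p4d)
{
	pgd->val = (uintptr_t)p4d;
}

/* _kernel() variant: populate, then sync if the mask asks for it. */
static void pgd_populate_kernel(unsigned long addr, pgd_t *pgd, p4d_t *p4d)
{
	pgd_populate(pgd, p4d);
	if (ARCH_PAGE_TABLE_SYNC_MASK & PGTBL_PGD_MODIFIED)
		arch_sync_kernel_mappings(addr, addr);
}
```

In this model only the _kernel() variant reaches the sync hook, which is the
behaviour the existing pmd_populate_kernel() does not have.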

To be honest, I'm not really sure why we need both pmd_populate() and
pmd_populate_kernel(). It was introduced by the historical commit
3a0b82c08a0e8668 ("adds simple support for atomically-mapped PTEs.
On highmem systems this enables the allocation of the pagetables in
highmem.") [1], but as there's no explanation or comment, I can only
speculate.

The key differences I recognize are: 1) the type of the last parameter is
pgtable_t (which can be either struct page * or pte_t * depending on the
architecture) in pmd_populate() and pte_t * in pmd_populate_kernel(),
and 2) some architectures treat user and kernel page tables differently.

Regarding 1), I think a reasonable explanation is that pmd_populate()
should take struct page * on some architectures, because with
CONFIG_HIGHPTE=y a pte_t * might not be accessible. Kernel page tables,
however, are not allocated from highmem even with CONFIG_HIGHPTE=y,
so pmd_populate_kernel() can take pte_t *, and that can save a few
instructions.

And some architectures (that do not support HIGHPTE?) define pgtable_t
as pte_t * to support sub-page page tables (commit 2f569afd9ced
("CONFIG_HIGHPTE vs. sub-page page tables.")).
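
A rough userspace model of difference 1), in case it helps. Everything here
is a stand-in: the one-field struct page, page_to_entry() and the two-argument
signatures (no mm) are hypothetical simplifications, not any architecture's
real code. The point is only that the struct page * variant needs an extra
translation step, while the pte_t * variant can use the pointer directly.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uintptr_t val; } pte_t;
typedef struct { uintptr_t val; } pmd_t;

/*
 * Model "struct page": just records where the PTE table lives. With
 * HIGHPTE the table may have no permanent kernel mapping, so the page
 * descriptor is the only stable handle to it.
 */
struct page { pte_t *table; };

typedef struct page *pgtable_t; /* the HIGHPTE-capable choice */

/* Hypothetical stand-in for the page -> entry translation. */
static uintptr_t page_to_entry(const struct page *page)
{
	return (uintptr_t)page->table;
}

/* User-style variant: takes the page descriptor and translates it. */
static void pmd_populate(pmd_t *pmd, pgtable_t pte_page)
{
	assert(pte_page);
	pmd->val = page_to_entry(pte_page);
}

/* Kernel-style variant: the table is always addressable, no translation. */
static void pmd_populate_kernel(pmd_t *pmd, pte_t *pte)
{
	assert(pte);
	pmd->val = (uintptr_t)pte;
}
```

Both variants end up writing the same entry for the same table in this
model; they differ only in what handle the caller has to pass.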

Maybe things to clean up in the future:

1) Once CONFIG_HIGHPTE is completely dropped (is that ever going to
   happen?), pte_t * can be used instead of struct page *. 

2) Convert users of pmd_populate_kernel() to use pmd_populate().
   Some architectures treat user and kernel page tables differently,
   though, so that would have to be handled inside pmd_populate()
   (depending on mm == &init_mm).
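
For 2), a merged helper could look roughly like the sketch below. This is a
userspace model under stated assumptions: the MODEL_PMD_* flag bits, the
empty struct mm_struct and the way the entry is encoded are all hypothetical,
and a real architecture would pick its own kernel and user table attributes.

```c
#include <assert.h>
#include <stdint.h>

struct mm_struct { int unused; };
static struct mm_struct init_mm; /* model of the kernel's own mm */

typedef struct { uintptr_t val; } pte_t;
typedef struct { uintptr_t val; } pmd_t;

/* Hypothetical attribute bits kept in the low bits of the entry. */
#define MODEL_PMD_USER   ((uintptr_t)1)
#define MODEL_PMD_KERNEL ((uintptr_t)2)

/* Single helper: kernel vs. user handling is chosen from the mm. */
static void pmd_populate(struct mm_struct *mm, pmd_t *pmd, pte_t *pte)
{
	uintptr_t flags = (mm == &init_mm) ? MODEL_PMD_KERNEL : MODEL_PMD_USER;

	/* The low two bits must be free to carry the model flags. */
	assert(((uintptr_t)pte & 3) == 0);
	pmd->val = (uintptr_t)pte | flags;
}
```

Callers that use pmd_populate_kernel() today would pass &init_mm instead.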

[1] https://web.git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=3a0b82c08a0e86683783c30d7fec9d1b06c2fe20

-- 
Cheers,
Harry / Hyeonggon
