Message-ID: <aPnUTfMXD7qReWUl@kernel.org>
Date: Thu, 23 Oct 2025 10:07:57 +0300
From: Mike Rapoport <rppt@...nel.org>
To: Lu Baolu <baolu.lu@...ux.intel.com>
Cc: Joerg Roedel <joro@...tes.org>, Will Deacon <will@...nel.org>,
Robin Murphy <robin.murphy@....com>,
Kevin Tian <kevin.tian@...el.com>, Jason Gunthorpe <jgg@...dia.com>,
Jann Horn <jannh@...gle.com>, Vasant Hegde <vasant.hegde@....com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
Dave Hansen <dave.hansen@...el.com>,
Alistair Popple <apopple@...dia.com>,
Peter Zijlstra <peterz@...radead.org>,
Uladzislau Rezki <urezki@...il.com>,
Jean-Philippe Brucker <jean-philippe@...aro.org>,
Andy Lutomirski <luto@...nel.org>, Yi Lai <yi1.lai@...el.com>,
David Hildenbrand <david@...hat.com>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
"Liam R . Howlett" <Liam.Howlett@...cle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Vlastimil Babka <vbabka@...e.cz>, Michal Hocko <mhocko@...nel.org>,
Matthew Wilcox <willy@...radead.org>,
Vinicius Costa Gomes <vinicius.gomes@...el.com>,
iommu@...ts.linux.dev, security@...nel.org, x86@...nel.org,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
Dave Hansen <dave.hansen@...ux.intel.com>
Subject: Re: [PATCH v7 2/8] mm: Add a ptdesc flag to mark kernel page tables
On Wed, Oct 22, 2025 at 04:26:28PM +0800, Lu Baolu wrote:
> From: Dave Hansen <dave.hansen@...ux.intel.com>
>
> The page tables used to map the kernel and userspace often have very
> different handling rules. There are frequently *_kernel() variants of
> functions just for kernel page tables. That's not great and has led
> to code duplication.
>
> Instead of having completely separate call paths, allow a 'ptdesc' to
> be marked as being for kernel mappings. Introduce helpers to set and
> clear this status.
>
> Note: this uses the PG_referenced bit. Page flags are a great fit for
> this since it is truly a single bit of information. Use PG_referenced
> itself because it's a fairly benign flag (as opposed to things like
> PG_lock). It's also (according to Willy) unlikely to go away any time
> soon.
>
> PG_referenced is not in PAGE_FLAGS_CHECK_AT_FREE. It does not need to
> be cleared before freeing the page, and pages coming out of the
> allocator should have it cleared. Regardless, introduce an API to
> clear it anyway. Having symmetry in the API makes it easier to change
> the underlying implementation later, like if there was a need to move
> to a PAGE_FLAGS_CHECK_AT_FREE bit.
>
> Signed-off-by: Dave Hansen <dave.hansen@...ux.intel.com>
> Signed-off-by: Lu Baolu <baolu.lu@...ux.intel.com>
> Reviewed-by: Jason Gunthorpe <jgg@...dia.com>
> Reviewed-by: Kevin Tian <kevin.tian@...el.com>
> Acked-by: David Hildenbrand <david@...hat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@...nel.org>
> ---
> include/linux/mm.h | 41 +++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 41 insertions(+)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index d16b33bacc32..354d7925bf77 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2940,6 +2940,7 @@ static inline pmd_t *pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long a
> #endif /* CONFIG_MMU */
>
> enum pt_flags {
> +	PT_kernel = PG_referenced,
>  	PT_reserved = PG_reserved,
>  	/* High bits are used for zone/node/section */
> };
> @@ -2965,6 +2966,46 @@ static inline bool pagetable_is_reserved(struct ptdesc *pt)
>  	return test_bit(PT_reserved, &pt->pt_flags.f);
> }
>
> +/**
> + * ptdesc_set_kernel - Mark a ptdesc used to map the kernel
> + * @ptdesc: The ptdesc to be marked
> + *
> + * Kernel page tables often need special handling. Set a flag so that
> + * the handling code knows this ptdesc will not be used for userspace.
> + */
> +static inline void ptdesc_set_kernel(struct ptdesc *ptdesc)
> +{
> +	set_bit(PT_kernel, &ptdesc->pt_flags.f);
> +}
> +
> +/**
> + * ptdesc_clear_kernel - Mark a ptdesc as no longer used to map the kernel
> + * @ptdesc: The ptdesc to be unmarked
> + *
> + * Use when the ptdesc is no longer used to map the kernel and no longer
> + * needs special handling.
> + */
> +static inline void ptdesc_clear_kernel(struct ptdesc *ptdesc)
> +{
> +	/*
> +	 * Note: the 'PG_referenced' bit does not strictly need to be
> +	 * cleared before freeing the page. But this is nice for
> +	 * symmetry.
> +	 */
> +	clear_bit(PT_kernel, &ptdesc->pt_flags.f);
> +}
> +
> +/**
> + * ptdesc_test_kernel - Check if a ptdesc is used to map the kernel
> + * @ptdesc: The ptdesc being tested
> + *
> + * Call to tell if the ptdesc is used to map the kernel.
> + */
> +static inline bool ptdesc_test_kernel(const struct ptdesc *ptdesc)
> +{
> +	return test_bit(PT_kernel, &ptdesc->pt_flags.f);
> +}
> +
> /**
> * pagetable_alloc - Allocate pagetables
> * @gfp: GFP flags
> --
> 2.43.0
>
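
For readers following along outside a kernel tree, the set/clear/test
helpers quoted above can be modeled in userspace roughly as below. This is
only a sketch under stated assumptions: struct mock_ptdesc, the mock_*
names, the bit position and mock_pagetable_path() are hypothetical and not
part of the patch, and C11 atomics stand in for the kernel's set_bit(),
clear_bit() and test_bit().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct ptdesc: only the flags word is modeled. */
struct mock_ptdesc {
	atomic_ulong pt_flags;
};

/* Assumed bit number, for illustration only; the patch aliases PG_referenced. */
enum { MOCK_PT_KERNEL = 2 };

/* Mark the descriptor as mapping the kernel (models ptdesc_set_kernel()). */
static inline void mock_ptdesc_set_kernel(struct mock_ptdesc *pt)
{
	atomic_fetch_or(&pt->pt_flags, 1UL << MOCK_PT_KERNEL);
}

/* Drop the mark again (models ptdesc_clear_kernel()). */
static inline void mock_ptdesc_clear_kernel(struct mock_ptdesc *pt)
{
	atomic_fetch_and(&pt->pt_flags, ~(1UL << MOCK_PT_KERNEL));
}

/* Report whether the mark is set (models ptdesc_test_kernel()). */
static inline bool mock_ptdesc_test_kernel(struct mock_ptdesc *pt)
{
	return (atomic_load(&pt->pt_flags) >> MOCK_PT_KERNEL) & 1UL;
}

/* Hypothetical caller: one shared path that branches on the flag. */
enum mock_path { MOCK_PATH_USER, MOCK_PATH_KERNEL };

static inline enum mock_path mock_pagetable_path(struct mock_ptdesc *pt)
{
	return mock_ptdesc_test_kernel(pt) ? MOCK_PATH_KERNEL : MOCK_PATH_USER;
}
```

The point of the sketch is the last function: a single call path can ask
the descriptor which rules apply instead of relying on a separate
*_kernel() variant, and the other flag bits are left untouched by set and
clear.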
--
Sincerely yours,
Mike.