[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <fbfef7fc-4030-462b-b514-498eea6620aa@arm.com>
Date: Thu, 27 Nov 2025 16:57:06 +0000
From: Ryan Roberts <ryan.roberts@....com>
To: Samuel Holland <samuel.holland@...ive.com>,
Palmer Dabbelt <palmer@...belt.com>, Paul Walmsley <pjw@...nel.org>,
linux-riscv@...ts.infradead.org, Andrew Morton <akpm@...ux-foundation.org>,
David Hildenbrand <david@...hat.com>, linux-mm@...ck.org
Cc: devicetree@...r.kernel.org, Suren Baghdasaryan <surenb@...gle.com>,
linux-kernel@...r.kernel.org, Mike Rapoport <rppt@...nel.org>,
Michal Hocko <mhocko@...e.com>, Conor Dooley <conor@...nel.org>,
Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
Krzysztof Kozlowski <krzk+dt@...nel.org>, Alexandre Ghiti <alex@...ti.fr>,
Emil Renner Berthing <kernel@...il.dk>, Rob Herring <robh+dt@...nel.org>,
Vlastimil Babka <vbabka@...e.cz>, "Liam R . Howlett"
<Liam.Howlett@...cle.com>
Subject: Re: [PATCH v3 08/22] mm: Allow page table accessors to be
non-idempotent
On 13/11/2025 01:45, Samuel Holland wrote:
> Currently, some functions such as pte_offset_map() are passed both
> pointers to hardware page tables, and pointers to previously-read PMD
> entries on the stack. To ensure correctness in the first case, these
> functions must use the page table accessor function (pmdp_get()) to
> dereference the supplied pointer. However, this means pmdp_get() is
> called twice in the second case. This double call must be avoided if
> pmdp_get() applies some non-idempotent transformation to the value.
>
> Avoid the double transformation by calling set_pmd() on the stack
> variables where necessary to keep set_pmd()/pmdp_get() calls balanced.
I don't think this is a good solution.
arm64, at least, expects and requires that only pointers to entries in pgtables
are passed to the arch helpers (e.g. set_pte(), ptep_get(), etc). For PTEs,
arm64 accesses adjacent entries within the page table to manage contiguous
mappings. If it is passed a pointer to a stack variable, it may erroneously
access other stuff on the stack thinking it is an entry in a page table.
I think we should formalize this as a clear requirement for all these functions;
all pte/pmd/pud/p4d/pgd pointers passed to the arch pgtable helpers must always
point to entries in pgtables.
arm64 will very likely take advantage of this in future in the pmd/pud/...
helpers as it does today for the pte level. But even today, arm64's set_pmd()
will emit barriers which are totally unnecessary when operating on a stack
variable that the HW PTW will never see.
Thanks,
Ryan
>
> Signed-off-by: Samuel Holland <samuel.holland@...ive.com>
> ---
>
> (no changes since v2)
>
> Changes in v2:
> - New patch for v2
>
> kernel/events/core.c | 2 ++
> mm/gup.c | 3 +++
> mm/khugepaged.c | 6 ++++--
> mm/page_table_check.c | 3 +++
> mm/pgtable-generic.c | 2 ++
> 5 files changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index fa4f9165bd94..7969b060bf2d 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -8154,6 +8154,8 @@ static u64 perf_get_pgtable_size(struct mm_struct *mm, unsigned long addr)
> if (pmd_leaf(pmd))
> return pmd_leaf_size(pmd);
>
> + /* transform pmd as if &pmd pointed to a hardware page table */
> + set_pmd(&pmd, pmd);
> ptep = pte_offset_map(&pmd, addr);
> if (!ptep)
> goto again;
> diff --git a/mm/gup.c b/mm/gup.c
> index 549f9e868311..aba61704049e 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2844,7 +2844,10 @@ static int gup_fast_pte_range(pmd_t pmd, pmd_t *pmdp, unsigned long addr,
> int ret = 0;
> pte_t *ptep, *ptem;
>
> + /* transform pmd as if &pmd pointed to a hardware page table */
> + set_pmd(&pmd, pmd);
> ptem = ptep = pte_offset_map(&pmd, addr);
> + pmd = pmdp_get(&pmd);
> if (!ptep)
> return 0;
> do {
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 1bff8ade751a..ab1f68a7bc83 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1724,7 +1724,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
> struct mmu_notifier_range range;
> struct mm_struct *mm;
> unsigned long addr;
> - pmd_t *pmd, pgt_pmd;
> + pmd_t *pmd, pgt_pmd, pmdval;
> spinlock_t *pml;
> spinlock_t *ptl;
> bool success = false;
> @@ -1777,7 +1777,9 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
> */
> if (check_pmd_state(pmd) != SCAN_SUCCEED)
> goto drop_pml;
> - ptl = pte_lockptr(mm, pmd);
> + /* pte_lockptr() needs a value, not a pointer to a page table */
> + pmdval = pmdp_get(pmd);
> + ptl = pte_lockptr(mm, &pmdval);
> if (ptl != pml)
> spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
>
> diff --git a/mm/page_table_check.c b/mm/page_table_check.c
> index 31f4c39d20ef..77d6688db0de 100644
> --- a/mm/page_table_check.c
> +++ b/mm/page_table_check.c
> @@ -260,7 +260,10 @@ void __page_table_check_pte_clear_range(struct mm_struct *mm,
> return;
>
> if (!pmd_bad(pmd) && !pmd_leaf(pmd)) {
> + /* transform pmd as if &pmd pointed to a hardware page table */
> + set_pmd(&pmd, pmd);
> pte_t *ptep = pte_offset_map(&pmd, addr);
> + pmd = pmdp_get(&pmd);
> unsigned long i;
>
> if (WARN_ON(!ptep))
> diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
> index 63a573306bfa..6602deb002f1 100644
> --- a/mm/pgtable-generic.c
> +++ b/mm/pgtable-generic.c
> @@ -299,6 +299,8 @@ pte_t *___pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp)
> pmd_clear_bad(pmd);
> goto nomap;
> }
> + /* transform pmdval as if &pmdval pointed to a hardware page table */
> + set_pmd(&pmdval, pmdval);
> return __pte_map(&pmdval, addr);
> nomap:
> rcu_read_unlock();
Powered by blists - more mailing lists