[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20260128114936.72280-1-clm@meta.com>
Date: Wed, 28 Jan 2026 03:47:43 -0800
From: Chris Mason <clm@...a.com>
To: Baolin Wang <baolin.wang@...ux.alibaba.com>
CC: <akpm@...ux-foundation.org>, <david@...nel.org>, <catalin.marinas@....com>,
<will@...nel.org>, <lorenzo.stoakes@...cle.com>,
<ryan.roberts@....com>, <Liam.Howlett@...cle.com>, <vbabka@...e.cz>,
<rppt@...nel.org>, <surenb@...gle.com>, <mhocko@...e.com>,
<riel@...riel.com>, <harry.yoo@...cle.com>, <jannh@...gle.com>,
<willy@...radead.org>, <baohua@...nel.org>, <dev.jain@....com>,
<linux-mm@...ck.org>, <linux-arm-kernel@...ts.infradead.org>,
<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v5 4/5] arm64: mm: implement the architecture-specific clear_flush_young_ptes()
Baolin Wang <baolin.wang@...ux.alibaba.com> wrote:
> Implement the Arm64 architecture-specific clear_flush_young_ptes() to enable
> batched checking of young flags and TLB flushing, improving performance during
> large folio reclamation.
>
> Performance testing:
> Allocate 10G clean file-backed folios by mmap() in a memory cgroup, and try to
> reclaim 8G file-backed folios via the memory.reclaim interface. I can observe
> 33% performance improvement on my Arm64 32-core server (and 10%+ improvement
> on my X86 machine). Meanwhile, the hotspot folio_check_references() dropped
> from approximately 35% to around 5%.
Hi everyone, I ran mm-new through my AI review prompts and this one was
flagged. AI review below:
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -1838,6 +1838,17 @@ static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
> return contpte_clear_flush_young_ptes(vma, addr, ptep, 1);
> }
>
> +#define clear_flush_young_ptes clear_flush_young_ptes
> +static inline int clear_flush_young_ptes(struct vm_area_struct *vma,
> + unsigned long addr, pte_t *ptep,
> + unsigned int nr)
> +{
> + if (likely(nr == 1 && !pte_cont(__ptep_get(ptep))))
> + return __ptep_clear_flush_young(vma, addr, ptep);
Should this be checking !pte_valid_cont() instead of !pte_cont()?
The existing ptep_clear_flush_young() above uses !pte_valid_cont() to
determine when to take the fast path. The new function only checks
!pte_cont(), which differs when handling non-present PTEs.
Non-present PTEs (device-private, device-exclusive) can reach
clear_flush_young_ptes() through folio_referenced_one()->
clear_flush_young_ptes_notify(). These entries may have bit 52 set as
part of their encoding, but they aren't valid contiguous mappings.
With the current check, wouldn't such entries incorrectly trigger the
contpte path and potentially cause contpte_clear_flush_young_ptes() to
process additional unrelated PTEs beyond the intended single entry?
> +
> + return contpte_clear_flush_young_ptes(vma, addr, ptep, nr);
> +}
> +
> #define wrprotect_ptes wrprotect_ptes
> static __always_inline void wrprotect_ptes(struct mm_struct *mm,
> unsigned long addr, pte_t *ptep, unsigned int nr)
Powered by blists - more mailing lists