Message-ID: <c285f2fb-b64d-4932-b9ae-ef420097728e@lucifer.local>
Date: Fri, 16 Jan 2026 08:41:44 +0000
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: Baolin Wang <baolin.wang@...ux.alibaba.com>
Cc: akpm@...ux-foundation.org, david@...nel.org, catalin.marinas@....com,
        will@...nel.org, ryan.roberts@....com, Liam.Howlett@...cle.com,
        vbabka@...e.cz, rppt@...nel.org, surenb@...gle.com, mhocko@...e.com,
        riel@...riel.com, harry.yoo@...cle.com, jannh@...gle.com,
        willy@...radead.org, baohua@...nel.org, dev.jain@....com,
        linux-mm@...ck.org, linux-arm-kernel@...ts.infradead.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 0/5] support batch checking of references and
 unmapping for large folios

Andrew -

I know this has had a lot of attention, but can we hold off on sending this
upstream until either David or I have had a chance to review it?

Also note that Dev has discovered an issue with how this interacts with the
accursed uffd-wp logic (see [0]), so the series needs a respin anyway.

Thanks, Lorenzo

[0]: https://lore.kernel.org/linux-mm/20260116082721.275178-1-dev.jain@arm.com/


On Fri, Dec 26, 2025 at 02:07:54PM +0800, Baolin Wang wrote:
> Currently, folio_referenced_one() always checks the young flag for each PTE
> sequentially, which is inefficient for large folios. This inefficiency is
> especially noticeable when reclaiming clean file-backed large folios, where
> folio_referenced() is observed as a significant performance hotspot.
>
> Moreover, on Arm architecture, which supports contiguous PTEs, there is already
> an optimization to clear the young flags for PTEs within a contiguous range.
> However, this is not sufficient. We can extend this to perform batched operations
> for the entire large folio (which might exceed the contiguous range: CONT_PTE_SIZE).
>
> Similar to folio_referenced_one(), we can also apply batched unmapping for large
> file folios to optimize the performance of file folio reclamation. By supporting
> batched checking of the young flags, flushing TLB entries, and unmapping, I
> observed significant performance improvements in my tests of file folio
> reclamation. Please check the performance data in the commit message of
> each patch.
>
> Ran stress-ng and mm selftests; no issues were found.
>
> Patch 1: Add a new generic batched PTE helper that supports batched checks of
> the references for large folios.
> Patch 2 - 3: Preparation patches.
> Patch 4: Implement the Arm64 arch-specific clear_flush_young_ptes().
> Patch 5: Support batched unmapping for file large folios.
>
> Changes from v4:
>  - Fix passing the incorrect 'CONT_PTES' for non-batched APIs.
>  - Rename ptep_clear_flush_young_notify() to clear_flush_young_ptes_notify() (per Ryan).
>  - Fix some coding style issues (per Ryan).
>  - Add reviewed tag from Ryan. Thanks.
>
> Changes from v3:
>  - Fix using an incorrect parameter in ptep_clear_flush_young_notify()
>    (per Liam).
>
> Changes from v2:
>  - Rearrange the patch set (per Ryan).
>  - Add pte_cont() check in clear_flush_young_ptes() (per Ryan).
>  - Add a helper to do contpte block alignment (per Ryan).
>  - Fix some coding style issues (per Lorenzo and Ryan).
>  - Add more comments and update the commit message (per Lorenzo and Ryan).
>  - Add acked tag from Barry. Thanks.
>
> Changes from v1:
>  - Add a new patch to support batched unmapping for file large folios.
>  - Update the cover letter
>
> Baolin Wang (5):
>   mm: rmap: support batched checks of the references for large folios
>   arm64: mm: factor out the address and ptep alignment into a new helper
>   arm64: mm: support batch clearing of the young flag for large folios
>   arm64: mm: implement the architecture-specific
>     clear_flush_young_ptes()
>   mm: rmap: support batched unmapping for file large folios
>
>  arch/arm64/include/asm/pgtable.h | 23 ++++++++----
>  arch/arm64/mm/contpte.c          | 62 ++++++++++++++++++++------------
>  include/linux/mmu_notifier.h     |  9 ++---
>  include/linux/pgtable.h          | 31 ++++++++++++++++
>  mm/rmap.c                        | 38 ++++++++++++++++----
>  5 files changed, 125 insertions(+), 38 deletions(-)
>
> --
> 2.47.3
>
