Message-Id: <94889730-1AEF-458F-B623-04092C0D6819@linux.ibm.com>
Date: Wed, 3 Dec 2025 21:38:46 +0530
From: Venkat <venkat88@...ux.ibm.com>
To: Kevin Brodsky <kevin.brodsky@....com>
Cc: linux-mm@...ck.org, LKML <linux-kernel@...r.kernel.org>,
        Alexander Gordeev <agordeev@...ux.ibm.com>,
        Andreas Larsson <andreas@...sler.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Boris Ostrovsky <boris.ostrovsky@...cle.com>,
        Borislav Petkov <bp@...en8.de>,
        Catalin Marinas <catalin.marinas@....com>,
        Christophe Leroy <christophe.leroy@...roup.eu>,
        Dave Hansen <dave.hansen@...ux.intel.com>,
        David Hildenbrand <david@...hat.com>,
        "David S. Miller" <davem@...emloft.net>,
        David Woodhouse <dwmw2@...radead.org>,
        "H. Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...hat.com>,
        Jann Horn <jannh@...gle.com>, Juergen Gross <jgross@...e.com>,
        "Liam R. Howlett" <Liam.Howlett@...cle.com>,
        Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
        Madhavan Srinivasan <maddy@...ux.ibm.com>,
        Michael Ellerman <mpe@...erman.id.au>, Michal Hocko <mhocko@...e.com>,
        Mike Rapoport <rppt@...nel.org>, Nicholas Piggin <npiggin@...il.com>,
        Peter Zijlstra <peterz@...radead.org>,
        "Ritesh Harjani (IBM)" <ritesh.list@...il.com>,
        Ryan Roberts <ryan.roberts@....com>,
        Suren Baghdasaryan <surenb@...gle.com>,
        Thomas Gleixner <tglx@...utronix.de>, Vlastimil Babka <vbabka@...e.cz>,
        Will Deacon <will@...nel.org>, Yeoreum Yun <yeoreum.yun@....com>,
        linux-arm-kernel@...ts.infradead.org,
        linuxppc-dev <linuxppc-dev@...ts.ozlabs.org>,
        sparclinux@...r.kernel.org, xen-devel@...ts.xenproject.org,
        x86@...nel.org
Subject: Re: [PATCH v5 00/12] Nesting support for lazy MMU mode



> On 24 Nov 2025, at 6:52 PM, Kevin Brodsky <kevin.brodsky@....com> wrote:
> 
> When the lazy MMU mode was introduced eons ago, it wasn't made clear
> whether such a sequence was legal:
> 
> arch_enter_lazy_mmu_mode()
> ...
> arch_enter_lazy_mmu_mode()
> ...
> arch_leave_lazy_mmu_mode()
> ...
> arch_leave_lazy_mmu_mode()
> 
> It seems fair to say that nested calls to
> arch_{enter,leave}_lazy_mmu_mode() were not expected, and most
> architectures never explicitly supported it.
> 
> Nesting does in fact occur in certain configurations, and avoiding it
> has proved difficult. This series therefore enables lazy_mmu sections to
> nest, on all architectures.
> 
> Nesting is handled using a counter in task_struct (patch 8), like other
> stateless APIs such as pagefault_{disable,enable}(). This is fully
> handled in a new generic layer in <linux/pgtable.h>; the arch_* API
> remains unchanged. A new pair of calls, lazy_mmu_mode_{pause,resume}(),
> is also introduced to allow functions that are called with the lazy MMU
> mode enabled to temporarily pause it, regardless of nesting.
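To make the counter scheme described above concrete, here is a minimal user-space sketch. It is not the kernel code from patch 8. The field names enable_count and pause_count come from the v5 changelog; everything else (struct lazy_mmu_model, the model_* helpers, the arch_*_calls counters) is hypothetical and exists only for illustration. It assumes that pausing an active section leaves the arch mode and resuming re-enters it, and that enable/disable calls made while paused are ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-task state; not the kernel struct. */
struct lazy_mmu_model {
	unsigned int enable_count;	/* depth of nested enable calls */
	unsigned int pause_count;	/* depth of nested pause calls */
	unsigned int arch_enter_calls;	/* times the arch "enter" hook would run */
	unsigned int arch_leave_calls;	/* times the arch "leave" hook would run */
};

/* The mode is active only when enabled at least once and not paused. */
static bool model_active(const struct lazy_mmu_model *s)
{
	return s->enable_count > 0 && s->pause_count == 0;
}

static void model_enable(struct lazy_mmu_model *s)
{
	if (s->pause_count)
		return;			/* ignored while paused */
	if (s->enable_count++ == 0)
		s->arch_enter_calls++;	/* only the outermost enable reaches the arch */
}

static void model_disable(struct lazy_mmu_model *s)
{
	if (s->pause_count || s->enable_count == 0)
		return;			/* ignored while paused or not enabled */
	if (--s->enable_count == 0)
		s->arch_leave_calls++;	/* only the outermost disable reaches the arch */
}

static void model_pause(struct lazy_mmu_model *s)
{
	/* Assumption: the first pause of an enabled section leaves the arch mode. */
	if (s->pause_count++ == 0 && s->enable_count > 0)
		s->arch_leave_calls++;
}

static void model_resume(struct lazy_mmu_model *s)
{
	if (s->pause_count == 0)
		return;			/* unbalanced resume: nothing to do */
	/* Assumption: the last resume of an enabled section re-enters the arch mode. */
	if (--s->pause_count == 0 && s->enable_count > 0)
		s->arch_enter_calls++;
}
```

In this model, two nested enable/disable pairs reach the arch hooks only once each, which is why the arch_* API itself does not need to know about nesting.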
> 
> An arch now opts in to using the lazy MMU mode by selecting
> CONFIG_ARCH_HAS_LAZY_MMU_MODE; this is more appropriate now that we
> have a generic API, especially with state conditionally added to
> task_struct.
> 
> ---
> 
> Background: Ryan Roberts' series from March [1] attempted to prevent
> nesting from ever occurring, and mostly succeeded. Unfortunately, a
> corner case (DEBUG_PAGEALLOC) may still cause nesting to occur on arm64.
> Ryan proposed [2] to address that corner case at the generic level but
> this approach received pushback; [3] then attempted to solve the issue
> on arm64 only, but it was deemed too fragile.
> 
> It feels generally difficult to guarantee that lazy_mmu sections don't
> nest, because callers of various standard mm functions do not know if
> the function uses lazy_mmu itself.
> 
> The overall approach in v3/v4 is very close to what David Hildenbrand
> proposed on v2 [4].
> 
> Unlike in v1/v2, no special provision is made for architectures to
> save/restore extra state when entering/leaving the mode. Based on the
> discussions so far, this does not seem to be required - an arch can
> store any relevant state in thread_struct during arch_enter() and
> restore it in arch_leave(). Nesting is not a concern as these functions
> are only called at the top level, not in nested sections.
> 
> The introduction of a generic layer, and tracking of the lazy MMU state
> in task_struct, also allows the arch callbacks to be streamlined - this
> series removes 67 lines from arch/.
> 
> Patch overview:
> 
> * Patch 1: cleanup - avoids having to deal with the powerpc
>  context-switching code
> 
> * Patch 2-4: prepare arch_flush_lazy_mmu_mode() to be called from the
>  generic layer (patch 8)
> 
> * Patch 5-6: new API + CONFIG_ARCH_HAS_LAZY_MMU_MODE
> 
> * Patch 7: ensure correctness in interrupt context
> 
> * Patch 8: nesting support
> 
> * Patch 9-12: replace arch-specific tracking of lazy MMU mode with
>  generic API
> 
> This series has been tested by running the mm kselftests on arm64 with
> DEBUG_VM, DEBUG_PAGEALLOC, KFENCE and KASAN. It was also build-tested on
> other architectures (with and without XEN_PV on x86).
> 
> - Kevin
> 
> [1] https://lore.kernel.org/all/20250303141542.3371656-1-ryan.roberts@arm.com/
> [2] https://lore.kernel.org/all/20250530140446.2387131-1-ryan.roberts@arm.com/
> [3] https://lore.kernel.org/all/20250606135654.178300-1-ryan.roberts@arm.com/
> [4] https://lore.kernel.org/all/ef343405-c394-4763-a79f-21381f217b6c@redhat.com/
> ---
> Changelog
> 
> v4..v5:
> 
> - Rebased on mm-unstable
> - Patch 3: added missing radix_enabled() check in arch_flush()
>  [Ritesh Harjani]
> - Patch 6: declare arch_flush_lazy_mmu_mode() as static inline on x86
>  [Ryan Roberts]
> - Patch 7 (formerly 12): moved before patch 8 to ensure correctness in
>  interrupt context [Ryan]. The diffs in in_lazy_mmu_mode() and
>  queue_pte_barriers() are moved to patch 8 and 9 resp.
> - Patch 8:
>  * Removed all restrictions regarding lazy_mmu_mode_{pause,resume}().
>    They may now be called even when lazy MMU isn't enabled, and
>    any call to lazy_mmu_mode_* may be made while paused (such calls
>    will be ignored). [David, Ryan]
>  * lazy_mmu_state.{nesting_level,active} are replaced with
>    {enable_count,pause_count} to track arbitrary nesting of both
>    enable/disable and pause/resume [Ryan]
>  * Added __task_lazy_mmu_mode_active() for use in patch 12 [David]
>  * Added documentation for all the functions [Ryan]
> - Patch 9: keep existing test + set TIF_LAZY_MMU_PENDING instead of
>  atomic RMW [David, Ryan]
> - Patch 12: use __task_lazy_mmu_mode_active() instead of accessing
>  lazy_mmu_state directly [David]
> - Collected R-b/A-b tags
> 
> v4: https://lore.kernel.org/all/20251029100909.3381140-1-kevin.brodsky@arm.com/
> 
> v3..v4:
> 
> - Patch 2: restored ordering of preempt_{disable,enable}() [Dave Hansen]
> - Patch 5 onwards: s/ARCH_LAZY_MMU/ARCH_HAS_LAZY_MMU_MODE/ [Mike Rapoport]
> - Patch 7: renamed lazy_mmu_state members, removed VM_BUG_ON(),
>  reordered writes to lazy_mmu_state members [David Hildenbrand]
> - Dropped patch 13 as it doesn't seem justified [David H]
> - Various improvements to commit messages [David H]
> 
> v3: https://lore.kernel.org/all/20251015082727.2395128-1-kevin.brodsky@arm.com/
> 
> v2..v3:
> 
> - Full rewrite; dropped all Acked-by/Reviewed-by.
> - Rebased on v6.18-rc1.
> 
> v2: https://lore.kernel.org/all/20250908073931.4159362-1-kevin.brodsky@arm.com/
> 
> v1..v2:
> - Rebased on mm-unstable.
> - Patch 2: handled new calls to enter()/leave(), clarified how the "flush"
>  pattern (leave() followed by enter()) is handled.
> - Patch 5,6: removed unnecessary local variable [Alexander Gordeev's
>  suggestion].
> - Added Mike Rapoport's Acked-by.
> 
> v1: https://lore.kernel.org/all/20250904125736.3918646-1-kevin.brodsky@arm.com/
> ---
> Cc: Alexander Gordeev <agordeev@...ux.ibm.com>
> Cc: Andreas Larsson <andreas@...sler.com>
> Cc: Andrew Morton <akpm@...ux-foundation.org>
> Cc: Boris Ostrovsky <boris.ostrovsky@...cle.com>
> Cc: Borislav Petkov <bp@...en8.de>
> Cc: Catalin Marinas <catalin.marinas@....com>
> Cc: Christophe Leroy <christophe.leroy@...roup.eu>
> Cc: Dave Hansen <dave.hansen@...ux.intel.com>
> Cc: David Hildenbrand <david@...hat.com>
> Cc: "David S. Miller" <davem@...emloft.net>
> Cc: David Woodhouse <dwmw2@...radead.org>
> Cc: "H. Peter Anvin" <hpa@...or.com>
> Cc: Ingo Molnar <mingo@...hat.com>
> Cc: Jann Horn <jannh@...gle.com>
> Cc: Juergen Gross <jgross@...e.com>
> Cc: "Liam R. Howlett" <Liam.Howlett@...cle.com>
> Cc: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
> Cc: Madhavan Srinivasan <maddy@...ux.ibm.com>
> Cc: Michael Ellerman <mpe@...erman.id.au>
> Cc: Michal Hocko <mhocko@...e.com>
> Cc: Mike Rapoport <rppt@...nel.org>
> Cc: Nicholas Piggin <npiggin@...il.com>
> Cc: Peter Zijlstra <peterz@...radead.org>
> Cc: Ritesh Harjani (IBM) <ritesh.list@...il.com>
> Cc: Ryan Roberts <ryan.roberts@....com>
> Cc: Suren Baghdasaryan <surenb@...gle.com>
> Cc: Thomas Gleixner <tglx@...utronix.de>
> Cc: Venkat Rao Bagalkote <venkat88@...ux.ibm.com>
> Cc: Vlastimil Babka <vbabka@...e.cz>
> Cc: Will Deacon <will@...nel.org>
> Cc: Yeoreum Yun <yeoreum.yun@....com>
> Cc: linux-arm-kernel@...ts.infradead.org
> Cc: linux-kernel@...r.kernel.org
> Cc: linuxppc-dev@...ts.ozlabs.org
> Cc: sparclinux@...r.kernel.org
> Cc: xen-devel@...ts.xenproject.org
> Cc: x86@...nel.org
> ---
> Alexander Gordeev (1):
>  powerpc/64s: Do not re-activate batched TLB flush
> 
> Kevin Brodsky (11):
>  x86/xen: simplify flush_lazy_mmu()
>  powerpc/mm: implement arch_flush_lazy_mmu_mode()
>  sparc/mm: implement arch_flush_lazy_mmu_mode()
>  mm: introduce CONFIG_ARCH_HAS_LAZY_MMU_MODE
>  mm: introduce generic lazy_mmu helpers
>  mm: bail out of lazy_mmu_mode_* in interrupt context
>  mm: enable lazy_mmu sections to nest
>  arm64: mm: replace TIF_LAZY_MMU with in_lazy_mmu_mode()
>  powerpc/mm: replace batch->active with in_lazy_mmu_mode()
>  sparc/mm: replace batch->active with in_lazy_mmu_mode()
>  x86/xen: use lazy_mmu_state when context-switching
> 
> arch/arm64/Kconfig                            |   1 +
> arch/arm64/include/asm/pgtable.h              |  41 +----
> arch/arm64/include/asm/thread_info.h          |   3 +-
> arch/arm64/mm/mmu.c                           |   4 +-
> arch/arm64/mm/pageattr.c                      |   4 +-
> .../include/asm/book3s/64/tlbflush-hash.h     |  20 ++-
> arch/powerpc/include/asm/thread_info.h        |   2 -
> arch/powerpc/kernel/process.c                 |  25 ---
> arch/powerpc/mm/book3s64/hash_tlb.c           |  10 +-
> arch/powerpc/mm/book3s64/subpage_prot.c       |   4 +-
> arch/powerpc/platforms/Kconfig.cputype        |   1 +
> arch/sparc/Kconfig                            |   1 +
> arch/sparc/include/asm/tlbflush_64.h          |   5 +-
> arch/sparc/mm/tlb.c                           |  14 +-
> arch/x86/Kconfig                              |   1 +
> arch/x86/boot/compressed/misc.h               |   1 +
> arch/x86/boot/startup/sme.c                   |   1 +
> arch/x86/include/asm/paravirt.h               |   1 -
> arch/x86/include/asm/pgtable.h                |   1 +
> arch/x86/include/asm/thread_info.h            |   4 +-
> arch/x86/xen/enlighten_pv.c                   |   3 +-
> arch/x86/xen/mmu_pv.c                         |   6 +-
> fs/proc/task_mmu.c                            |   4 +-
> include/linux/mm_types_task.h                 |   5 +
> include/linux/pgtable.h                       | 147 +++++++++++++++++-
> include/linux/sched.h                         |  45 ++++++
> mm/Kconfig                                    |   3 +
> mm/kasan/shadow.c                             |   8 +-
> mm/madvise.c                                  |  18 +--
> mm/memory.c                                   |  16 +-
> mm/migrate_device.c                           |   8 +-
> mm/mprotect.c                                 |   4 +-
> mm/mremap.c                                   |   4 +-
> mm/userfaultfd.c                              |   4 +-
> mm/vmalloc.c                                  |  12 +-
> mm/vmscan.c                                   |  12 +-
> 36 files changed, 282 insertions(+), 161 deletions(-)

I tested this patch series by applying it on top of mm-unstable, on both the HASH and RADIX MMUs; all tests passed on both.

From the Linux source tree, I ran cache_shape, copyloops and mm under selftests/powerpc/, and memory-hotplug under selftests/. I also ran the tests below from the avocado-misc-tests repo.

Link to repo: https://github.com/avocado-framework-tests/avocado-misc-tests

avocado-misc-tests/memory/stutter.py
avocado-misc-tests/memory/eatmemory.py
avocado-misc-tests/memory/hugepage_sanity.py
avocado-misc-tests/memory/fork_mem.py
avocado-misc-tests/memory/memory_api.py
avocado-misc-tests/memory/mprotect.py
avocado-misc-tests/memory/vatest.py avocado-misc-tests/memory/vatest.py.data/vatest.yaml
avocado-misc-tests/memory/transparent_hugepages.py
avocado-misc-tests/memory/transparent_hugepages_swapping.py
avocado-misc-tests/memory/transparent_hugepages_defrag.py
avocado-misc-tests/memory/ksm_poison.py

If this is good enough, please add the tag below for the PowerPC changes.

Tested-by: Venkat Rao Bagalkote <venkat88@...ux.ibm.com>

Regards,
Venkat.
> 
> 
> base-commit: 1f1edd95f9231ba58a1e535b10200cb1eeaf1f67
> -- 
> 2.51.2
> 

