linux-kernel - Re: [PATCH v2 2/7] mm: introduce local state for lazy

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7953a735-6129-4d22-be65-ce736630d539@redhat.com>
Date: Fri, 12 Sep 2025 10:04:12 +0200
From: David Hildenbrand <david@...hat.com>
To: Kevin Brodsky <kevin.brodsky@....com>,
 Alexander Gordeev <agordeev@...ux.ibm.com>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
 Andreas Larsson <andreas@...sler.com>,
 Andrew Morton <akpm@...ux-foundation.org>,
 Boris Ostrovsky <boris.ostrovsky@...cle.com>, Borislav Petkov
 <bp@...en8.de>, Catalin Marinas <catalin.marinas@....com>,
 Christophe Leroy <christophe.leroy@...roup.eu>,
 Dave Hansen <dave.hansen@...ux.intel.com>,
 "David S. Miller" <davem@...emloft.net>, "H. Peter Anvin" <hpa@...or.com>,
 Ingo Molnar <mingo@...hat.com>, Jann Horn <jannh@...gle.com>,
 Juergen Gross <jgross@...e.com>, "Liam R. Howlett"
 <Liam.Howlett@...cle.com>, Lorenzo Stoakes <lorenzo.stoakes@...cle.com>,
 Madhavan Srinivasan <maddy@...ux.ibm.com>,
 Michael Ellerman <mpe@...erman.id.au>, Michal Hocko <mhocko@...e.com>,
 Mike Rapoport <rppt@...nel.org>, Nicholas Piggin <npiggin@...il.com>,
 Peter Zijlstra <peterz@...radead.org>, Ryan Roberts <ryan.roberts@....com>,
 Suren Baghdasaryan <surenb@...gle.com>, Thomas Gleixner
 <tglx@...utronix.de>, Vlastimil Babka <vbabka@...e.cz>,
 Will Deacon <will@...nel.org>, Yeoreum Yun <yeoreum.yun@....com>,
 linux-arm-kernel@...ts.infradead.org, linuxppc-dev@...ts.ozlabs.org,
 sparclinux@...r.kernel.org, xen-devel@...ts.xenproject.org,
 Mark Rutland <Mark.Rutland@....com>
Subject: Re: [PATCH v2 2/7] mm: introduce local state for lazy_mmu sections

>>
>> struct lazy_mmu_state {
>>      uint8_t enabled_count;
>>      bool paused;
> 
> Looking at the arm64 implementation, I'm thinking: instead of the paused
> member, how about a PF_LAZY_MMU task flag? It would be set when lazy_mmu
> is actually enabled (i.e. inside an enter()/leave() section, and not
> inside a pause()/resume() section). This way, architectures could use
> that flag directly to tell if lazy_mmu is enabled instead of reinventing
> the wheel, all in slightly different ways. Namely:
> 
> * arm64 uses a thread flag (TIF_LAZY_MMU) - this is trivially replaced
> with PF_LAZY_MMU
> * powerpc and sparc use batch->active where batch is a per-CPU variable;
> I expect this can also be replaced with PF_LAZY_MMU
> * x86/xen is more complex as it has xen_lazy_mode which tracks both
> LAZY_MMU and LAZY_CPU modes. I'd probably leave that one alone, unless a
> Xen expert is motivated to refactor it.
> 
> With that approach, the implementation of arch_enter() and arch_leave()
> becomes very simple (no tracking of lazy_mmu status) on arm64, powerpc
> and sparc.
> 
> (Of course we could also have an "enabled" member in lazy_mmu_state
> instead of PF_LAZY_MMU, there is no functional difference.)
> 

No strong opinion, but to me it feels like PF_LAZY_MMU is rather "the 
effective state when combining nested+paused", and might complicate the 
code + sanity checks?

So we could maintain that in addition fairly easily of course from the 
core instead of letting archs do that manually.

I would probably have to see the end result to judge whether removing 
the "paused" bool makes things look more complicated or not.

>> }
>>
>> c) With that config, common-code lazy_mmu_*() functions implement the
>> updating of the lazy_mmu_state in task_struct and call into arch code
>> on the transition from 0->1, 1->0 etc.
> 
> Indeed, this is how I thought about it. There is actually quite a lot
> that can be moved to the generic functions:
> * Updating lazy_mmu_state
> * Sanity checks on lazy_mmu_state (e.g. underflow/overflow)
> * Bailing out if in_interrupt() (not done consistently across arch's at
> the moment)
> 
>>
>> Maybe that can be done through exiting
>> arch_enter_lazy_mmu_mode()/arch_leave_lazy_mmu_mode() callbacks, maybe
>> we need more. I feel like
>> we might be able to implement that through the existing helpers.
> 
> We might want to rename them to align with the new generic helpers, but
> yes otherwise the principle should remain unchanged.
> 
> In fact, we will also need to revive arch_flush_lazy_mmu_mode().

That's okay if it's all hidden behaind a sane core API.

> Indeed,
> in the nested situation, we need the following arch calls:
> 
> enter() -> arch_enter()
>      enter() -> [nothing]
>      leave() -> arch_flush()
> leave() -> arch_leave()
> 
> leave() must always flush whatever arch state was batched, as may be
> expected by the caller.
> 
> How does all that sound?

I am no expert on the "always flush when leaving", but it sounds 
reasonable to me.

Which arch operations would you call from

pause()
continue()

?

>> And on top of the proposal above we will have some
>>
>> struct arch_lazy_mmu_state;
>>
>> define by the architecture (could be an empty struct on most).
>>
>> We can store that inside "struct lazy_mmu_state;" or if we ever have
>> to, start returning only that from the enable/disable etc. functions.
> 
> I'm not sure we'd want to mix those styles (task_struct member + local
> variable), that's adding complexity without much upside... Also having a
> local variable at every nesting level only makes sense if we have an
> arch callback regardless of nesting level, which is unnecessary in this
> proposed API.

Yes, that was rather a "if we ever really run out of space we could look 
into that", I am not a fan of it obviously.

-- 
Cheers

David / dhildenb