Message-ID: <20250612145906.GB12912@willie-the-truck>
Date: Thu, 12 Jun 2025 15:59:06 +0100
From: Will Deacon <will@...nel.org>
To: Ryan Roberts <ryan.roberts@....com>
Cc: Catalin Marinas <catalin.marinas@....com>,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1] arm64/mm: Ensure lazy_mmu_mode never nests
On Tue, Jun 10, 2025 at 05:37:20PM +0100, Ryan Roberts wrote:
> On 10/06/2025 16:07, Catalin Marinas wrote:
> > On Tue, Jun 10, 2025 at 02:41:01PM +0100, Ryan Roberts wrote:
> >> On 10/06/2025 13:00, Catalin Marinas wrote:
> >>> On Fri, Jun 06, 2025 at 02:56:52PM +0100, Ryan Roberts wrote:
> >>>> Commit 1ef3095b1405 ("arm64/mm: Permit lazy_mmu_mode to be nested")
> >>>> provided a quick fix to ensure that lazy_mmu_mode continues to work when
> >>>> CONFIG_DEBUG_PAGEALLOC is enabled, which can cause lazy_mmu_mode to
> >>>> nest.
> >>>>
> >>>> The solution in that patch is the make the implementation tolerant to
> >>>
> >>> s/is the make/is to make/
> >>>
> >>>> nesting; when the inner nesting level exits lazy_mmu_mode, we exit
> >>>> fully, and the outer exit then becomes a nop. But this sacrifices
> >>>> the optimization opportunity for the remainder of the outer user.
> >>> [...]
> >>>> I wonder if you might be willing to take this for v6.16? I think it's a neater
> >>>> solution than my first attempt - Commit 1ef3095b1405 ("arm64/mm: Permit
> >>>> lazy_mmu_mode to be nested") - which is already in Linus's master.
> >>>>
> >>>> To be clear, the current solution is safe; I just think this is much neater.
> >>>
> >>> Maybe better, though I wouldn't say much neater. One concern I have is
> >>> whether we'll get other such nesting in the future and need to fix it
> >>> in generic code. Here we control __kernel_map_pages(), but we may not
> >>> for other cases.
> >>>
> >>> Is it the fault of the arch code that uses apply_to_page_range() via
> >>> __kernel_map_pages()? It feels like it shouldn't care about the lazy
> >>> mode as that's some detail of the apply_to_page_range() implementation.
> >>> Maybe this API should just allow nesting.
> >>
> >> I don't think it is possible to properly support nesting:
> >>
> >> enter_lazy_mmu
> >> for_each_pte {
> >>         read/modify-write pte
> >>
> >>         alloc_page
> >>         enter_lazy_mmu
> >>         make page valid
> >>         exit_lazy_mmu
> >>
> >>         write_to_page
> >> }
> >> exit_lazy_mmu
> >>
> >> This example only works because lazy_mmu doesn't support nesting. The "make page
> >> valid" operation is completed by the time of the inner exit_lazy_mmu so that the
> >> page can be accessed in write_to_page. If nesting were supported, the inner
> >> exit_lazy_mmu would become a nop and write_to_page would explode.
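
To make the hazard concrete, here is a minimal model of a batching
lazy-mmu implementation in which nested exits are nops (hypothetical
names throughout; this is not the arm64 code):

static int nest;	/* nesting depth */
static int pending;	/* batched PTE updates awaiting a flush */

static void enter_lazy_mmu(void) { nest++; }
static void set_pte_lazily(void) { pending++; }	/* deferred PTE write */
static void flush_batch(void)    { pending = 0; }	/* barriers + sync */

static void exit_lazy_mmu(void)
{
	if (--nest > 0)
		return;		/* nested exit is a nop */
	flush_batch();
}

static void make_valid_and_write(char *page)
{
	enter_lazy_mmu();
	set_pte_lazily();	/* "make page valid" */
	exit_lazy_mmu();	/* nop if an outer lazy section is active */

	/*
	 * If pending != 0 here, the valid-PTE write has not been
	 * flushed yet and this access faults.
	 */
	page[0] = 0;
}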
> >
> > What I meant is for enter/exit_lazy_mmu to handle a kind of de-nesting
> > themselves: enter_lazy_mmu would emit the barriers if already in lazy
> > mode and clear the pending state (well, it doesn't strictly need to do
> > this, but it may be easier to reason about in terms of flattening).
> > exit_lazy_mmu also needs to emit the barriers, but leave the lazy mode
> > on if it was already on when last entered. This does need some API
> > modifications: return the old mode from enter and take an argument on
> > exit. But correctness wouldn't be affected, since exit_lazy_mmu still
> > emits the barriers irrespective of the nesting level.
>
> Ahh I see what you mean now; exit always emits barriers but only the last exit
> clears TIF_LAZY_MMU.
>
> I think that's much cleaner, but it changes the API, which means touching
> all the arches, and my attempt at [1] didn't really gain much enthusiasm.
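
For reference, the flattening scheme Catalin describes might look
something like this (a rough sketch with hypothetical details, not an
existing kernel interface; it reuses the arm64 TIF_LAZY_MMU flag and
emit_pte_barriers() purely for illustration):

typedef bool lazy_mmu_state_t;

static inline lazy_mmu_state_t arch_enter_lazy_mmu_mode(void)
{
	lazy_mmu_state_t was_lazy = test_thread_flag(TIF_LAZY_MMU);

	if (was_lazy)
		emit_pte_barriers();	/* flush the outer batch */
	else
		set_thread_flag(TIF_LAZY_MMU);

	return was_lazy;
}

static inline void arch_leave_lazy_mmu_mode(lazy_mmu_state_t was_lazy)
{
	/* Always flush, so correctness never depends on nesting depth. */
	emit_pte_barriers();

	/* Only the outermost exit actually leaves lazy mode. */
	if (!was_lazy)
		clear_thread_flag(TIF_LAZY_MMU);
}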
To be honest, I don't think the proposal in this series is really
improving what we have. Either we support nested lazy mode or we don't;
having __kernel_map_pages() mess around with the lazy mmu state because
it somehow knows that set_memory_valid() is going to use it is fragile
and ugly.
So I'm inclined to leave the current code as-is, unless we can remove it
in favour of teaching the core code how to handle it instead.
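
One shape "teaching the core code" could take, as a rough sketch: generic
helpers keyed off a per-task nest count, so only the outermost exit leaves
lazy mode but every exit flushes (current->lazy_mmu_nest is a hypothetical
field, not something that exists today):

static inline void lazy_mmu_mode_enter(void)
{
	if (current->lazy_mmu_nest++ == 0)
		arch_enter_lazy_mmu_mode();
}

static inline void lazy_mmu_mode_exit(void)
{
	/* Flush at every level so callers can rely on their PTE writes. */
	arch_flush_lazy_mmu_mode();

	if (--current->lazy_mmu_nest == 0)
		arch_leave_lazy_mmu_mode();
}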
Cheers,
Will