linux-kernel - Re: [PATCH v1] arm64/mm: Ensure lazy_mmu

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <b567a16a-8d80-4aab-84c2-21cbc6a6a35d@arm.com>
Date: Tue, 10 Jun 2025 17:37:20 +0100
From: Ryan Roberts <ryan.roberts@....com>
To: Catalin Marinas <catalin.marinas@....com>
Cc: Will Deacon <will@...nel.org>, linux-arm-kernel@...ts.infradead.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1] arm64/mm: Ensure lazy_mmu_mode never nests

On 10/06/2025 16:07, Catalin Marinas wrote:
> On Tue, Jun 10, 2025 at 02:41:01PM +0100, Ryan Roberts wrote:
>> On 10/06/2025 13:00, Catalin Marinas wrote:
>>> On Fri, Jun 06, 2025 at 02:56:52PM +0100, Ryan Roberts wrote:
>>>> Commit 1ef3095b1405 ("arm64/mm: Permit lazy_mmu_mode to be nested")
>>>> provided a quick fix to ensure that lazy_mmu_mode continues to work when
>>>> CONFIG_DEBUG_PAGEALLOC is enabled, which can cause lazy_mmu_mode to
>>>> nest.
>>>>
>>>> The solution in that patch is the make the implementation tolerant to
>>>
>>> s/is the make/is to make/
>>>
>>>> nesting; when the inner nest exits lazy_mmu_mode, we exit then the outer
>>>> exit becomes a nop. But this sacrifices the optimization opportunity for
>>>> the remainder of the outer user.
>>> [...]
>>>> I wonder if you might be willing to take this for v6.16? I think its a neater
>>>> solution then my first attempt - Commit 1ef3095b1405 ("arm64/mm: Permit
>>>> lazy_mmu_mode to be nested") - which is already in Linus's master.
>>>>
>>>> To be clear, the current solution is safe, I just think this is much neater.
>>>
>>> Maybe better, though I wouldn't say much neater. One concern I have is
>>> about whether we'll get other such nesting in the future and we need to
>>> fix them in generic code. Here we control __kernel_map_pages() but we
>>> may not for other cases.
>>>
>>> Is it the fault of the arch code that uses apply_to_page_range() via
>>> __kernel_map_pages()? It feels like it shouldn't care about the lazy
>>> mode as that's some detail of the apply_to_page_range() implementation.
>>> Maybe this API should just allow nesting.
>>
>> I don't think it is possible to properly support nesting:
>>
>> enter_lazy_mmu
>>     for_each_pte {
>>         read/modify-write pte
>>
>>         alloc_page
>>             enter_lazy_mmu
>>                 make page valid
>>             exit_lazy_mmu
>>
>>         write_to_page
>>     }
>> exit_lazy_mmu
>>
>> This example only works because lazy_mmu doesn't support nesting. The "make page
>> valid" operation is completed by the time of the inner exit_lazy_mmu so that the
>> page can be accessed in write_to_page. If nesting was supported, the inner
>> exit_lazy_mmu would become a nop and write_to_page would explode.
> 
> What I meant is for enter/exit_lazy_mmu to handle a kind of de-nesting
> themselves: enter_lazy_mmu would emit the barriers if already in lazy
> mode, clear pending (well, it doesn't need to do this but it may be
> easier to reason about in terms of flattening). exit_lazy_mmu also needs
> to emit the barriers but leave the lazy mode on if already on when last
> entered. This does need some API modifications to return the old mode on
> enter and get an argument for exit. But the correctness wouldn't be
> affected since exit_lazy_mmu still emits the barriers irrespective of
> the nesting level.

Ahh I see what you mean now; exit always emits barriers but only the last exit
clears TIF_LAZY_MMU.

I think that's much cleaner, but we are changing the API which needs changes to
all the arches and my attempt at [1] didn't really gain much enthusiasm.

> 
>> So the conclusion I eventually came to (after being nudged by Mike Rapoport at
>> [1]) is that this _is_ arm64's fault for creating a loop via
>> apply_to_page_range(). So I'm trying to fix this by breaking the loop.
>>
>> [1] https://lore.kernel.org/all/aDqz7H-oBo35FRXe@kernel.org/
> 
> If the only such loop is the arm64 code, fine by me to fix it this way.
> Your series above had 6 patches and I thought there's more to fix.

My previous attempt was, in hindsight, hideous. It seems to be the case that
only arm64 suffers this problem so let's just fix it here.

> 
>> We could alternatively use some per-cpu storage for a nest count, but that gets
>> ugly quite quickly I suspect.
> 
> Yeah, the thread flag is much nicer.
> 
>> But regardless, I'm not convinced the semantics of a properly nested
>> lazy_mmu are safe.
> 
> I think we can make them safe but there would be opposition from the mm
> people and it may not be trivial on x86. So, I'm fine with this arm64
> specific change:
> 
> Reviewed-by: Catalin Marinas <catalin.marinas@....com>

Agreed, and thanks!