Message-ID: <e37d5e61-54e7-4425-837f-25a13f5a68b5@arm.com>
Date: Mon, 12 May 2025 13:00:05 +0100
From: Ryan Roberts <ryan.roberts@....com>
To: David Hildenbrand <david@...hat.com>,
 Catalin Marinas <catalin.marinas@....com>, Will Deacon <will@...nel.org>,
 Pasha Tatashin <pasha.tatashin@...een.com>,
 Andrew Morton <akpm@...ux-foundation.org>,
 Uladzislau Rezki <urezki@...il.com>, Christoph Hellwig <hch@...radead.org>,
 "Matthew Wilcox (Oracle)" <willy@...radead.org>,
 Mark Rutland <mark.rutland@....com>,
 Anshuman Khandual <anshuman.khandual@....com>,
 Alexandre Ghiti <alexghiti@...osinc.com>,
 Kevin Brodsky <kevin.brodsky@....com>
Cc: linux-arm-kernel@...ts.infradead.org, linux-mm@...ck.org,
 linux-kernel@...r.kernel.org,
 syzbot+5c0d9392e042f41d45c5@...kaller.appspotmail.com
Subject: Re: [PATCH] arm64/mm: Disable barrier batching in interrupt contexts

On 12/05/2025 12:07, David Hildenbrand wrote:
> On 12.05.25 12:22, Ryan Roberts wrote:
>> Commit 5fdd05efa1cd ("arm64/mm: Batch barriers when updating kernel
>> mappings") enabled arm64 kernels to track "lazy mmu mode" using TIF
>> flags in order to defer barriers until exiting the mode. At the same
>> time, it added warnings to check that pte manipulations were never
>> performed in interrupt context, because the tracking implementation
>> could not deal with nesting.
>>
>> But it turns out that some debug features (e.g. KFENCE, DEBUG_PAGEALLOC)
>> do manipulate ptes in softirq context, which triggered the warnings.
>>
>> So let's take the simplest and safest route and disable the batching
>> optimization in interrupt contexts. This makes these users no worse off
>> than prior to the optimization. Additionally, the known offenders are
>> debug features that only manipulate a single PTE, so there is no
>> performance gain anyway.
>>
>> There may be some obscure case of encrypted/decrypted DMA with
>> dma_free_coherent() called from an interrupt context, but again, this is
>> no worse off than prior to the commit.
>>
>> Some options for supporting nesting were considered, but there is a
>> difficult-to-solve problem if any code manipulates ptes within interrupt
>> context but *outside of* a lazy mmu region. If this case exists, the
>> code would expect the updates to be immediate, but because the task
>> context may have already been in lazy mmu mode, the updates would be
>> deferred, which could cause incorrect behaviour. This problem is avoided
>> by always ensuring updates within interrupt context are immediate.
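>>
>> To illustrate, a (hypothetical) interleaving might look like this:
>>
>>   arch_enter_lazy_mmu_mode()    // task context: TIF_LAZY_MMU gets set
>>   set_pte()                     // barriers deferred
>>     <interrupt>
>>       set_pte()                 // irq context, outside any lazy mmu
>>                                 // region; the caller expects the update
>>                                 // to be visible on return, but with the
>>                                 // task's TIF_LAZY_MMU still set the
>>                                 // barriers would be deferred
>>     </interrupt>
>>   arch_leave_lazy_mmu_mode()    // barriers emitted only here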
>>
>> Fixes: 5fdd05efa1cd ("arm64/mm: Batch barriers when updating kernel mappings")
>> Reported-by: syzbot+5c0d9392e042f41d45c5@...kaller.appspotmail.com
>> Closes: https://lore.kernel.org/linux-arm-kernel/681f2a09.050a0220.f2294.0006.GAE@...gle.com/
>> Signed-off-by: Ryan Roberts <ryan.roberts@....com>
>> ---
>>
>> Hi Will,
>>
>> I've tested before and after with KFENCE enabled and it solves the issue. I've
>> also run all the mm-selftests which all continue to pass.
>>
>> Catalin suggested that a Fixes patch targeting the SHA, as the commit is
>> in for-next/mm, was the preferred approach, but shout if you want
>> something different. I'm hoping that with this fix we can still make it
>> for this cycle, subject to not finding any more issues.
>>
>> Thanks,
>> Ryan
>>
>>
>>   arch/arm64/include/asm/pgtable.h | 16 ++++++++++++++--
>>   1 file changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>> index ab4a1b19e596..e65083ec35cb 100644
>> --- a/arch/arm64/include/asm/pgtable.h
>> +++ b/arch/arm64/include/asm/pgtable.h
>> @@ -64,7 +64,11 @@ static inline void queue_pte_barriers(void)
>>   {
>>       unsigned long flags;
>>
>> -    VM_WARN_ON(in_interrupt());
>> +    if (in_interrupt()) {
>> +        emit_pte_barriers();
>> +        return;
>> +    }
>> +
>>       flags = read_thread_flags();
>>
>>       if (flags & BIT(TIF_LAZY_MMU)) {
>> @@ -79,7 +83,9 @@ static inline void queue_pte_barriers(void)
>>   #define  __HAVE_ARCH_ENTER_LAZY_MMU_MODE
>>   static inline void arch_enter_lazy_mmu_mode(void)
>>   {
>> -    VM_WARN_ON(in_interrupt());
>> +    if (in_interrupt())
>> +        return;
>> +
>>       VM_WARN_ON(test_thread_flag(TIF_LAZY_MMU));
>>
>>       set_thread_flag(TIF_LAZY_MMU);
>> @@ -87,12 +93,18 @@ static inline void arch_enter_lazy_mmu_mode(void)
>>
>>   static inline void arch_flush_lazy_mmu_mode(void)
>>   {
>> +    if (in_interrupt())
>> +        return;
>> +
>>       if (test_and_clear_thread_flag(TIF_LAZY_MMU_PENDING))
>>           emit_pte_barriers();
>>   }
>>
>>   static inline void arch_leave_lazy_mmu_mode(void)
>>   {
>> +    if (in_interrupt())
>> +        return;
>> +
>>       arch_flush_lazy_mmu_mode();
>>       clear_thread_flag(TIF_LAZY_MMU);
>>   }
> 
> I guess in all cases we could optimize out the in_interrupt() check on !debug
> configs.

I think that assumes we can easily and accurately identify all configs that
cause this? We've identified 2, but I'm not confident that's the full list.
Also, KFENCE isn't really a debug config (despite me calling it that in the
commit log) - it's supposed to be something that can be enabled in production
builds.
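
To be concrete, I assume you mean something like this, guarding on the
known offenders?

	static inline void queue_pte_barriers(void)
	{
		unsigned long flags;

		if (IS_ENABLED(CONFIG_KFENCE) ||
		    IS_ENABLED(CONFIG_DEBUG_PAGEALLOC)) {
			/* Known to update ptes in softirq context. */
			if (in_interrupt()) {
				emit_pte_barriers();
				return;
			}
		} else {
			VM_WARN_ON(in_interrupt());
		}

		...
	}

But that only holds up for as long as that config list stays complete.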

> 
> Hm, maybe there is an elegant way to catch all of these "problematic" users?

I'm all ears if you have any suggestions? :)


It actually looks like x86/XEN tries to solve this problem in a similar way:

enum xen_lazy_mode xen_get_lazy_mode(void)
{
	if (in_interrupt())
		return XEN_LAZY_NONE;

	return this_cpu_read(xen_lazy_mode);
}

I'm not convinced it's fully robust, though. It also has:

static inline void enter_lazy(enum xen_lazy_mode mode)
{
	BUG_ON(this_cpu_read(xen_lazy_mode) != XEN_LAZY_NONE);

	this_cpu_write(xen_lazy_mode, mode);
}

which is called as part of its arch_enter_lazy_mmu_mode() implementation. If a
task was already in lazy mmu mode when an interrupt comes in and causes the
nested arch_enter_lazy_mmu_mode() that we saw in this bug report, surely that
BUG_ON() should trigger?
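
If I'm reading it right, the sequence would be something like:

	arch_enter_lazy_mmu_mode()	// task: xen_lazy_mode = XEN_LAZY_MMU
	  <interrupt>
	    arch_enter_lazy_mmu_mode()	// enter_lazy(): xen_lazy_mode is
					// not XEN_LAZY_NONE -> BUG_ON()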

Thanks,
Ryan

