Message-ID: <4bc562ea-2fba-4484-9548-c606e254bc00@arm.com>
Date: Mon, 3 Nov 2025 11:23:38 +0530
From: Dev Jain <dev.jain@....com>
To: Ryan Roberts <ryan.roberts@....com>, Guenter Roeck <linux@...ck-us.net>,
Yang Shi <yang@...amperecomputing.com>
Cc: catalin.marinas@....com, will@...nel.org, akpm@...ux-foundation.org,
david@...hat.com, lorenzo.stoakes@...cle.com, ardb@...nel.org,
scott@...amperecomputing.com, cl@...two.org,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, nd@....com
Subject: Re: [PATCH v8 3/5] arm64: mm: support large block mapping when
rodata=full
>>>>
>>> With lock debugging enabled, we see a large number of "BUG: sleeping
>>> function called from invalid context at kernel/locking/mutex.c:580"
>>> and "BUG: Invalid wait context:" backtraces when running v6.18-rc3.
>>> Please see example below.
>>>
>>> Bisect points to this patch.
>>>
>>> Please let me know if there is anything I can do to help track
>>> down the problem.
>> Thanks for the report - ouch!
>>
>> I expect you're running on a system that supports BBML2_NOABORT. Based on the
>> stack trace, I expect you have CONFIG_DEBUG_PAGEALLOC enabled? That will cause
>> permission tricks to be played on the linear map at page allocation and free
>> time, which can happen in non-sleepable contexts. And with this patch we are
>> taking pgtable_split_lock (a mutex) in split_kernel_leaf_mapping(), which is
>> called as a result of the permission change request.
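>>
>> For reference, the sort of path I have in mind is roughly the following
>> (a sketch only; the intermediate helpers are elided and may not match
>> exactly):
>>
>>	free_pages()                      /* can run in softirq context */
>>	  __kernel_map_pages()            /* CONFIG_DEBUG_PAGEALLOC hook */
>>	    set_memory_valid()
>>	      ...
>>	        split_kernel_leaf_mapping()
>>	          mutex_lock(&pgtable_split_lock)  /* may sleep -> splat */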
>>
>> However, when CONFIG_DEBUG_PAGEALLOC is enabled we always force-map the linear map
>> by PTE, so split_kernel_leaf_mapping() is actually unnecessary and will return
>> without having to split anything. So we could add an early "if
>> (force_pte_mapping()) return 0;" to bypass the function entirely in this case,
>> and I *think* that should solve it.
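>>
>> As a rough sketch of that idea (assuming force_pte_mapping() is moved
>> above split_kernel_leaf_mapping(); this doesn't yet cover the KFENCE
>> case mentioned below):
>>
>>	int split_kernel_leaf_mapping(unsigned long start, unsigned long end)
>>	{
>>		/*
>>		 * Everything relevant is already pte-mapped, so there is
>>		 * nothing to split and we must not risk sleeping on the
>>		 * split mutex.
>>		 */
>>		if (force_pte_mapping())
>>			return 0;
>>		...
>>	}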
>>
>> But I'm also concerned about KFENCE. I can't remember its exact semantics off
>> the top of my head, so I worry we could see similar problems there (where
>> we only force pte mapping for the KFENCE pool).
>>
>> I'll investigate fully tomorrow and hopefully provide a fix.
> Here's a proposed fix, although I can't get access to a system with BBML2 until
> tomorrow at the earliest. Guenter, I wonder if you could check that this
> resolves your issue?
>
> ---8<---
> commit 602ec2db74e5abfb058bd03934475ead8558eb72
> Author: Ryan Roberts <ryan.roberts@....com>
> Date: Sun Nov 2 11:45:18 2025 +0000
>
> arm64: mm: Don't attempt to split known pte-mapped regions
>
> It has been reported that split_kernel_leaf_mapping() is trying to sleep
> in non-sleepable context. It does this when acquiring the
> pgtable_split_lock mutex, when either CONFIG_DEBUG_ALLOC or
> CONFIG_KFENCE are enabled, which change linear map permissions within
> softirq context during memory allocation and/or freeing.
>
> But it turns out that the memory for which these features may attempt to
> modify the permissions is always mapped by pte, so there is no need to
> attempt to split the mapping. So let's exit early in these cases and
> avoid attempting to take the mutex.
>
> Closes: https://lore.kernel.org/all/f24b9032-0ec9-47b1-8b95-c0eeac7a31c5@roeck-us.net/
> Fixes: a166563e7ec3 ("arm64: mm: support large block mapping when rodata=full")
> Signed-off-by: Ryan Roberts <ryan.roberts@....com>
>
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index b8d37eb037fc..6e26f070bb49 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -708,6 +708,16 @@ static int split_kernel_leaf_mapping_locked(unsigned long addr)
> return ret;
> }
>
> +static inline bool force_pte_mapping(void)
> +{
> +	bool bbml2 = system_capabilities_finalized() ?
> +		system_supports_bbml2_noabort() : cpu_supports_bbml2_noabort();
> +
> +	return (!bbml2 && (rodata_full || arm64_kfence_can_set_direct_map() ||
> +			   is_realm_world())) ||
> +		debug_pagealloc_enabled();
> +}
> +
> static DEFINE_MUTEX(pgtable_split_lock);
>
> int split_kernel_leaf_mapping(unsigned long start, unsigned long end)
> @@ -723,6 +733,16 @@ int split_kernel_leaf_mapping(unsigned long start, unsigned long end)
> if (!system_supports_bbml2_noabort())
> return 0;
>
> +	/*
> +	 * If the region is within a pte-mapped area, there is no need to try to
> +	 * split. Additionally, CONFIG_DEBUG_ALLOC and CONFIG_KFENCE may change
Nit: CONFIG_DEBUG_PAGEALLOC.
> +	 * permissions from softirq context so for those cases (which are always
> +	 * pte-mapped), we must not go any further because taking the mutex
> +	 * below may sleep.
> +	 */
> +	if (force_pte_mapping() || is_kfence_address((void *)start))
> +		return 0;
> +
> /*
> * Ensure start and end are at least page-aligned since this is the
> * finest granularity we can split to.
> @@ -1009,16 +1029,6 @@ static inline void arm64_kfence_map_pool(phys_addr_t kfence_pool, pgd_t *pgdp) {
>
> #endif /* CONFIG_KFENCE */
>
> -static inline bool force_pte_mapping(void)
> -{
> -	bool bbml2 = system_capabilities_finalized() ?
> -		system_supports_bbml2_noabort() : cpu_supports_bbml2_noabort();
> -
> -	return (!bbml2 && (rodata_full || arm64_kfence_can_set_direct_map() ||
> -			   is_realm_world())) ||
> -		debug_pagealloc_enabled();
> -}
> -
Otherwise LGTM.
Reviewed-by: Dev Jain <dev.jain@....com>
> static void __init map_mem(pgd_t *pgdp)
> {
> static const u64 direct_map_end = _PAGE_END(VA_BITS_MIN);
> ---8<---
>
> Thanks,
> Ryan
>
>> Yang Shi, do you have any additional thoughts?
>>
>> Thanks,
>> Ryan
>>