Message-ID: <cbabb814-3ca5-4213-9346-a7de28aac474@arm.com>
Date: Mon, 3 Nov 2025 10:07:12 +0000
From: Ryan Roberts <ryan.roberts@....com>
To: Yang Shi <yang@...amperecomputing.com>, Guenter Roeck <linux@...ck-us.net>
Cc: catalin.marinas@....com, will@...nel.org, akpm@...ux-foundation.org,
david@...hat.com, lorenzo.stoakes@...cle.com, ardb@...nel.org,
dev.jain@....com, scott@...amperecomputing.com, cl@...two.org,
linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
linux-mm@...ck.org, nd@....com
Subject: Re: [PATCH v8 3/5] arm64: mm: support large block mapping when
rodata=full
On 03/11/2025 00:47, Yang Shi wrote:
>
>
[...]
>> @@ -723,6 +733,16 @@ int split_kernel_leaf_mapping(unsigned long start,
>> unsigned long end)
>> if (!system_supports_bbml2_noabort())
>> return 0;
>> + /*
>> + * If the region is within a pte-mapped area, there is no need to try to
>> + * split. Additionally, CONFIG_DEBUG_ALLOC and CONFIG_KFENCE may change
>> + * permissions from softirq context so for those cases (which are always
>> + * pte-mapped), we must not go any further because taking the mutex
>> + * below may sleep.
>> + */
>> + if (force_pte_mapping() || is_kfence_address((void *)start))
>
> IIUC this may break kfence late init? The kfence_late_init() allocates pages
> from buddy allocator, then protects them (setting them to invalid). But the
> protection requires split page table, this check will prevent kernel from
> splitting page table because __kfence_pool is initialized before doing
> protection. So there is kind of circular dependency.
I hadn't considered late init. But I guess the requirement is that the kfence
pool needs to be pte-mapped whenever kfence is enabled.

For early init, that requirement is clearly met since we pte-map it in the arch
code. For late init, as far as I can tell, the memory is initially block-mapped,
is allocated from the buddy, then every other page is protected via
kfence_protect() from kfence_init_pool(). This will have the effect of
splitting every page in the pool to pte mappings (as long as your suggested fix
below is applied).

It all feels a bit accidental though.
>
> The below fix may work?
>
> if (force_pte_mapping() || (READ_ONCE(kfence_enabled) && is_kfence_address((void
> *)start)))
>
> The kfence_enabled won't be set until protection is done. So if it is set, we
> know kfence address must be mapped by PTE.
I think it will work, but it feels a bit hacky, and kfence_enabled is currently
static in core.c.
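
To make the ordering problem concrete, here is a toy userspace model of late
init. It is not kernel code: struct kfence_model, model_split() and
model_late_init() are made-up names, the pool is treated as a single block
mapping, and the "enabled" field only stands in for kfence_enabled. It is a
sketch of the two checks under those assumptions, nothing more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_PAGES 8	/* arbitrary pool size for the model */

enum map_level { MAP_BLOCK, MAP_PTE };

/* Hypothetical model state; none of these fields exist in the kernel. */
struct kfence_model {
	enum map_level pool_level;	/* how the whole pool is mapped */
	bool pool_set;			/* stands in for __kfence_pool != NULL */
	bool enabled;			/* stands in for kfence_enabled */
	bool gate_on_enabled;		/* false: check as in the patch, true: gated check */
};

/*
 * Stand-in for the split path. Returns true if the page is pte-mapped
 * afterwards. The early return mirrors the is_kfence_address() check.
 */
static bool model_split(struct kfence_model *m, size_t page)
{
	bool skip;

	assert(page < POOL_PAGES);
	skip = m->pool_set && (!m->gate_on_enabled || m->enabled);
	if (!skip)
		m->pool_level = MAP_PTE;	/* splitting the block pte-maps the pool */
	return m->pool_level == MAP_PTE;
}

/*
 * Late init: the pool starts block-mapped, the pool pointer is published,
 * then every other page is protected. Returns the number of guard pages
 * that could not be protected at pte granularity.
 */
static int model_late_init(struct kfence_model *m)
{
	int failures = 0;
	size_t i;

	m->pool_level = MAP_BLOCK;
	m->pool_set = true;
	m->enabled = false;
	for (i = 1; i < POOL_PAGES; i += 2)
		if (!model_split(m, i))
			failures++;
	m->enabled = true;	/* only set once protection is done */
	return failures;
}
```

With gate_on_enabled = false the model reports every guard page as still
block-mapped; with gate_on_enabled = true the first protection splits the pool
and the count is zero.
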
I wonder if it would be preferable to explicitly do the pte mapping in
arch_kfence_init_pool()? It looks like that's how x86 does it...
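
As a rough illustration of that alternative, again a toy userspace model and
not a proposed implementation: model_arch_init_pool() is a hypothetical
stand-in for an arch hook that pte-maps the whole pool up front, from a context
that is allowed to sleep, so that the later check in the split path can stay
unconditional. All names and fields below are assumptions for the sketch.

```c
#include <assert.h>
#include <stdbool.h>

enum map_level { MAP_BLOCK, MAP_PTE };

/* Hypothetical model state; not a kernel structure. */
struct pool_model {
	enum map_level level;	/* how the pool is currently mapped */
	bool published;		/* stands in for __kfence_pool being set */
	int sleeping_splits;	/* times the lock-taking split path was entered */
};

/* Split path with the unconditional early return for pool addresses. */
static void model_split(struct pool_model *p)
{
	if (p->published)
		return;		/* assumed pte-mapped, never takes the mutex */
	p->sleeping_splits++;
	p->level = MAP_PTE;
}

/* Hypothetical arch hook: pte-map the whole pool before it is used. */
static bool model_arch_init_pool(struct pool_model *p)
{
	p->level = MAP_PTE;
	return true;
}

/* Protecting a guard page only works if the pool is pte-mapped. */
static bool model_protect(struct pool_model *p)
{
	model_split(p);
	return p->level == MAP_PTE;
}

/* Late init, with or without the explicit arch hook. */
static bool model_late_init(struct pool_model *p, bool use_arch_hook)
{
	p->level = MAP_BLOCK;
	p->published = true;
	p->sleeping_splits = 0;
	if (use_arch_hook && !model_arch_init_pool(p))
		return false;
	return model_protect(p);
}
```

In this model, late init succeeds only when the hook runs, and the lock-taking
split path is never entered for pool addresses in either case.
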
>
> Thanks,
> Yang
>
>
>
>
>
>> + return 0;
>> +
>> /*
>> * Ensure start and end are at least page-aligned since this is the
>> * finest granularity we can split to.
>> @@ -1009,16 +1029,6 @@ static inline void arm64_kfence_map_pool(phys_addr_t
>> kfence_pool, pgd_t *pgdp) {
>> #endif /* CONFIG_KFENCE */
>> -static inline bool force_pte_mapping(void)
>> -{
>> - bool bbml2 = system_capabilities_finalized() ?
>> - system_supports_bbml2_noabort() : cpu_supports_bbml2_noabort();
>> -
>> - return (!bbml2 && (rodata_full || arm64_kfence_can_set_direct_map() ||
>> - is_realm_world())) ||
>> - debug_pagealloc_enabled();
>> -}
>> -
>> static void __init map_mem(pgd_t *pgdp)
>> {
>> static const u64 direct_map_end = _PAGE_END(VA_BITS_MIN);
>> ---8<---
>>
>> Thanks,
>> Ryan
>>
>>> Yang Shi, Do you have any additional thoughts?
>>>
>>> Thanks,
>>> Ryan
>>>
>>>> Thanks,
>>>> Guenter
>>>>
>>>> ---
>>>> Example log:
>>>>
>>>> [ 0.537499] BUG: sleeping function called from invalid context at kernel/
>>>> locking/mutex.c:580
>>>> [ 0.537501] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1,
>>>> name: swapper/0
>>>> [ 0.537502] preempt_count: 1, expected: 0
>>>> [ 0.537504] 2 locks held by swapper/0/1:
>>>> [ 0.537505] #0: ffffb60b01211960 (sched_domains_mutex){+.+.}-{4:4}, at:
>>>> sched_domains_mutex_lock+0x24/0x38
>>>> [ 0.537510] #1: ffffb60b01595838 (rcu_read_lock){....}-{1:3}, at:
>>>> rcu_lock_acquire+0x0/0x40
>>>> [ 0.537516] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.18.0-dbg-
>>>> DEV #1 NONE
>>>> [ 0.537517] Call trace:
>>>> [ 0.537518] show_stack+0x20/0x38 (C)
>>>> [ 0.537520] __dump_stack+0x28/0x38
>>>> [ 0.537522] dump_stack_lvl+0xac/0xf0
>>>> [ 0.537525] dump_stack+0x18/0x3c
>>>> [ 0.537527] __might_resched+0x248/0x2a0
>>>> [ 0.537529] __might_sleep+0x40/0x90
>>>> [ 0.537531] __mutex_lock_common+0x70/0x1818
>>>> [ 0.537533] mutex_lock_nested+0x34/0x48
>>>> [ 0.537534] split_kernel_leaf_mapping+0x74/0x1a0
>>>> [ 0.537536] update_range_prot+0x40/0x150
>>>> [ 0.537537] __change_memory_common+0x30/0x148
>>>> [ 0.537538] __kernel_map_pages+0x70/0x88
>>>> [ 0.537540] __free_frozen_pages+0x6e4/0x7b8
>>>> [ 0.537542] free_frozen_pages+0x1c/0x30
>>>> [ 0.537544] __free_slab+0xf0/0x168
>>>> [ 0.537547] free_slab+0x2c/0xf8
>>>> [ 0.537549] free_to_partial_list+0x4e0/0x620
>>>> [ 0.537551] __slab_free+0x228/0x250
>>>> [ 0.537553] kfree+0x3c4/0x4c0
>>>> [ 0.537555] destroy_sched_domain+0xf8/0x140
>>>> [ 0.537557] cpu_attach_domain+0x17c/0x610
>>>> [ 0.537558] build_sched_domains+0x15a4/0x1718
>>>> [ 0.537560] sched_init_domains+0xbc/0xf8
>>>> [ 0.537561] sched_init_smp+0x30/0x98
>>>> [ 0.537562] kernel_init_freeable+0x148/0x230
>>>> [ 0.537564] kernel_init+0x28/0x148
>>>> [ 0.537566] ret_from_fork+0x10/0x20
>>>> [ 0.537569] =============================
>>>> [ 0.537569] [ BUG: Invalid wait context ]
>>>> [ 0.537571] 6.18.0-dbg-DEV #1 Tainted: G W
>>>> [ 0.537572] -----------------------------
>>>> [ 0.537572] swapper/0/1 is trying to lock:
>>>> [ 0.537573] ffffb60b011f3830 (pgtable_split_lock){+.+.}-{4:4}, at:
>>>> split_kernel_leaf_mapping+0x74/0x1a0
>>>> [ 0.537576] other info that might help us debug this:
>>>> [ 0.537577] context-{5:5}
>>>> [ 0.537578] 2 locks held by swapper/0/1:
>>>> [ 0.537579] #0: ffffb60b01211960 (sched_domains_mutex){+.+.}-{4:4}, at:
>>>> sched_domains_mutex_lock+0x24/0x38
>>>> [ 0.537582] #1: ffffb60b01595838 (rcu_read_lock){....}-{1:3}, at:
>>>> rcu_lock_acquire+0x0/0x40
>>>> [ 0.537585] stack backtrace:
>>>> [ 0.537585] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Tainted: G
>>>> W 6.18.0-dbg-DEV #1 NONE
>>>> [ 0.537587] Tainted: [W]=WARN
>>>> [ 0.537588] Call trace:
>>>> [ 0.537589] show_stack+0x20/0x38 (C)
>>>> [ 0.537591] __dump_stack+0x28/0x38
>>>> [ 0.537593] dump_stack_lvl+0xac/0xf0
>>>> [ 0.537596] dump_stack+0x18/0x3c
>>>> [ 0.537598] __lock_acquire+0x980/0x2a20
>>>> [ 0.537600] lock_acquire+0x124/0x2b8
>>>> [ 0.537602] __mutex_lock_common+0xd8/0x1818
>>>> [ 0.537604] mutex_lock_nested+0x34/0x48
>>>> [ 0.537605] split_kernel_leaf_mapping+0x74/0x1a0
>>>> [ 0.537607] update_range_prot+0x40/0x150
>>>> [ 0.537608] __change_memory_common+0x30/0x148
>>>> [ 0.537609] __kernel_map_pages+0x70/0x88
>>>> [ 0.537610] __free_frozen_pages+0x6e4/0x7b8
>>>> [ 0.537613] free_frozen_pages+0x1c/0x30
>>>> [ 0.537615] __free_slab+0xf0/0x168
>>>> [ 0.537617] free_slab+0x2c/0xf8
>>>> [ 0.537619] free_to_partial_list+0x4e0/0x620
>>>> [ 0.537621] __slab_free+0x228/0x250
>>>> [ 0.537623] kfree+0x3c4/0x4c0
>>>> [ 0.537625] destroy_sched_domain+0xf8/0x140
>>>> [ 0.537627] cpu_attach_domain+0x17c/0x610
>>>> [ 0.537628] build_sched_domains+0x15a4/0x1718
>>>> [ 0.537630] sched_init_domains+0xbc/0xf8
>>>> [ 0.537631] sched_init_smp+0x30/0x98
>>>> [ 0.537632] kernel_init_freeable+0x148/0x230
>>>> [ 0.537633] kernel_init+0x28/0x148
>>>> [ 0.537635] ret_from_fork+0x10/0x20
>>>>
>>>> ---
>>>> bisect:
>>>>
>>>> # bad: [3a8660878839faadb4f1a6dd72c3179c1df56787] Linux 6.18-rc1
>>>> # good: [e5f0a698b34ed76002dc5cff3804a61c80233a7a] Linux 6.17
>>>> git bisect start 'v6.18-rc1' 'v6.17'
>>>> # bad: [58809f614e0e3f4e12b489bddf680bfeb31c0a20] Merge tag 'drm-
>>>> next-2025-10-01' of https://gitlab.freedesktop.org/drm/kernel
>>>> git bisect bad 58809f614e0e3f4e12b489bddf680bfeb31c0a20
>>>> # bad: [a8253f807760e9c80eada9e5354e1240ccf325f9] Merge tag 'soc-
>>>> newsoc-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
>>>> git bisect bad a8253f807760e9c80eada9e5354e1240ccf325f9
>>>> # bad: [4b81e2eb9e4db8f6094c077d0c8b27c264901c1b] Merge tag 'timers-
>>>> vdso-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>>>> git bisect bad 4b81e2eb9e4db8f6094c077d0c8b27c264901c1b
>>>> # bad: [f1004b2f19d7e9add9d707f64d9fcbc50f67921b] Merge tag 'm68k-for-v6.18-
>>>> tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k
>>>> git bisect bad f1004b2f19d7e9add9d707f64d9fcbc50f67921b
>>>> # good: [a9401710a5f5681abd2a6f21f9e76bc9f2e81891] Merge tag 'v6.18-rc-
>>>> part1-smb3-common' of git://git.samba.org/ksmbd
>>>> git bisect good a9401710a5f5681abd2a6f21f9e76bc9f2e81891
>>>> # good: [fe68bb2861808ed5c48d399bd7e670ab76829d55] Merge tag 'microblaze-
>>>> v6.18' of git://git.monstr.eu/linux-2.6-microblaze
>>>> git bisect good fe68bb2861808ed5c48d399bd7e670ab76829d55
>>>> # bad: [f2d64a22faeeecff385b4c91fab5fe036ab00162] Merge branch 'for-next/
>>>> perf' into for-next/core
>>>> git bisect bad f2d64a22faeeecff385b4c91fab5fe036ab00162
>>>> # good: [30f9386820cddbba59b48ae0670c3a1646dd440e] Merge branch 'for-next/
>>>> misc' into for-next/core
>>>> git bisect good 30f9386820cddbba59b48ae0670c3a1646dd440e
>>>> # good: [43de0ac332b815cf56dbdce63687de9acfd35d49] drivers/perf: hisi: Relax
>>>> the event ID check in the framework
>>>> git bisect good 43de0ac332b815cf56dbdce63687de9acfd35d49
>>>> # good: [5973a62efa34c80c9a4e5eac1fca6f6209b902af] arm64: map [_text,
>>>> _stext) virtual address range non-executable+read-only
>>>> git bisect good 5973a62efa34c80c9a4e5eac1fca6f6209b902af
>>>> # good: [b3abb08d6f628a76c36bf7da9508e1a67bf186a0] drivers/perf: hisi:
>>>> Refactor the event configuration of L3C PMU
>>>> git bisect good b3abb08d6f628a76c36bf7da9508e1a67bf186a0
>>>> # good: [6d2f913fda5683fbd4c3580262e10386c1263dfb] Documentation: hisi-pmu:
>>>> Add introduction to HiSilicon V3 PMU
>>>> git bisect good 6d2f913fda5683fbd4c3580262e10386c1263dfb
>>>> # good: [2084660ad288c998b6f0c885e266deb364f65fba] perf/dwc_pcie: Fix use of
>>>> uninitialized variable
>>>> git bisect good 2084660ad288c998b6f0c885e266deb364f65fba
>>>> # bad: [77dfca70baefcb988318a72fe69eb99f6dabbbb1] Merge branch 'for-next/mm'
>>>> into for-next/core
>>>> git bisect bad 77dfca70baefcb988318a72fe69eb99f6dabbbb1
>>>> # first bad commit: [77dfca70baefcb988318a72fe69eb99f6dabbbb1] Merge branch
>>>> 'for-next/mm' into for-next/core
>>>>
>>>> ---
>>>> bisect into branch:
>>>>
>>>> - git checkout -b testing 77dfca70baefcb988318a72fe69eb99f6dabbbb1
>>>> - git rebase 77dfca70baefcb988318a72fe69eb99f6dabbbb1~1
>>>> [ fix minor conflict similar to the conflict resolution in 77dfca70baefc]
>>>> - git diff 77dfca70baefcb988318a72fe69eb99f6dabbbb1
>>>> [ confirmed that there are no differences ]
>>>> - confirm that the problem is still seen at the tip of the rebase
>>>> - git bisect start HEAD 77dfca70baefcb988318a72fe69eb99f6dabbbb1~1
>>>> - run bisect
>>>>
>>>> Results:
>>>>
>>>> # bad: [47fc25df1ae3ae8412f1b812fb586c714d04a5e6] arm64: map [_text, _stext)
>>>> virtual address range non-executable+read-only
>>>> # good: [30f9386820cddbba59b48ae0670c3a1646dd440e] Merge branch 'for-next/
>>>> misc' into for-next/core
>>>> git bisect start 'HEAD' '77dfca70baefcb988318a72fe69eb99f6dabbbb1~1'
>>>> # good: [805491d19fc21271b5c27f4602f8f66b625c110f] arm64/Kconfig: Remove
>>>> CONFIG_RODATA_FULL_DEFAULT_ENABLED
>>>> git bisect good 805491d19fc21271b5c27f4602f8f66b625c110f
>>>> # bad: [13c7d7426232cc4489df7cd2e1f646a22d3f6172] arm64: mm: support large
>>>> block mapping when rodata=full
>>>> git bisect bad 13c7d7426232cc4489df7cd2e1f646a22d3f6172
>>>> # good: [a4d9c67e503f2b73c2d89d8e8209dfd241bdc8d8] arm64: Enable permission
>>>> change on arm64 kernel block mappings
>>>> git bisect good a4d9c67e503f2b73c2d89d8e8209dfd241bdc8d8
>>>> # first bad commit: [13c7d7426232cc4489df7cd2e1f646a22d3f6172] arm64: mm:
>>>> support large block mapping when rodata=full
>