Message-ID: <cbabb814-3ca5-4213-9346-a7de28aac474@arm.com>
Date: Mon, 3 Nov 2025 10:07:12 +0000
From: Ryan Roberts <ryan.roberts@....com>
To: Yang Shi <yang@...amperecomputing.com>, Guenter Roeck <linux@...ck-us.net>
Cc: catalin.marinas@....com, will@...nel.org, akpm@...ux-foundation.org,
 david@...hat.com, lorenzo.stoakes@...cle.com, ardb@...nel.org,
 dev.jain@....com, scott@...amperecomputing.com, cl@...two.org,
 linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org,
 linux-mm@...ck.org, nd@....com
Subject: Re: [PATCH v8 3/5] arm64: mm: support large block mapping when
 rodata=full

On 03/11/2025 00:47, Yang Shi wrote:
> 
> 

[...]

>> @@ -723,6 +733,16 @@ int split_kernel_leaf_mapping(unsigned long start,
>> unsigned long end)
>>       if (!system_supports_bbml2_noabort())
>>           return 0;
>>   +    /*
>> +     * If the region is within a pte-mapped area, there is no need to try to
>> +     * split. Additionally, CONFIG_DEBUG_PAGEALLOC and CONFIG_KFENCE may
>> +     * change permissions from softirq context so for those cases (which
>> +     * are always pte-mapped), we must not go any further because taking
>> +     * the mutex below may sleep.
>> +     */
>> +    if (force_pte_mapping() || is_kfence_address((void *)start))
> 
> IIUC this may break kfence late init? The kfence_late_init() allocates pages
> from buddy allocator, then protects them (setting them to invalid). But the
> protection requires split page table, this check will prevent kernel from
> splitting page table because __kfence_pool is initialized before doing
> protection. So there is kind of circular dependency.

I hadn't considered late init. But I guess the requirement is that the kfence
pool needs to be pte mapped whenever kfence is enabled.

For early init, that requirement is clearly met since we pte-map the pool in
the arch code. For late init, as far as I can tell, the memory is initially
block mapped, is allocated from the buddy allocator, then every other page is
protected via kfence_protect() from kfence_init_pool(). This will have the
effect of splitting every page in the pool to pte mappings (as long as your
suggested fix below is applied).
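
To make the late-init reasoning concrete, here is a toy userspace model, not
kernel code. It assumes a block covers 512 pages (4K pages, 2M blocks) and that
protecting any one page forces its whole containing block down to pte mappings.
All toy_* names are hypothetical and exist only for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGES_PER_BLOCK 512  /* assumption: 4K pages, 2M blocks */
#define TOY_MAX_BLOCKS      8    /* arbitrary cap for the model */

struct toy_pool {
	size_t nr_pages;                    /* pages in the pool */
	bool block_split[TOY_MAX_BLOCKS];   /* true once a block is pte-mapped */
	size_t nr_splits;                   /* block splits performed so far */
};

/* Start with every block still block-mapped, as after a buddy allocation. */
static void toy_pool_init(struct toy_pool *p, size_t nr_pages)
{
	size_t max_pages = (size_t)TOY_MAX_BLOCKS * TOY_PAGES_PER_BLOCK;

	p->nr_pages = nr_pages > max_pages ? max_pages : nr_pages;
	p->nr_splits = 0;
	for (size_t i = 0; i < TOY_MAX_BLOCKS; i++)
		p->block_split[i] = false;
}

/* Protecting one page splits the block that contains it, at most once. */
static void toy_protect_page(struct toy_pool *p, size_t page)
{
	size_t block = page / TOY_PAGES_PER_BLOCK;

	if (!p->block_split[block]) {
		p->block_split[block] = true;
		p->nr_splits++;
	}
}

/* Protect every other page, mirroring the guard-page pattern. */
static void toy_protect_guard_pages(struct toy_pool *p)
{
	for (size_t page = 0; page < p->nr_pages; page += 2)
		toy_protect_page(p, page);
}

/* True if every block that backs the pool has been split to ptes. */
static bool toy_all_pte_mapped(const struct toy_pool *p)
{
	size_t nr_blocks = (p->nr_pages + TOY_PAGES_PER_BLOCK - 1) /
			   TOY_PAGES_PER_BLOCK;

	for (size_t i = 0; i < nr_blocks; i++)
		if (!p->block_split[i])
			return false;
	return true;
}
```

In this model the whole pool ends up pte-mapped only as a side effect of
protecting the guard pages, which is the "accidental" part.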

It all feels a bit accidental though.

> 
> The below fix may work?
> 
> if (force_pte_mapping() || (READ_ONCE(kfence_enabled) && is_kfence_address((void
> *)start)))
> 
> The kfence_enabled won't be set until protection is done. So if it is set, we
> know kfence address must be mapped by PTE.

I think it will work, but it feels a bit hacky, and kfence_enabled is currently
static in core.c.
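
For reference, a small userspace model of the two checks, as a sketch only. The
struct and the toy_* helpers are hypothetical stand-ins: force_pte models
force_pte_mapping(), pool_assigned models __kfence_pool having been set, and
kfence_enabled models the flag being set only after protection is done.

```c
#include <stdbool.h>

struct toy_state {
	bool force_pte;       /* stands in for force_pte_mapping() */
	bool pool_assigned;   /* pool pointer set, so the address test can hit */
	bool kfence_enabled;  /* set only once protection has completed */
};

/* Models is_kfence_address(): can only hit once the pool is assigned. */
static bool toy_is_kfence_address(const struct toy_state *s, bool addr_in_pool)
{
	return s->pool_assigned && addr_in_pool;
}

/* Check as in the quoted patch: skip the split for any kfence address. */
static bool toy_skip_split_patch(const struct toy_state *s, bool addr_in_pool)
{
	return s->force_pte || toy_is_kfence_address(s, addr_in_pool);
}

/* Check with the suggested fix: also require kfence to be enabled. */
static bool toy_skip_split_fixed(const struct toy_state *s, bool addr_in_pool)
{
	return s->force_pte ||
	       (s->kfence_enabled && toy_is_kfence_address(s, addr_in_pool));
}
```

With the pool assigned but kfence not yet enabled (the late-init window), the
first check skips the split that the protection step relies on, while the
second lets it proceed. Once kfence is enabled both checks skip.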

I wonder if it would be preferable to explicitly do the pte mapping in
arch_kfence_init_pool()? It looks like that's how x86 does it...

> 
> Thanks,
> Yang
> 
> 
> 
> 
> 
>> +        return 0;
>> +
>>       /*
>>        * Ensure start and end are at least page-aligned since this is the
>>        * finest granularity we can split to.
>> @@ -1009,16 +1029,6 @@ static inline void arm64_kfence_map_pool(phys_addr_t
>> kfence_pool, pgd_t *pgdp) {
>>     #endif /* CONFIG_KFENCE */
>>   -static inline bool force_pte_mapping(void)
>> -{
>> -    bool bbml2 = system_capabilities_finalized() ?
>> -        system_supports_bbml2_noabort() : cpu_supports_bbml2_noabort();
>> -
>> -    return (!bbml2 && (rodata_full || arm64_kfence_can_set_direct_map() ||
>> -               is_realm_world())) ||
>> -        debug_pagealloc_enabled();
>> -}
>> -
>>   static void __init map_mem(pgd_t *pgdp)
>>   {
>>       static const u64 direct_map_end = _PAGE_END(VA_BITS_MIN);
>> ---8<---
>>
>> Thanks,
>> Ryan
>>
>>> Yang Shi, Do you have any additional thoughts?
>>>
>>> Thanks,
>>> Ryan
>>>
>>>> Thanks,
>>>> Guenter
>>>>
>>>> ---
>>>> Example log:
>>>>
>>>> [    0.537499] BUG: sleeping function called from invalid context at kernel/
>>>> locking/mutex.c:580
>>>> [    0.537501] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1,
>>>> name: swapper/0
>>>> [    0.537502] preempt_count: 1, expected: 0
>>>> [    0.537504] 2 locks held by swapper/0/1:
>>>> [    0.537505]  #0: ffffb60b01211960 (sched_domains_mutex){+.+.}-{4:4}, at:
>>>> sched_domains_mutex_lock+0x24/0x38
>>>> [    0.537510]  #1: ffffb60b01595838 (rcu_read_lock){....}-{1:3}, at:
>>>> rcu_lock_acquire+0x0/0x40
>>>> [    0.537516] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.18.0-dbg-
>>>> DEV #1 NONE
>>>> [    0.537517] Call trace:
>>>> [    0.537518]  show_stack+0x20/0x38 (C)
>>>> [    0.537520]  __dump_stack+0x28/0x38
>>>> [    0.537522]  dump_stack_lvl+0xac/0xf0
>>>> [    0.537525]  dump_stack+0x18/0x3c
>>>> [    0.537527]  __might_resched+0x248/0x2a0
>>>> [    0.537529]  __might_sleep+0x40/0x90
>>>> [    0.537531]  __mutex_lock_common+0x70/0x1818
>>>> [    0.537533]  mutex_lock_nested+0x34/0x48
>>>> [    0.537534]  split_kernel_leaf_mapping+0x74/0x1a0
>>>> [    0.537536]  update_range_prot+0x40/0x150
>>>> [    0.537537]  __change_memory_common+0x30/0x148
>>>> [    0.537538]  __kernel_map_pages+0x70/0x88
>>>> [    0.537540]  __free_frozen_pages+0x6e4/0x7b8
>>>> [    0.537542]  free_frozen_pages+0x1c/0x30
>>>> [    0.537544]  __free_slab+0xf0/0x168
>>>> [    0.537547]  free_slab+0x2c/0xf8
>>>> [    0.537549]  free_to_partial_list+0x4e0/0x620
>>>> [    0.537551]  __slab_free+0x228/0x250
>>>> [    0.537553]  kfree+0x3c4/0x4c0
>>>> [    0.537555]  destroy_sched_domain+0xf8/0x140
>>>> [    0.537557]  cpu_attach_domain+0x17c/0x610
>>>> [    0.537558]  build_sched_domains+0x15a4/0x1718
>>>> [    0.537560]  sched_init_domains+0xbc/0xf8
>>>> [    0.537561]  sched_init_smp+0x30/0x98
>>>> [    0.537562]  kernel_init_freeable+0x148/0x230
>>>> [    0.537564]  kernel_init+0x28/0x148
>>>> [    0.537566]  ret_from_fork+0x10/0x20
>>>> [    0.537569] =============================
>>>> [    0.537569] [ BUG: Invalid wait context ]
>>>> [    0.537571] 6.18.0-dbg-DEV #1 Tainted: G        W
>>>> [    0.537572] -----------------------------
>>>> [    0.537572] swapper/0/1 is trying to lock:
>>>> [    0.537573] ffffb60b011f3830 (pgtable_split_lock){+.+.}-{4:4}, at:
>>>> split_kernel_leaf_mapping+0x74/0x1a0
>>>> [    0.537576] other info that might help us debug this:
>>>> [    0.537577] context-{5:5}
>>>> [    0.537578] 2 locks held by swapper/0/1:
>>>> [    0.537579]  #0: ffffb60b01211960 (sched_domains_mutex){+.+.}-{4:4}, at:
>>>> sched_domains_mutex_lock+0x24/0x38
>>>> [    0.537582]  #1: ffffb60b01595838 (rcu_read_lock){....}-{1:3}, at:
>>>> rcu_lock_acquire+0x0/0x40
>>>> [    0.537585] stack backtrace:
>>>> [    0.537585] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Tainted: G       
>>>> W           6.18.0-dbg-DEV #1 NONE
>>>> [    0.537587] Tainted: [W]=WARN
>>>> [    0.537588] Call trace:
>>>> [    0.537589]  show_stack+0x20/0x38 (C)
>>>> [    0.537591]  __dump_stack+0x28/0x38
>>>> [    0.537593]  dump_stack_lvl+0xac/0xf0
>>>> [    0.537596]  dump_stack+0x18/0x3c
>>>> [    0.537598]  __lock_acquire+0x980/0x2a20
>>>> [    0.537600]  lock_acquire+0x124/0x2b8
>>>> [    0.537602]  __mutex_lock_common+0xd8/0x1818
>>>> [    0.537604]  mutex_lock_nested+0x34/0x48
>>>> [    0.537605]  split_kernel_leaf_mapping+0x74/0x1a0
>>>> [    0.537607]  update_range_prot+0x40/0x150
>>>> [    0.537608]  __change_memory_common+0x30/0x148
>>>> [    0.537609]  __kernel_map_pages+0x70/0x88
>>>> [    0.537610]  __free_frozen_pages+0x6e4/0x7b8
>>>> [    0.537613]  free_frozen_pages+0x1c/0x30
>>>> [    0.537615]  __free_slab+0xf0/0x168
>>>> [    0.537617]  free_slab+0x2c/0xf8
>>>> [    0.537619]  free_to_partial_list+0x4e0/0x620
>>>> [    0.537621]  __slab_free+0x228/0x250
>>>> [    0.537623]  kfree+0x3c4/0x4c0
>>>> [    0.537625]  destroy_sched_domain+0xf8/0x140
>>>> [    0.537627]  cpu_attach_domain+0x17c/0x610
>>>> [    0.537628]  build_sched_domains+0x15a4/0x1718
>>>> [    0.537630]  sched_init_domains+0xbc/0xf8
>>>> [    0.537631]  sched_init_smp+0x30/0x98
>>>> [    0.537632]  kernel_init_freeable+0x148/0x230
>>>> [    0.537633]  kernel_init+0x28/0x148
>>>> [    0.537635]  ret_from_fork+0x10/0x20
>>>>
>>>> ---
>>>> bisect:
>>>>
>>>> # bad: [3a8660878839faadb4f1a6dd72c3179c1df56787] Linux 6.18-rc1
>>>> # good: [e5f0a698b34ed76002dc5cff3804a61c80233a7a] Linux 6.17
>>>> git bisect start 'v6.18-rc1' 'v6.17'
>>>> # bad: [58809f614e0e3f4e12b489bddf680bfeb31c0a20] Merge tag 'drm-
>>>> next-2025-10-01' of https://gitlab.freedesktop.org/drm/kernel
>>>> git bisect bad 58809f614e0e3f4e12b489bddf680bfeb31c0a20
>>>> # bad: [a8253f807760e9c80eada9e5354e1240ccf325f9] Merge tag 'soc-
>>>> newsoc-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
>>>> git bisect bad a8253f807760e9c80eada9e5354e1240ccf325f9
>>>> # bad: [4b81e2eb9e4db8f6094c077d0c8b27c264901c1b] Merge tag 'timers-
>>>> vdso-2025-09-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
>>>> git bisect bad 4b81e2eb9e4db8f6094c077d0c8b27c264901c1b
>>>> # bad: [f1004b2f19d7e9add9d707f64d9fcbc50f67921b] Merge tag 'm68k-for-v6.18-
>>>> tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k
>>>> git bisect bad f1004b2f19d7e9add9d707f64d9fcbc50f67921b
>>>> # good: [a9401710a5f5681abd2a6f21f9e76bc9f2e81891] Merge tag 'v6.18-rc-
>>>> part1-smb3-common' of git://git.samba.org/ksmbd
>>>> git bisect good a9401710a5f5681abd2a6f21f9e76bc9f2e81891
>>>> # good: [fe68bb2861808ed5c48d399bd7e670ab76829d55] Merge tag 'microblaze-
>>>> v6.18' of git://git.monstr.eu/linux-2.6-microblaze
>>>> git bisect good fe68bb2861808ed5c48d399bd7e670ab76829d55
>>>> # bad: [f2d64a22faeeecff385b4c91fab5fe036ab00162] Merge branch 'for-next/
>>>> perf' into for-next/core
>>>> git bisect bad f2d64a22faeeecff385b4c91fab5fe036ab00162
>>>> # good: [30f9386820cddbba59b48ae0670c3a1646dd440e] Merge branch 'for-next/
>>>> misc' into for-next/core
>>>> git bisect good 30f9386820cddbba59b48ae0670c3a1646dd440e
>>>> # good: [43de0ac332b815cf56dbdce63687de9acfd35d49] drivers/perf: hisi: Relax
>>>> the event ID check in the framework
>>>> git bisect good 43de0ac332b815cf56dbdce63687de9acfd35d49
>>>> # good: [5973a62efa34c80c9a4e5eac1fca6f6209b902af] arm64: map [_text,
>>>> _stext) virtual address range non-executable+read-only
>>>> git bisect good 5973a62efa34c80c9a4e5eac1fca6f6209b902af
>>>> # good: [b3abb08d6f628a76c36bf7da9508e1a67bf186a0] drivers/perf: hisi:
>>>> Refactor the event configuration of L3C PMU
>>>> git bisect good b3abb08d6f628a76c36bf7da9508e1a67bf186a0
>>>> # good: [6d2f913fda5683fbd4c3580262e10386c1263dfb] Documentation: hisi-pmu:
>>>> Add introduction to HiSilicon V3 PMU
>>>> git bisect good 6d2f913fda5683fbd4c3580262e10386c1263dfb
>>>> # good: [2084660ad288c998b6f0c885e266deb364f65fba] perf/dwc_pcie: Fix use of
>>>> uninitialized variable
>>>> git bisect good 2084660ad288c998b6f0c885e266deb364f65fba
>>>> # bad: [77dfca70baefcb988318a72fe69eb99f6dabbbb1] Merge branch 'for-next/mm'
>>>> into for-next/core
>>>> git bisect bad 77dfca70baefcb988318a72fe69eb99f6dabbbb1
>>>> # first bad commit: [77dfca70baefcb988318a72fe69eb99f6dabbbb1] Merge branch
>>>> 'for-next/mm' into for-next/core
>>>>
>>>> ---
>>>> bisect into branch:
>>>>
>>>> - git checkout -b testing 77dfca70baefcb988318a72fe69eb99f6dabbbb1
>>>> - git rebase 77dfca70baefcb988318a72fe69eb99f6dabbbb1~1
>>>>    [ fix minor conflict similar to the conflict resolution in 77dfca70baefc]
>>>> - git diff 77dfca70baefcb988318a72fe69eb99f6dabbbb1
>>>>    [ confirmed that there are no differences ]
>>>> - confirm that the problem is still seen at the tip of the rebase
>>>> - git bisect start HEAD 77dfca70baefcb988318a72fe69eb99f6dabbbb1~1
>>>> - run bisect
>>>>
>>>> Results:
>>>>
>>>> # bad: [47fc25df1ae3ae8412f1b812fb586c714d04a5e6] arm64: map [_text, _stext)
>>>> virtual address range non-executable+read-only
>>>> # good: [30f9386820cddbba59b48ae0670c3a1646dd440e] Merge branch 'for-next/
>>>> misc' into for-next/core
>>>> git bisect start 'HEAD' '77dfca70baefcb988318a72fe69eb99f6dabbbb1~1'
>>>> # good: [805491d19fc21271b5c27f4602f8f66b625c110f] arm64/Kconfig: Remove
>>>> CONFIG_RODATA_FULL_DEFAULT_ENABLED
>>>> git bisect good 805491d19fc21271b5c27f4602f8f66b625c110f
>>>> # bad: [13c7d7426232cc4489df7cd2e1f646a22d3f6172] arm64: mm: support large
>>>> block mapping when rodata=full
>>>> git bisect bad 13c7d7426232cc4489df7cd2e1f646a22d3f6172
>>>> # good: [a4d9c67e503f2b73c2d89d8e8209dfd241bdc8d8] arm64: Enable permission
>>>> change on arm64 kernel block mappings
>>>> git bisect good a4d9c67e503f2b73c2d89d8e8209dfd241bdc8d8
>>>> # first bad commit: [13c7d7426232cc4489df7cd2e1f646a22d3f6172] arm64: mm:
>>>> support large block mapping when rodata=full
> 

