Message-ID: <f47441af-4147-40df-b79a-2fff4a745eac@linux.dev>
Date: Tue, 30 Sep 2025 01:57:40 +0800
From: Wen Yang <wen.yang@...ux.dev>
To: Greg Kroah-Hartman <gregkh@...uxfoundation.org>
Cc: linux-kernel@...r.kernel.org, Pierre Gondois <pierre.gondois@....com>,
 Sudeep Holla <sudeep.holla@....com>, Palmer Dabbelt <palmer@...osinc.com>,
 stable@...r.kernel.org
Subject: Re: [PATCH 6.1] arch_topology: Build cacheinfo from primary CPU



On 9/29/25 21:21, Greg Kroah-Hartman wrote:
> On Sat, Sep 27, 2025 at 01:46:58AM +0800, Wen Yang wrote:
>> From: Pierre Gondois <pierre.gondois@....com>
>>
>> commit 5944ce092b97caed5d86d961e963b883b5c44ee2 upstream.
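>>
>> commit 3fcbf1c77d08 ("arch_topology: Fix cache attributes detection in
>> the CPU hotplug path")
>> adds a call to detect_cache_attributes() to populate the cacheinfo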
>> before updating the siblings mask. detect_cache_attributes() allocates
>> memory and can take the PPTT mutex (on ACPI platforms). On PREEMPT_RT
>> kernels, on secondary CPUs, this triggers a:
>>    'BUG: sleeping function called from invalid context' [1]
>> as the code is executed with preemption and interrupts disabled.
>>
>> The primary CPU was previously storing the cache information using
>> the now removed (struct cpu_topology).llc_id:
>> commit 5b8dc787ce4a ("arch_topology: Drop LLC identifier stash from
>> the CPU topology")
>>
>> allocate_cache_info() tries to build the cacheinfo from the primary
>> CPU prior secondary CPUs boot, if the DT/ACPI description
>> contains cache information.
>> If allocate_cache_info() fails, then fallback to the current state
>> for the cacheinfo allocation. [1] will be triggered in such case.
>>
>> When unplugging a CPU, the cacheinfo memory cannot be freed. If it
>> was, then the memory would be allocated early by the re-plugged
>> CPU and would trigger [1].
>>
>> Note that populate_cache_leaves() might be called multiple times
>> due to populate_leaves being moved up. This is required since
>> detect_cache_attributes() might be called with per_cpu_cacheinfo(cpu)
>> being allocated but not populated.
>>
>> [1]:
>>   | BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
>>   | in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 0, name: swapper/111
>>   | preempt_count: 1, expected: 0
>>   | RCU nest depth: 1, expected: 1
>>   | 3 locks held by swapper/111/0:
>>   |  #0:  (&pcp->lock){+.+.}-{3:3}, at: get_page_from_freelist+0x218/0x12c8
>>   |  #1:  (rcu_read_lock){....}-{1:3}, at: rt_spin_trylock+0x48/0xf0
>>   |  #2:  (&zone->lock){+.+.}-{3:3}, at: rmqueue_bulk+0x64/0xa80
>>   | irq event stamp: 0
>>   | hardirqs last  enabled at (0):  0x0
>>   | hardirqs last disabled at (0):  copy_process+0x5dc/0x1ab8
>>   | softirqs last  enabled at (0):  copy_process+0x5dc/0x1ab8
>>   | softirqs last disabled at (0):  0x0
>>   | Preemption disabled at:
>>   |  migrate_enable+0x30/0x130
>>   | CPU: 111 PID: 0 Comm: swapper/111 Tainted: G        W          6.0.0-rc4-rt6-[...]
>>   | Call trace:
>>   |  __kmalloc+0xbc/0x1e8
>>   |  detect_cache_attributes+0x2d4/0x5f0
>>   |  update_siblings_masks+0x30/0x368
>>   |  store_cpu_topology+0x78/0xb8
>>   |  secondary_start_kernel+0xd0/0x198
>>   |  __secondary_switched+0xb0/0xb4
>>
>> Signed-off-by: Pierre Gondois <pierre.gondois@....com>
>> Reviewed-by: Sudeep Holla <sudeep.holla@....com>
>> Acked-by: Palmer Dabbelt <palmer@...osinc.com>
>> Link: https://lore.kernel.org/r/20230104183033.755668-7-pierre.gondois@arm.com
>> Signed-off-by: Sudeep Holla <sudeep.holla@....com>
>> Cc: <stable@...r.kernel.org> # 6.1.x: c3719bd:cacheinfo: Use RISC-V's init_cache_level() as generic OF implementation
>> Cc: <stable@...r.kernel.org> # 6.1.x: 8844c3d:cacheinfo: Return error code in init_of_cache_level()
>> Cc: <stable@...r.kernel.org> # 6.1.x: de0df44:cacheinfo: Check 'cache-unified' property to count cache leaves
>> Cc: <stable@...r.kernel.org> # 6.1.x: fa4d566:ACPI: PPTT: Remove acpi_find_cache_levels()
>> Cc: <stable@...r.kernel.org> # 6.1.x: bd50036:ACPI: PPTT: Update acpi_find_last_cache_level() to acpi_get_cache_info()
>> Cc: <stable@...r.kernel.org> # 6.1.x
> 
> I do not understand, why do you want all of these applied as well?  Can
> you just send the full series of commits?
> 
Thanks for your comments. Here is the original series:
https://lore.kernel.org/all/167404285593.885445.6219705651301997538.b4-ty@arm.com/

commit 3fcbf1c77d08 ("arch_topology: Fix cache attributes detection in
the CPU hotplug path") introduced the bug, and this series fixes it.

>> Signed-off-by: Wen Yang <wen.yang@...ux.dev>
> 
> Also, you have changed this commit a lot from the original one, please
> document what you did here.
> 
Thanks for the reminder. We only want to cherry-pick these commits onto
the 6.1 stable branch, without modifying the original commit.
I checked again, as follows:

$ git cherry-pick c3719bd
$ git cherry-pick 8844c3d
$ git cherry-pick de0df44
$ git cherry-pick fa4d566
$ git cherry-pick bd50036
$ git cherry-pick 5944ce0

$ git format-patch HEAD -1

$ diff 0001-arch_topology-Build-cacheinfo-from-primary-CPU.patch \
    20250927_wen_yang_arch_topology_build_cacheinfo_from_primary_cpu.mbx

The result is consistent with the original commit.
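
As a cross-check, the same comparison can be sketched with git patch-id,
which hashes only the diff content and ignores commit metadata and line
offsets. This is a minimal sketch, not the procedure used above: the
throwaway repository, the file names (topo.c, other.c) and the branch
names (upstream, stable) are hypothetical and exist only to make the
example self-contained.

```shell
#!/bin/sh
# Hypothetical demo: check that a cherry-picked commit carries the same
# change as its source commit by comparing patch-ids.
set -eu

# Throwaway repository (assumption: illustration only, not a kernel tree).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.invalid"
git config user.name "Demo"

# Base commit shared by both branches.
printf 'line1\nline2\nline3\n' > topo.c
git add topo.c
git commit -q -m "base"
base_branch=$(git symbolic-ref --short HEAD)

# The "upstream" branch carries the fix.
git checkout -q -b upstream
printf 'line1\nline2 fixed\nline3\n' > topo.c
git commit -q -am "fix"
upstream_commit=$(git rev-parse HEAD)

# The "stable" branch diverges in an unrelated file, then takes the fix.
git checkout -q -b stable "$base_branch"
echo 'unrelated' > other.c
git add other.c
git commit -q -m "stable-only change"
git cherry-pick "$upstream_commit" >/dev/null

# The first field of the patch-id output is the hash of the diff itself.
id_upstream=$(git show "$upstream_commit" | git patch-id --stable | cut -d' ' -f1)
id_backport=$(git show HEAD | git patch-id --stable | cut -d' ' -f1)

if [ "$id_upstream" = "$id_backport" ]; then
    result="identical"
else
    result="differs"
fi
echo "backport is $result"
```

In a real tree, the two "git show" arguments would be the upstream commit
and the backported commit on the 6.1 branch. Matching patch-ids indicate
that the diff was carried over unchanged.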

> Also, why not just use 6.6.y instead?  What is forcing you to use 6.1.y
> for this platform?  What caused this issue to just show up now?
> 

Thank you for the suggestion. Our production environment has been
running 6.1.y-rt for quite some time, so we can only migrate to 6.6.y
gradually.
Some recently added workloads involving power on/off may have made this
bug easier to trigger.
We also hope the upstream 6.1.y branch can pick up the fix.

--
Best wishes,
Wen




