Message-ID: <8a266076-b3dc-4a39-aac4-089e2ef77da3@gmx.de>
Date: Thu, 1 Feb 2024 17:41:10 +0100
From: Helge Deller <deller@....de>
To: Tejun Heo <tj@...nel.org>, Helge Deller <deller@...nel.org>
Cc: Lai Jiangshan <jiangshanlai@...il.com>, linux-kernel@...r.kernel.org,
 linux-parisc@...r.kernel.org
Subject: Re: [PATCH][RFC] workqueue: Fix kernel panic on CPU hot-unplug

On 1/31/24 23:28, Tejun Heo wrote:
> On Wed, Jan 31, 2024 at 08:27:45PM +0100, Helge Deller wrote:
>> When hot-unplugging a 32-bit CPU on the parisc platform with
>> "chcpu -d 1", I get the following kernel panic. Adding a check
>> for !pwq prevents the panic.
>>
>>   Kernel Fault: Code=26 (Data memory access rights trap) at addr 00000000
>>   CPU: 1 PID: 21 Comm: cpuhp/1 Not tainted 6.8.0-rc1-32bit+ #1291
>>   Hardware name: 9000/778/B160L
>>
>>   IASQ: 00000000 00000000 IAOQ: 10446db4 10446db8
>>    IIR: 0f80109c    ISR: 00000000  IOR: 00000000
>>    CPU:        1   CR30: 11dd1710 CR31: 00000000
>>    IAOQ[0]: wq_update_pod+0x98/0x14c
>>    IAOQ[1]: wq_update_pod+0x9c/0x14c
>>    RP(r2): wq_update_pod+0x80/0x14c
>>   Backtrace:
>>    [<10448744>] workqueue_offline_cpu+0x1d4/0x1dc
>>    [<10429db4>] cpuhp_invoke_callback+0xf8/0x200
>>    [<1042a1d0>] cpuhp_thread_fun+0xb8/0x164
>>    [<10452970>] smpboot_thread_fn+0x284/0x288
>>    [<1044d8f4>] kthread+0x12c/0x13c
>>    [<1040201c>] ret_from_kernel_thread+0x1c/0x24
>>   Kernel panic - not syncing: Kernel Fault
>>
>> Signed-off-by: Helge Deller <deller@....de>
>>
>> ---
>>
>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>> index 76e60faed892..dfeee7b7322c 100644
>> --- a/kernel/workqueue.c
>> +++ b/kernel/workqueue.c
>> @@ -4521,6 +4521,8 @@ static void wq_update_pod(struct workqueue_struct *wq, int cpu,
>>   	wq_calc_pod_cpumask(target_attrs, cpu, off_cpu);
>>   	pwq = rcu_dereference_protected(*per_cpu_ptr(wq->cpu_pwq, cpu),
>>   					lockdep_is_held(&wq_pool_mutex));
>> +	if (!pwq)
>> +		return;
>
> Hmm... I have a hard time imagining a scenario where some CPUs don't have
> pwq installed on wq->cpu_pwq. Can you please run `drgn
> tools/workqueue/wq_dump.py` before triggering the hotplug event and paste
> the output along with full dmesg?

I'm not sure if parisc is already fully supported with that tool, or
if I'm doing something wrong:

root@...ian:~# uname -a
Linux debian 6.8.0-rc1-32bit+ #1292 SMP PREEMPT Thu Feb  1 11:31:38 CET 2024 parisc GNU/Linux
root@...ian:~# drgn --main-symbols -s ./vmlinux ./wq_dump.py
Traceback (most recent call last):
   File "/usr/bin/drgn", line 33, in <module>
     sys.exit(load_entry_point('drgn==0.0.25', 'console_scripts', 'drgn')())
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   File "/usr/lib/python3/dist-packages/drgn/cli.py", line 301, in _main
     runpy.run_path(script, init_globals={"prog": prog}, run_name="__main__")
   File "<frozen runpy>", line 291, in run_path
   File "<frozen runpy>", line 98, in _run_module_code
   File "<frozen runpy>", line 88, in _run_code
   File "./wq_dump.py", line 78, in <module>
     worker_pool_idr         = prog['worker_pool_idr']
                               ~~~~^^^^^^^^^^^^^^^^^^^
KeyError: 'worker_pool_idr'

Maybe you have an idea? I'll check further, but otherwise it's probably
easier for me to add some printk() to the kernel function wq_update_pod()
and send that info?
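
For narrowing down the KeyError above, a small probe along the following lines might help. It is only a sketch: it assumes that a failed lookup on the drgn `prog` object raises KeyError (as in the traceback) or another LookupError. `probe_symbols` and `report` are hypothetical helper names, and `workqueues` is just an assumed second symbol to compare against. It does not explain why `worker_pool_idr` is missing; it only shows which globals resolve at all.

```python
# Sketch: check which kernel globals a drgn-style program object can resolve.
# `prog` may be any object that supports prog[name] and raises
# KeyError/LookupError for unknown names (a plain dict works for a dry run).

def probe_symbols(prog, names):
    """Return a dict mapping each name to True if prog[name] resolves."""
    found = {}
    for name in names:
        try:
            prog[name]
        except LookupError:  # KeyError is a subclass of LookupError
            found[name] = False
        else:
            found[name] = True
    return found


def report(found):
    """Format the probe result as one line per symbol."""
    return [f"{name}: {'found' if ok else 'MISSING'}"
            for name, ok in found.items()]


if __name__ == "__main__":
    # Stand-in for a real drgn program object; hypothetical contents.
    fake_prog = {"workqueues": object()}
    for line in report(probe_symbols(fake_prog,
                                     ["worker_pool_idr", "workqueues"])):
        print(line)
```

Inside a drgn session the same two functions could be called with the real `prog` instead of the stand-in dict. If every name comes back as missing, the debug info is probably not being loaded at all; if only some do, the script and the kernel may simply disagree about those symbols.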

Helge