Message-ID: <8ec1a40a-61a6-d8f3-074d-6cc8697f261d@huawei.com>
Date: Wed, 15 Jan 2025 20:32:37 +0800
From: Kunkun Jiang <jiangkunkun@...wei.com>
To: Catalin Marinas <catalin.marinas@....com>, Will Deacon <will@...nel.org>,
Mark Rutland <mark.rutland@....com>, Jonathan Cameron
<Jonathan.Cameron@...wei.com>, Gavin Shan <gshan@...hat.com>, James Morse
<james.morse@....com>, Jean-Philippe Brucker <jean-philippe@...aro.org>,
Jinjie Ruan <ruanjinjie@...wei.com>, Douglas Anderson
<dianders@...omium.org>, Puranjay Mohan <puranjay@...nel.org>, Luchunhua
<luchunhua@...wei.com>
CC: "moderated list:ARM SMMU DRIVERS" <linux-arm-kernel@...ts.infradead.org>,
open list <linux-kernel@...r.kernel.org>, "wanghaibin.wang@...wei.com"
<wanghaibin.wang@...wei.com>, Zenghui Yu <yuzenghui@...wei.com>,
<wangzhou1@...ilicon.com>
Subject: [Question] Call trace occurs occasionally when a rollback is
performed upon CPU online timeout
Hi all,

I have a question about CPU online/offline. In the following test
scenario, various workloads (iperf, fio, sve, ...) run in a VM with 6
vCPUs. At the same time, two of the vCPUs are repeatedly brought online
and offline through /sys/devices/system/cpu/cpuX/online. After running
for many hours, call traces occasionally appear in the guest.
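
For reference, the toggling part of the test can be sketched roughly as
below. This is only a minimal sketch, not the actual script used in the
test: the helper names set_online() and toggle_cpu() are made up for
illustration, and the path is whatever cpuX/online file is being
exercised.

```c
#include <stdio.h>

/* Hypothetical helper: write "0" or "1" to a cpuX/online style file.
 * Returns 0 on success, -1 on failure. */
static int set_online(const char *path, int online)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (!f)
        return -1;
    ok = fputs(online ? "1" : "0", f) >= 0;
    /* stdio buffers the write, so an error may only show up at fclose(). */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Hypothetical helper: offline then online the CPU `rounds` times.
 * Returns the number of writes that failed. */
static int toggle_cpu(const char *path, int rounds)
{
    int failures = 0;

    for (int i = 0; i < rounds; i++) {
        if (set_online(path, 0) != 0)
            failures++;
        if (set_online(path, 1) != 0)
            failures++;
    }
    return failures;
}
```

As root in the guest, this would be called as something like
toggle_cpu("/sys/devices/system/cpu/cpu4/online", 1000), where the CPU
number and round count are just examples.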

First, WARN_ON_ONCE(test_bit(KTHREAD_SHOULD_PARK, &kthread->flags))
is triggered:
> Call trace:
> kthread_park+0xd0/0xdc
> takedown_cpu+0x4c/0x140
> cpuhp_invoke_callback+0x160/0x6e0
> _cpu_up+0x1a4/0x200
> cpu_up+0xbc/0x100
> cpu_device_up+0x20/0x30
> cpu_subsys_online+0x4c/0xb0
> device_online+0x7c/0xa0
> online_store+0xd0/0xe0
> dev_attr_store+0x20/0x34
> sysfs_kf_write+0x4c/0x5c
> kernfs_fop_write_iter+0x130/0x1c0
> new_sync_write+0xec/0x18c
> vfs_write+0x214/0x2ac
> ksys_write+0x70/0xfc
> __arm64_sys_write+0x24/0x30
> invoke_syscall+0x50/0x11c
> el0_svc_common.constprop.0+0x68/0x164
> do_el0_svc+0x34/0xcc
> el0_svc+0x20/0x30
> el0_sync_handler+0xb8/0xc0
> el0_sync+0x160/0x180

Second, BUG_ON(!irqs_disabled() && !IS_ENABLED(CONFIG_PREEMPT_RT))
is triggered:
> Call trace:
> irq_work_run_list+0x64/0x70
> smpcfd_dying_cpu+0x24/0x34
> cpuhp_invoke_callback+0x160/0x6e0
> _cpu_up+0x1a4/0x200
> cpu_up+0xbc/0x100
> cpu_device_up+0x20/0x30
> cpu_subsys_online+0x4c/0xb0
> device_online+0x7c/0xa0
> online_store+0xd0/0xe0
> dev_attr_store+0x20/0x34
> sysfs_kf_write+0x4c/0x5c
> kernfs_fop_write_iter+0x130/0x1c0
> new_sync_write+0xec/0x18c
> vfs_write+0x214/0x2ac
> ksys_write+0x70/0xfc
> __arm64_sys_write+0x24/0x30
> invoke_syscall+0x50/0x11c
> el0_svc_common.constprop.0+0x68/0x164
> do_el0_svc+0x34/0xcc
> el0_svc+0x20/0x30
> el0_sync_handler+0xb8/0xc0
> el0_sync+0x160/0x180

According to my analysis, the root cause is that the vCPU online
operation times out even though the vCPU did in fact come online
successfully. Both traces show teardown callbacks (takedown_cpu,
smpcfd_dying_cpu) being invoked from _cpu_up, so a rollback is being
performed because of the timeout. During the rollback,
secondary_start_kernel is still executing on that vCPU, which results
in the call traces above. So is this a bug? If so, how should it be
fixed?

The reason for the timeout has not been found yet. I suspect it is
caused by the heavy load from the running tasks. If you have other
ideas, please point them out.

Thanks,
Kunkun Jiang