[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <YrwokiIVX9E9kQIh@MiWiFi-R3L-srv>
Date: Wed, 29 Jun 2022 18:25:22 +0800
From: Baoquan He <bhe@...hat.com>
To: jstultz@...gle.com, tglx@...utronix.de, sboyd@...nel.org
Cc: linux-kernel@...r.kernel.org, x86@...nel.org,
kexec@...ts.infradead.org
Subject: Unstable tsc caused soft lockup in kdump kernel
Hi,
On a HP machine, after crash triggered via sysrq-c, kdump kernel will
boot and get soft lockup as below. And this can be always reproduced.
>From log, it seems that unreliable tsc was marked as unstable in
clocksource_watchdog, then worker sched_clock_work was scheduled. And
this tsc unstable marking always happened after sysrq-c is triggered.
And the cpu where worker smp_call_function_many_cond is running won't
be stopped and hang there and keep locks, even though the cpu should be
stopped. While kdump kernel is running in a different cpu and boot, then
soft lockup happened because other workers or the relevant threads are
waiting for locks taken by the hang sched_clock_work worker.
Any idea or suggestion?
The normal kernel boot log and kdump kernel log, kernel config, are all
attached, please check.
Thanks
Baoquan
=====
[ 5173.871886] watchdog: BUG: soft lockup - CPU#43 stuck for 52s! [kworker/43:2:1375]
[ 5173.871886] Modules linked in: qrtr ......
[ 5173.871886] irq event stamp: 120854
[ 5173.871886] hardirqs last enabled at (120853): [<ffffffffa1a00d0b>] asm_sysvec_apic_timer_interrupt+0x1b/0x20
[ 5173.871886] hardirqs last disabled at (120854): [<ffffffffa19a975b>] sysvec_apic_timer_interrupt+0xb/0x90
[ 5173.871886] softirqs last enabled at (120850): [<ffffffffa1c00319>] __do_softirq+0x319/0x44a
[ 5173.871886] softirqs last disabled at (120845): [<ffffffffa0ef574b>] __irq_exit_rcu+0xbb/0xf0
[ 5173.871886] CPU: 43 PID: 1375 Comm: kworker/43:2 Kdump: loaded Tainted: G L 5.19.0-rc2+ #5
[ 5173.871886] Hardware name: HP ProLiant DL585 G7, BIOS A16 06/04/2013
[ 5173.871886] Workqueue: events __sched_clock_work
[ 5173.871886] RIP: 0010:smp_call_function_many_cond+0x133/0x380
[ 5173.871886] Code: 43 48 8b 75 08 e8 6d d1 43 00 3b 05 c7 a1 a1 01 89 c7 73 30 48 63 c7 48 8b 4d 00 48 03 0c c5 20 ab 2
3 a2 8b 41 08 a8 01 74 0a <f3> 90 8b 51 08 83 e2 01 75 f6 eb cb 49 0f a3 1a 0f 92 44 24 1b e9
[ 5173.871886] RSP: 0018:ffffb3d01c7b7d40 EFLAGS: 00000202
[ 5173.871886] RAX: 0000000000000011 RBX: 0000000000000001 RCX: ffff9a5a7fc3d700
[ 5173.871886] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000000
[ 5173.871886] RBP: ffff9b1a7fd33a40 R08: 0000000000000000 R09: 0000000000000000
[ 5173.890970] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 5173.890970] R13: 0000000000000000 R14: 0000000000000030 R15: ffff9b1a7fd[ 5173.890970] FS: 0000000000000000(0000) GS:
ffff9b1a7fd00000(0000) knlGS:0000000000000000
[ 5173.890970] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5173.890970] CR2: 000055fbcc0bc008 CR3: 000000526ee26000 CR4: 00000000000406e0
[ 5173.890970] Call Trace:
[ 5173.890970] <TASK>
[ 5173.890970] ? optimize_nops+0x210/0x210
[ 5173.890970] on_each_cpu_cond_mask+0x20/0x30
[ 5173.890970] text_poke_bp_batch+0xf4/0x370
[ 5173.890970] ? arch_jump_label_transform_apply+0x13/0x30
[ 5173.890970] text_poke_finish+0x1b/0x30
[ 5173.890970] arch_jump_label_transform_apply+0x18/0x30
[ 5173.890970] static_key_disable_cpuslocked+0x5b/0x90
[ 5173.890970] static_key_disable+0x16/0x20
[ 5173.890970] process_one_work+0x274/0x590
[ 5173.890970] ? process_one_work+0x590/0x590
[ 5173.890970] worker_thread+0x52/0x3a0
[ 5173.890970] ? process_one_work+0x590/0x590
[ 5173.890970] kthread+0xd8/0x100
[ 5173.890970] ? kthread_complete_and_exit+0x20/0x20
[ 5173.890970] ret_from_fork+0x22/0x30
[ 5173.890970] </TASK>
======
clocksource_watchdog()
-->__clocksource_unstable()
-->tsc_cs_mark_unstable()
-->tsc_cs_mark_unstable()
-->clear_sched_clock_stable()
-->__clear_sched_clock_stable()
static DECLARE_WORK(sched_clock_work, __sched_clock_work);
__sched_clock_work()
-->static_branch_disable(&__sched_clock_stable);
-->static_key_disable_cpuslocked()
-->arch_jump_label_transform_apply()
-->text_poke_finish()
-->text_poke_flush()
-->text_poke_bp_batch()
-->text_poke_sync()
-->on_each_cpu()
-->on_each_cpu_cond_mask()
-->smp_call_function_many_cond()
View attachment "dmesg.log" of type "text/plain" (115785 bytes)
View attachment "kdump_boot.log" of type "text/plain" (100506 bytes)
View attachment "config" of type "text/plain" (140131 bytes)
Powered by blists - more mailing lists