lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 29 Jun 2022 18:25:22 +0800
From:   Baoquan He <bhe@...hat.com>
To:     jstultz@...gle.com, tglx@...utronix.de, sboyd@...nel.org
Cc:     linux-kernel@...r.kernel.org, x86@...nel.org,
        kexec@...ts.infradead.org
Subject: Unstable tsc caused soft lockup in kdump kernel

Hi,

On a HP machine, after crash triggered via sysrq-c, kdump kernel will
boot and get soft lockup as below. And this can be always reproduced.

>From log, it seems that unreliable tsc was marked as unstable in
clocksource_watchdog, then worker sched_clock_work was scheduled. And
this tsc unstable marking always happened after sysrq-c is triggered.
And the cpu where worker smp_call_function_many_cond is running won't
be stopped and hang there and keep locks, even though the cpu should be
stopped. While kdump kernel is running in a different cpu and boot, then
soft lockup happened because other workers or the relevant threads are
waiting for locks taken by the hang sched_clock_work worker.

Any idea or suggestion?

The normal kernel boot log and kdump kernel log, kernel config, are all
attached, please check.

Thanks
Baoquan

=====
[ 5173.871886] watchdog: BUG: soft lockup - CPU#43 stuck for 52s! [kworker/43:2:1375]
[ 5173.871886] Modules linked in: qrtr ......
[ 5173.871886] irq event stamp: 120854
[ 5173.871886] hardirqs last  enabled at (120853): [<ffffffffa1a00d0b>] asm_sysvec_apic_timer_interrupt+0x1b/0x20
[ 5173.871886] hardirqs last disabled at (120854): [<ffffffffa19a975b>] sysvec_apic_timer_interrupt+0xb/0x90
[ 5173.871886] softirqs last  enabled at (120850): [<ffffffffa1c00319>] __do_softirq+0x319/0x44a
[ 5173.871886] softirqs last disabled at (120845): [<ffffffffa0ef574b>] __irq_exit_rcu+0xbb/0xf0
[ 5173.871886] CPU: 43 PID: 1375 Comm: kworker/43:2 Kdump: loaded Tainted: G             L    5.19.0-rc2+ #5
[ 5173.871886] Hardware name: HP ProLiant DL585 G7, BIOS A16 06/04/2013
[ 5173.871886] Workqueue: events __sched_clock_work
[ 5173.871886] RIP: 0010:smp_call_function_many_cond+0x133/0x380
[ 5173.871886] Code: 43 48 8b 75 08 e8 6d d1 43 00 3b 05 c7 a1 a1 01 89 c7 73 30 48 63 c7 48 8b 4d 00 48 03 0c c5 20 ab 2
3 a2 8b 41 08 a8 01 74 0a <f3> 90 8b 51 08 83 e2 01 75 f6 eb cb 49 0f a3 1a 0f 92 44 24 1b e9
[ 5173.871886] RSP: 0018:ffffb3d01c7b7d40 EFLAGS: 00000202
[ 5173.871886] RAX: 0000000000000011 RBX: 0000000000000001 RCX: ffff9a5a7fc3d700
[ 5173.871886] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000000
[ 5173.871886] RBP: ffff9b1a7fd33a40 R08: 0000000000000000 R09: 0000000000000000
[ 5173.890970] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 5173.890970] R13: 0000000000000000 R14: 0000000000000030 R15: ffff9b1a7fd[ 5173.890970] FS:  0000000000000000(0000) GS:
ffff9b1a7fd00000(0000) knlGS:0000000000000000
[ 5173.890970] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5173.890970] CR2: 000055fbcc0bc008 CR3: 000000526ee26000 CR4: 00000000000406e0
[ 5173.890970] Call Trace:
[ 5173.890970]  <TASK>
[ 5173.890970]  ? optimize_nops+0x210/0x210
[ 5173.890970]  on_each_cpu_cond_mask+0x20/0x30
[ 5173.890970]  text_poke_bp_batch+0xf4/0x370
[ 5173.890970]  ? arch_jump_label_transform_apply+0x13/0x30
[ 5173.890970]  text_poke_finish+0x1b/0x30
[ 5173.890970]  arch_jump_label_transform_apply+0x18/0x30
[ 5173.890970]  static_key_disable_cpuslocked+0x5b/0x90
[ 5173.890970]  static_key_disable+0x16/0x20
[ 5173.890970]  process_one_work+0x274/0x590
[ 5173.890970]  ? process_one_work+0x590/0x590
[ 5173.890970]  worker_thread+0x52/0x3a0
[ 5173.890970]  ? process_one_work+0x590/0x590
[ 5173.890970]  kthread+0xd8/0x100
[ 5173.890970]  ? kthread_complete_and_exit+0x20/0x20
[ 5173.890970]  ret_from_fork+0x22/0x30
[ 5173.890970]  </TASK>

======
clocksource_watchdog()
  -->__clocksource_unstable()
     -->tsc_cs_mark_unstable()
        -->tsc_cs_mark_unstable()
          -->clear_sched_clock_stable()
             -->__clear_sched_clock_stable()
             static DECLARE_WORK(sched_clock_work, __sched_clock_work);
             __sched_clock_work()
             -->static_branch_disable(&__sched_clock_stable);
               -->static_key_disable_cpuslocked()
                 -->arch_jump_label_transform_apply()
                   -->text_poke_finish()
                     -->text_poke_flush()
                       -->text_poke_bp_batch()
                         -->text_poke_sync()
                           -->on_each_cpu()
                             -->on_each_cpu_cond_mask()
                               -->smp_call_function_many_cond()


View attachment "dmesg.log" of type "text/plain" (115785 bytes)

View attachment "kdump_boot.log" of type "text/plain" (100506 bytes)

View attachment "config" of type "text/plain" (140131 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ