Message-ID: <CAAM7YAkHjVg5qj+6b9HAqbE_d6fugAhdCxqsOHgN06VMjbmQvA@mail.gmail.com>
Date: Fri, 20 Sep 2024 15:25:51 +0800
From: "Yan, Zheng" <ukernel@...il.com>
To: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Cc: vschneid@...hat.com, Peter Zijlstra <peterz@...radead.org>
Subject: [Bug] premature worker thread wakeup
Hello,
We recently encountered a kernel oops on a 4.18.0-477 el kernel.
crash> bt
PID: 1501282 TASK: ff232219e1528000 CPU: 17 COMMAND: "kworker/70:0"
#0 [ff61ef13b2f67c00] machine_kexec at ffffffffb346c033
#1 [ff61ef13b2f67c58] __crash_kexec at ffffffffb35b5b8a
#2 [ff61ef13b2f67d18] crash_kexec at ffffffffb35b6ac1
#3 [ff61ef13b2f67d30] oops_end at ffffffffb342a9c1
#4 [ff61ef13b2f67d50] no_context at ffffffffb347e913
#5 [ff61ef13b2f67da8] __bad_area_nosemaphore at ffffffffb347ec8c
#6 [ff61ef13b2f67df0] do_page_fault at ffffffffb347f8a7
#7 [ff61ef13b2f67e20] page_fault at ffffffffb400116e
[exception RIP: _raw_spin_lock_irq+19]
RIP: ffffffffb3e02113 RSP: ff61ef13b2f67ed8 RFLAGS: 00010046
RAX: 0000000000000000 RBX: ff23222dcf1d8740 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000246 RDI: 0000000000000000
RBP: 0000000000000000 R8: ff232206b00739f8 R9: 0000000000000001
R10: 0000000000000000 R11: ff232206b0071c04 R12: ff232219e1528000
R13: ff61ef13e16efdc8 R14: ffffffffb35164d0 R15: ff23223f8e6072c0
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#8 [ff61ef13b2f67ed8] worker_thread at ffffffffb351658f
#9 [ff61ef13b2f67f10] kthread at ffffffffb351d6a4
#10 [ff61ef13b2f67f50] ret_from_fork at ffffffffb400024f
The cause of the oops is that the worker_thread function was executed
while worker->pool was still NULL.
The oops happened immediately after a kernel warning from the
worker thread's creator:
crash> bt 1168590
PID: 1168590 TASK: ff23225d4f078000 CPU: 70 COMMAND: "kworker/70:11"
#0 [fffffe0000fa1e48] crash_nmi_callback at ffffffffb345e713
#1 [fffffe0000fa1e50] nmi_handle at ffffffffb342b393
#2 [fffffe0000fa1ea8] default_do_nmi at ffffffffb3dee089
#3 [fffffe0000fa1ec8] do_nmi at ffffffffb342b8ef
#4 [fffffe0000fa1ef0] end_repeat_nmi at ffffffffb40015e8
[exception RIP: fbcon_redraw_blit+164]
RIP: ffffffffb3965f44 RSP: ff61ef13e16ef8f0 RFLAGS: 00000006
RAX: 0000000000000e61 RBX: ff2321a8d2a7c740 RCX: 0000000000000000
RDX: ff2321a8d2a7c740 RSI: 0000000000000001 RDI: 0000000000000002
RBP: ff2321a8d2a7c640 R8: 0000000000000010 R9: ff61ef139c4703e0
R10: 000000000000001e R11: 0000000000000000 R12: ff2321a8d2a7c742
R13: ff2321a8d2a7c800 R14: 0000000000000020 R15: 0000000000000006
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#5 [ff61ef13e16ef8f0] fbcon_redraw_blit at ffffffffb3965f44
#6 [ff61ef13e16ef950] fbcon_scroll at ffffffffb3966d58
#7 [ff61ef13e16ef9a0] con_scroll at ffffffffb3a05adf
#8 [ff61ef13e16efa10] lf at ffffffffb3a05ba4
#9 [ff61ef13e16efa38] vt_console_print at ffffffffb3a07cc4
#10 [ff61ef13e16efaa0] console_unlock at ffffffffb3567396
#11 [ff61ef13e16efb60] vprintk_emit at ffffffffb3569521
#12 [ff61ef13e16efbb0] printk at ffffffffb3569b0c
#13 [ff61ef13e16efc10] show_trace_log_lvl at ffffffffb342afe0
#14 [ff61ef13e16efd00] __warn at ffffffffb34f6e74
#15 [ff61ef13e16efd38] report_bug at ffffffffb3dd62a1
#16 [ff61ef13e16efd60] do_error_trap at ffffffffb342733e
#17 [ff61ef13e16efda0] do_invalid_op at ffffffffb3427916
#18 [ff61ef13e16efdc0] invalid_op at ffffffffb4000d64
[exception RIP: __kthread_bind_mask+29]
RIP: ffffffffb351c99d RSP: ff61ef13e16efe70 RFLAGS: 00010246
RAX: 0000000000000000 RBX: ff232219e1528000 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000000246 RDI: 0000000000000246
RBP: ff2321a8c0024a20 R8: 0000000080000000 R9: 00000
It looks like the newly created kthread was woken up prematurely.
Checking the source code, the patch "workqueue: Unbind kworkers before
sending them to exit()" looks suspicious. The patch wakes up the dying
worker's task without holding pool->lock. Is it possible that another
worker thread had recently died, the newly created worker thread reused
the dead thread's task_struct, and it was then wrongly woken up?
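To make the suspected sequence concrete, here is a minimal userspace
sketch of it. It is only a toy model, not kernel code: struct model_task,
struct model_pool, the one-slot "slab" and every model_* function are
hypothetical names invented for illustration. It also assumes that a freed
task slot is reused by the next allocation, and that the creator assigns
the pool only after the thread exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins only; these are NOT the kernel's real structures. */
struct model_pool { int id; };

struct model_task {
    bool in_use;             /* slot currently backs a live task */
    bool runnable;           /* set by a wakeup */
    struct model_pool *pool; /* stands in for worker->pool */
};

/* One-slot "slab": the next allocation always reuses the freed slot. */
static struct model_task slot;

static struct model_task *model_alloc_task(void)
{
    assert(!slot.in_use);    /* model invariant: at most one live task */
    slot.in_use = true;
    slot.runnable = false;
    slot.pool = NULL;        /* pool is assigned later by the creator */
    return &slot;
}

static void model_free_task(struct model_task *t) { t->in_use = false; }

/* A wakeup acts on whatever task currently lives at the address. */
static void model_wake_up(struct model_task *t) { t->runnable = true; }

/* 1: still asleep, 0: ran with a valid pool, -1: ran with pool == NULL. */
static int model_worker_thread(const struct model_task *t)
{
    if (!t->runnable)
        return 1;
    return t->pool ? 0 : -1;
}

/* Replays the suspected ordering; stale_wakeup toggles the late wakeup. */
int model_race(bool stale_wakeup)
{
    static struct model_pool pool = { .id = 70 };

    struct model_task *dying = model_alloc_task();
    struct model_task *stale = dying;  /* pointer kept for a later wakeup */
    model_free_task(dying);            /* the old worker has exited */

    struct model_task *fresh = model_alloc_task(); /* same slot is reused */
    if (stale_wakeup)
        model_wake_up(stale);          /* meant for the dead worker */

    if (model_worker_thread(fresh) == -1) {
        model_free_task(fresh);
        return -1;                     /* thread ran before pool was set */
    }

    fresh->pool = &pool;               /* creator finishes its setup */
    model_wake_up(fresh);              /* the intended first wakeup */
    int ret = model_worker_thread(fresh);
    model_free_task(fresh);
    return ret;
}
```

In this model, model_race(false) returns 0 and model_race(true) returns
-1, which corresponds to the thread function running while the pool
pointer is still NULL. Whether the real kernel can reach this ordering is
exactly the open question above.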
Regards
Yan, Zheng