[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <1cf7b0dbfa4190eeaf0b3401bf7a991b8db59a59.camel@linux.intel.com>
Date: Tue, 29 Oct 2024 15:03:33 -0700
From: Tim Chen <tim.c.chen@...ux.intel.com>
To: Tejun Heo <tj@...nel.org>, Lai Jiangshan <jiangshanlai@...il.com>
Cc: linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>, Doug
Nelson <doug.nelson@...el.com>, bp@...en8.de, dave.hansen@...ux.intel.com,
hpa@...or.com, mingo@...hat.com, syzkaller-bugs@...glegroups.com,
x86@...nel.org
Subject: BUG: Stall on adding/removing wokers into workqueue pool
Hi Tejun,
Forwarding this task hung seen by my colleague Doug Nelson. He tested
the 6.12-rc4 kernel with an OLTP workload running on a 2 socket with
Granite Rapids CPU that has 86 cores per socket. The traces
seem to indicate that the acquisition
of wq_pool_attach_mutex stalled in idle_cull_fn() when removing worker from
the pool. Doug hit this problem occasionally in his tests.
Searching through the bug reports, there's a similar report by szybot on the
6.12-rc2 kernel. Szybot reported similar task hung when attaching workers to
the pool: https://lore.kernel.org/all/6706c4ba.050a0220.1139e6.0008.GAE@google.com/T/
So we suspect that the problem is not GNR CPU specific.
Wonder if this problem is a known one?
Thanks.
Tim
[Fri Oct 25 18:24:12 2024] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[Fri Oct 25 18:26:31 2024] INFO: task kworker/46:0H:300 blocked for more than 122 seconds.
[Fri Oct 25 18:26:31 2024] Tainted: G S OE 6.12.0-rc4-dis_fgs #1
[Fri Oct 25 18:26:31 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Fri Oct 25 18:26:31 2024] task:kworker/46:0H state:D stack:0 pid:300 tgid:300 ppid:2 flags:0x00004000
[Fri Oct 25 18:26:31 2024] Workqueue: 0x0 (events_highpri)
[Fri Oct 25 18:26:31 2024] Call Trace:
[Fri Oct 25 18:26:31 2024] <TASK>
[Fri Oct 25 18:26:31 2024] __schedule+0x347/0xd70
[Fri Oct 25 18:26:31 2024] schedule+0x36/0xc0
[Fri Oct 25 18:26:31 2024] schedule_preempt_disabled+0x15/0x30
[Fri Oct 25 18:26:31 2024] __mutex_lock.isra.14+0x431/0x690
[Fri Oct 25 18:26:31 2024] worker_attach_to_pool+0x1f/0xd0
[Fri Oct 25 18:26:31 2024] create_worker+0xfa/0x1f0
[Fri Oct 25 18:26:31 2024] worker_thread+0x19c/0x240
[Fri Oct 25 18:26:31 2024] ? __pfx_worker_thread+0x10/0x10
[Fri Oct 25 18:26:31 2024] kthread+0xcf/0x100
[Fri Oct 25 18:26:31 2024] ? __pfx_kthread+0x10/0x10
[Fri Oct 25 18:26:31 2024] ret_from_fork+0x31/0x40
[Fri Oct 25 18:26:31 2024] ? __pfx_kthread+0x10/0x10
[Fri Oct 25 18:26:31 2024] ret_from_fork_asm+0x1a/0x30
[Fri Oct 25 18:26:31 2024] </TASK>
[Fri Oct 25 18:26:31 2024] INFO: task kworker/R-kbloc:2466 blocked for more than 122 seconds.
[Fri Oct 25 18:26:31 2024] Tainted: G S OE 6.12.0-rc4-dis_fgs #1
[Fri Oct 25 18:26:31 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Fri Oct 25 18:26:31 2024] task:kworker/R-kbloc state:D stack:0 pid:2466 tgid:2466 ppid:2 flags:0x00004000
[Fri Oct 25 18:26:31 2024] Call Trace:
[Fri Oct 25 18:26:31 2024] <TASK>
[Fri Oct 25 18:26:31 2024] __schedule+0x347/0xd70
[Fri Oct 25 18:26:31 2024] schedule+0x36/0xc0
[Fri Oct 25 18:26:31 2024] schedule_preempt_disabled+0x15/0x30
[Fri Oct 25 18:26:31 2024] __mutex_lock.isra.14+0x431/0x690
[Fri Oct 25 18:26:31 2024] worker_attach_to_pool+0x1f/0xd0
[Fri Oct 25 18:26:31 2024] rescuer_thread+0x111/0x3b0
[Fri Oct 25 18:26:31 2024] ? __pfx_rescuer_thread+0x10/0x10
[Fri Oct 25 18:26:31 2024] ? __pfx_rescuer_thread+0x10/0x10
[Fri Oct 25 18:26:31 2024] kthread+0xcf/0x100
[Fri Oct 25 18:26:31 2024] ? __pfx_kthread+0x10/0x10
[Fri Oct 25 18:26:31 2024] ret_from_fork+0x31/0x40
[Fri Oct 25 18:26:31 2024] ? __pfx_kthread+0x10/0x10
[Fri Oct 25 18:26:31 2024] ret_from_fork_asm+0x1a/0x30
[Fri Oct 25 18:26:31 2024] </TASK>
[Fri Oct 25 18:26:31 2024] INFO: task kworker/46:1H:3592 blocked for more than 122 seconds.
[Fri Oct 25 18:26:31 2024] Tainted: G S OE 6.12.0-rc4-dis_fgs #1
[Fri Oct 25 18:26:31 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Fri Oct 25 18:26:31 2024] task:kworker/46:1H state:D stack:0 pid:3592 tgid:3592 ppid:2 flags:0x00004000
[Fri Oct 25 18:26:31 2024] Workqueue: kblockd blk_mq_timeout_work
[Fri Oct 25 18:26:31 2024] Call Trace:
[Fri Oct 25 18:26:31 2024] <TASK>
[Fri Oct 25 18:26:31 2024] __schedule+0x347/0xd70
[Fri Oct 25 18:26:31 2024] schedule+0x36/0xc0
[Fri Oct 25 18:26:31 2024] schedule_timeout+0x283/0x2c0
[Fri Oct 25 18:26:31 2024] ? sched_balance_rq+0xe5/0xd90
[Fri Oct 25 18:26:31 2024] ? __prepare_to_swait+0x52/0x80
[Fri Oct 25 18:26:31 2024] wait_for_completion_state+0x173/0x1d0
[Fri Oct 25 18:26:31 2024] __wait_rcu_gp+0x121/0x150
[Fri Oct 25 18:26:31 2024] synchronize_rcu_normal.part.63+0x3a/0x60
[Fri Oct 25 18:26:31 2024] ? __pfx_call_rcu_hurry+0x10/0x10
[Fri Oct 25 18:26:31 2024] ? __pfx_wakeme_after_rcu+0x10/0x10
[Fri Oct 25 18:26:31 2024] synchronize_rcu_normal+0x9a/0xb0
[Fri Oct 25 18:26:31 2024] blk_mq_timeout_work+0x142/0x1a0
[Fri Oct 25 18:26:31 2024] process_scheduled_works+0xa3/0x3e0
[Fri Oct 25 18:26:31 2024] worker_thread+0x117/0x240
[Fri Oct 25 18:26:31 2024] ? __pfx_worker_thread+0x10/0x10
[Fri Oct 25 18:26:31 2024] kthread+0xcf/0x100
[Fri Oct 25 18:26:31 2024] ? __pfx_kthread+0x10/0x10
[Fri Oct 25 18:26:31 2024] ret_from_fork+0x31/0x40
[Fri Oct 25 18:26:31 2024] ? __pfx_kthread+0x10/0x10
[Fri Oct 25 18:26:31 2024] ret_from_fork_asm+0x1a/0x30
[Fri Oct 25 18:26:31 2024] </TASK>
[Fri Oct 25 18:26:31 2024] INFO: task kworker/u1377:1:38000 blocked for more than 122 seconds.
[Fri Oct 25 18:26:31 2024] Tainted: G S OE 6.12.0-rc4-dis_fgs #1
[Fri Oct 25 18:26:31 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Fri Oct 25 18:26:31 2024] task:kworker/u1377:1 state:D stack:0 pid:38000 tgid:38000 ppid:2 flags:0x00004000
[Fri Oct 25 18:26:31 2024] Workqueue: events_unbound idle_cull_fn
[Fri Oct 25 18:26:31 2024] Call Trace:
[Fri Oct 25 18:26:31 2024] <TASK>
[Fri Oct 25 18:26:31 2024] __schedule+0x347/0xd70
[Fri Oct 25 18:26:31 2024] schedule+0x36/0xc0
[Fri Oct 25 18:26:31 2024] schedule_preempt_disabled+0x15/0x30
[Fri Oct 25 18:26:31 2024] __mutex_lock.isra.14+0x431/0x690
[Fri Oct 25 18:26:31 2024] idle_cull_fn+0x3b/0xe0
[Fri Oct 25 18:26:31 2024] process_scheduled_works+0xa3/0x3e0
[Fri Oct 25 18:26:31 2024] worker_thread+0x117/0x240
[Fri Oct 25 18:26:31 2024] ? __pfx_worker_thread+0x10/0x10
[Fri Oct 25 18:26:31 2024] kthread+0xcf/0x100
[Fri Oct 25 18:26:31 2024] ? __pfx_kthread+0x10/0x10
[Fri Oct 25 18:26:31 2024] ret_from_fork+0x31/0x40
[Fri Oct 25 18:26:31 2024] ? __pfx_kthread+0x10/0x10
[Fri Oct 25 18:26:31 2024] ret_from_fork_asm+0x1a/0x30
[Fri Oct 25 18:26:31 2024] </TASK>
[Fri Oct 25 18:26:31 2024] INFO: task kworker/u1377:3:46111 blocked for more than 122 seconds.
[Fri Oct 25 18:26:31 2024] Tainted: G S OE 6.12.0-rc4-dis_fgs #1
[Fri Oct 25 18:26:31 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Fri Oct 25 18:26:31 2024] task:kworker/u1377:3 state:D stack:0 pid:46111 tgid:46111 ppid:2 flags:0x00004000
[Fri Oct 25 18:26:31 2024] Workqueue: events_unbound idle_cull_fn
[Fri Oct 25 18:26:31 2024] Call Trace:
[Fri Oct 25 18:26:31 2024] <TASK>
[Fri Oct 25 18:26:31 2024] __schedule+0x347/0xd70
[Fri Oct 25 18:26:31 2024] schedule+0x36/0xc0
[Fri Oct 25 18:26:31 2024] schedule_preempt_disabled+0x15/0x30
[Fri Oct 25 18:26:31 2024] __mutex_lock.isra.14+0x431/0x690
[Fri Oct 25 18:26:31 2024] ? ttwu_do_activate+0x6a/0x210
[Fri Oct 25 18:26:31 2024] idle_cull_fn+0x3b/0xe0
[Fri Oct 25 18:26:31 2024] process_scheduled_works+0xa3/0x3e0
[Fri Oct 25 18:26:31 2024] worker_thread+0x117/0x240
[Fri Oct 25 18:26:31 2024] ? __pfx_worker_thread+0x10/0x10
[Fri Oct 25 18:26:31 2024] kthread+0xcf/0x100
[Fri Oct 25 18:26:31 2024] ? __pfx_kthread+0x10/0x10
[Fri Oct 25 18:26:31 2024] ret_from_fork+0x31/0x40
[Fri Oct 25 18:26:31 2024] ? __pfx_kthread+0x10/0x10
[Fri Oct 25 18:26:31 2024] ret_from_fork_asm+0x1a/0x30
[Fri Oct 25 18:26:31 2024] </TASK>
[Fri Oct 25 18:26:31 2024] INFO: task kworker/u1377:5:46411 blocked for more than 122 seconds.
[Fri Oct 25 18:26:31 2024] Tainted: G S OE 6.12.0-rc4-dis_fgs #1
[Fri Oct 25 18:26:31 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Fri Oct 25 18:26:31 2024] task:kworker/u1377:5 state:D stack:0 pid:46411 tgid:46411 ppid:2 flags:0x00004000
[Fri Oct 25 18:26:31 2024] Workqueue: events_unbound idle_cull_fn
[Fri Oct 25 18:26:31 2024] Call Trace:
[Fri Oct 25 18:26:31 2024] <TASK>
[Fri Oct 25 18:26:31 2024] __schedule+0x347/0xd70
[Fri Oct 25 18:26:31 2024] schedule+0x36/0xc0
[Fri Oct 25 18:26:31 2024] schedule_preempt_disabled+0x15/0x30
[Fri Oct 25 18:26:31 2024] __mutex_lock.isra.14+0x431/0x690
[Fri Oct 25 18:26:31 2024] ? try_to_wake_up+0x22e/0x690
[Fri Oct 25 18:26:31 2024] idle_cull_fn+0x3b/0xe0
[Fri Oct 25 18:26:31 2024] process_scheduled_works+0xa3/0x3e0
[Fri Oct 25 18:26:31 2024] worker_thread+0x117/0x240
[Fri Oct 25 18:26:31 2024] ? __pfx_worker_thread+0x10/0x10
[Fri Oct 25 18:26:31 2024] kthread+0xcf/0x100
[Fri Oct 25 18:26:31 2024] ? __pfx_kthread+0x10/0x10
[Fri Oct 25 18:26:31 2024] ret_from_fork+0x31/0x40
[Fri Oct 25 18:26:31 2024] ? __pfx_kthread+0x10/0x10
[Fri Oct 25 18:26:31 2024] ret_from_fork_asm+0x1a/0x30
[Fri Oct 25 18:26:31 2024] </TASK>
[Fri Oct 25 18:26:31 2024] INFO: task kworker/u1379:6:53043 blocked for more than 122 seconds.
[Fri Oct 25 18:26:31 2024] Tainted: G S OE 6.12.0-rc4-dis_fgs #1
[Fri Oct 25 18:26:31 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Fri Oct 25 18:26:31 2024] task:kworker/u1379:6 state:D stack:0 pid:53043 tgid:53043 ppid:2 flags:0x00004000
[Fri Oct 25 18:26:31 2024] Workqueue: events_unbound idle_cull_fn
[Fri Oct 25 18:26:31 2024] Call Trace:
[Fri Oct 25 18:26:31 2024] <TASK>
[Fri Oct 25 18:26:31 2024] __schedule+0x347/0xd70
[Fri Oct 25 18:26:31 2024] schedule+0x36/0xc0
[Fri Oct 25 18:26:31 2024] schedule_preempt_disabled+0x15/0x30
[Fri Oct 25 18:26:31 2024] __mutex_lock.isra.14+0x431/0x690
[Fri Oct 25 18:26:31 2024] idle_cull_fn+0x3b/0xe0
[Fri Oct 25 18:26:31 2024] process_scheduled_works+0xa3/0x3e0
[Fri Oct 25 18:26:31 2024] worker_thread+0x117/0x240
[Fri Oct 25 18:26:31 2024] ? __pfx_worker_thread+0x10/0x10
[Fri Oct 25 18:26:31 2024] kthread+0xcf/0x100
[Fri Oct 25 18:26:31 2024] ? __pfx_kthread+0x10/0x10
[Fri Oct 25 18:26:31 2024] ret_from_fork+0x31/0x40
[Fri Oct 25 18:26:31 2024] ? __pfx_kthread+0x10/0x10
[Fri Oct 25 18:26:31 2024] ret_from_fork_asm+0x1a/0x30
[Fri Oct 25 18:26:31 2024] </TASK>
[Fri Oct 25 18:26:31 2024] INFO: task kworker/u1377:7:53460 blocked for more than 122 seconds.
[Fri Oct 25 18:26:31 2024] Tainted: G S OE 6.12.0-rc4-dis_fgs #1
[Fri Oct 25 18:26:31 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Fri Oct 25 18:26:31 2024] task:kworker/u1377:7 state:D stack:0 pid:53460 tgid:53460 ppid:2 flags:0x00004000
[Fri Oct 25 18:26:31 2024] Workqueue: events_unbound idle_cull_fn
[Fri Oct 25 18:26:31 2024] Call Trace:
[Fri Oct 25 18:26:31 2024] <TASK>
[Fri Oct 25 18:26:31 2024] __schedule+0x347/0xd70
[Fri Oct 25 18:26:31 2024] schedule+0x36/0xc0
[Fri Oct 25 18:26:31 2024] schedule_preempt_disabled+0x15/0x30
[Fri Oct 25 18:26:31 2024] __mutex_lock.isra.14+0x431/0x690
[Fri Oct 25 18:26:31 2024] idle_cull_fn+0x3b/0xe0
[Fri Oct 25 18:26:31 2024] process_scheduled_works+0xa3/0x3e0
[Fri Oct 25 18:26:31 2024] worker_thread+0x117/0x240
[Fri Oct 25 18:26:31 2024] ? __pfx_worker_thread+0x10/0x10
[Fri Oct 25 18:26:31 2024] kthread+0xcf/0x100
[Fri Oct 25 18:26:31 2024] ? __pfx_kthread+0x10/0x10
[Fri Oct 25 18:26:31 2024] ret_from_fork+0x31/0x40
[Fri Oct 25 18:26:31 2024] ? __pfx_kthread+0x10/0x10
[Fri Oct 25 18:26:31 2024] ret_from_fork_asm+0x1a/0x30
[Fri Oct 25 18:26:31 2024] </TASK>
[Fri Oct 25 18:26:31 2024] INFO: task kworker/u1377:6:67016 blocked for more than 122 seconds.
[Fri Oct 25 18:26:31 2024] Tainted: G S OE 6.12.0-rc4-dis_fgs #1
[Fri Oct 25 18:26:31 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Fri Oct 25 18:26:31 2024] task:kworker/u1377:6 state:D stack:0 pid:67016 tgid:67016 ppid:2 flags:0x00004000
[Fri Oct 25 18:26:31 2024] Workqueue: 0x0 (events_unbound)
[Fri Oct 25 18:26:31 2024] Call Trace:
[Fri Oct 25 18:26:31 2024] <TASK>
[Fri Oct 25 18:26:31 2024] __schedule+0x347/0xd70
[Fri Oct 25 18:26:31 2024] schedule+0x36/0xc0
[Fri Oct 25 18:26:31 2024] schedule_preempt_disabled+0x15/0x30
[Fri Oct 25 18:26:31 2024] __mutex_lock.isra.14+0x431/0x690
[Fri Oct 25 18:26:31 2024] worker_attach_to_pool+0x1f/0xd0
[Fri Oct 25 18:26:31 2024] create_worker+0xfa/0x1f0
[Fri Oct 25 18:26:31 2024] worker_thread+0x19c/0x240
[Fri Oct 25 18:26:31 2024] ? __pfx_worker_thread+0x10/0x10
[Fri Oct 25 18:26:31 2024] kthread+0xcf/0x100
[Fri Oct 25 18:26:31 2024] ? __pfx_kthread+0x10/0x10
[Fri Oct 25 18:26:31 2024] ret_from_fork+0x31/0x40
[Fri Oct 25 18:26:31 2024] ? __pfx_kthread+0x10/0x10
[Fri Oct 25 18:26:31 2024] ret_from_fork_asm+0x1a/0x30
[Fri Oct 25 18:26:31 2024] </TASK>
[Fri Oct 25 18:26:31 2024] INFO: task kworker/u1379:4:67681 blocked for more than 122 seconds.
[Fri Oct 25 18:26:31 2024] Tainted: G S OE 6.12.0-rc4-dis_fgs #1
[Fri Oct 25 18:26:31 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Fri Oct 25 18:26:31 2024] task:kworker/u1379:4 state:D stack:0 pid:67681 tgid:67681 ppid:2 flags:0x00004000
[Fri Oct 25 18:26:31 2024] Workqueue: events_unbound idle_cull_fn
[Fri Oct 25 18:26:31 2024] Call Trace:
[Fri Oct 25 18:26:31 2024] <TASK>
[Fri Oct 25 18:26:31 2024] __schedule+0x347/0xd70
[Fri Oct 25 18:26:31 2024] schedule+0x36/0xc0
[Fri Oct 25 18:26:31 2024] schedule_preempt_disabled+0x15/0x30
[Fri Oct 25 18:26:31 2024] __mutex_lock.isra.14+0x431/0x690
[Fri Oct 25 18:26:31 2024] ? try_to_wake_up+0x22e/0x690
[Fri Oct 25 18:26:31 2024] idle_cull_fn+0x3b/0xe0
[Fri Oct 25 18:26:31 2024] process_scheduled_works+0xa3/0x3e0
[Fri Oct 25 18:26:31 2024] worker_thread+0x117/0x240
[Fri Oct 25 18:26:31 2024] ? __pfx_worker_thread+0x10/0x10
[Fri Oct 25 18:26:31 2024] kthread+0xcf/0x100
[Fri Oct 25 18:26:31 2024] ? __pfx_kthread+0x10/0x10
[Fri Oct 25 18:26:31 2024] ret_from_fork+0x31/0x40
[Fri Oct 25 18:26:31 2024] ? __pfx_kthread+0x10/0x10
[Fri Oct 25 18:26:31 2024] ret_from_fork_asm+0x1a/0x30
[Fri Oct 25 18:26:31 2024] </TASK>
[Fri Oct 25 18:26:31 2024] Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings
[oracle@...-1 GNR startup_scripts]$ uname -a
Linux bhs-1 6.12.0-rc4-dis_fgs #1 SMP PREEMPT_DYNAMIC Fri Oct 25 07:26:06 PDT 2024 x86_64 x86_64 x86_64 GNU/Linux
Powered by blists - more mailing lists