lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <1cf7b0dbfa4190eeaf0b3401bf7a991b8db59a59.camel@linux.intel.com>
Date: Tue, 29 Oct 2024 15:03:33 -0700
From: Tim Chen <tim.c.chen@...ux.intel.com>
To: Tejun Heo <tj@...nel.org>, Lai Jiangshan <jiangshanlai@...il.com>
Cc: linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>, Doug
 Nelson <doug.nelson@...el.com>, bp@...en8.de, dave.hansen@...ux.intel.com,
 hpa@...or.com,  mingo@...hat.com, syzkaller-bugs@...glegroups.com,
 x86@...nel.org
Subject: BUG: Stall on adding/removing wokers into workqueue pool

Hi Tejun,

Forwarding this task hung seen by my colleague Doug Nelson. He tested
the 6.12-rc4 kernel with an OLTP workload running on a 2 socket with
Granite Rapids CPU that has 86 cores per socket. The traces 
seem to indicate that the acquisition 
of wq_pool_attach_mutex stalled in idle_cull_fn() when removing worker from
the pool. Doug hit this problem occasionally in his tests.

Searching through the bug reports, there's a similar report by szybot on the
6.12-rc2 kernel. Szybot reported similar task hung when attaching workers to
the pool: https://lore.kernel.org/all/6706c4ba.050a0220.1139e6.0008.GAE@google.com/T/
So we suspect that the problem is not GNR CPU specific.

Wonder if this problem is a known one?

Thanks.

Tim


[Fri Oct 25 18:24:12 2024] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[Fri Oct 25 18:26:31 2024] INFO: task kworker/46:0H:300 blocked for more than 122 seconds.
[Fri Oct 25 18:26:31 2024]       Tainted: G S         OE      6.12.0-rc4-dis_fgs #1
[Fri Oct 25 18:26:31 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Fri Oct 25 18:26:31 2024] task:kworker/46:0H   state:D stack:0     pid:300   tgid:300   ppid:2      flags:0x00004000
[Fri Oct 25 18:26:31 2024] Workqueue:  0x0 (events_highpri)
[Fri Oct 25 18:26:31 2024] Call Trace:
[Fri Oct 25 18:26:31 2024]  <TASK>
[Fri Oct 25 18:26:31 2024]  __schedule+0x347/0xd70
[Fri Oct 25 18:26:31 2024]  schedule+0x36/0xc0
[Fri Oct 25 18:26:31 2024]  schedule_preempt_disabled+0x15/0x30
[Fri Oct 25 18:26:31 2024]  __mutex_lock.isra.14+0x431/0x690
[Fri Oct 25 18:26:31 2024]  worker_attach_to_pool+0x1f/0xd0
[Fri Oct 25 18:26:31 2024]  create_worker+0xfa/0x1f0
[Fri Oct 25 18:26:31 2024]  worker_thread+0x19c/0x240
[Fri Oct 25 18:26:31 2024]  ? __pfx_worker_thread+0x10/0x10
[Fri Oct 25 18:26:31 2024]  kthread+0xcf/0x100
[Fri Oct 25 18:26:31 2024]  ? __pfx_kthread+0x10/0x10
[Fri Oct 25 18:26:31 2024]  ret_from_fork+0x31/0x40
[Fri Oct 25 18:26:31 2024]  ? __pfx_kthread+0x10/0x10
[Fri Oct 25 18:26:31 2024]  ret_from_fork_asm+0x1a/0x30
[Fri Oct 25 18:26:31 2024]  </TASK>
[Fri Oct 25 18:26:31 2024] INFO: task kworker/R-kbloc:2466 blocked for more than 122 seconds.
[Fri Oct 25 18:26:31 2024]       Tainted: G S         OE      6.12.0-rc4-dis_fgs #1
[Fri Oct 25 18:26:31 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Fri Oct 25 18:26:31 2024] task:kworker/R-kbloc state:D stack:0     pid:2466  tgid:2466  ppid:2      flags:0x00004000
[Fri Oct 25 18:26:31 2024] Call Trace:
[Fri Oct 25 18:26:31 2024]  <TASK>
[Fri Oct 25 18:26:31 2024]  __schedule+0x347/0xd70
[Fri Oct 25 18:26:31 2024]  schedule+0x36/0xc0
[Fri Oct 25 18:26:31 2024]  schedule_preempt_disabled+0x15/0x30
[Fri Oct 25 18:26:31 2024]  __mutex_lock.isra.14+0x431/0x690
[Fri Oct 25 18:26:31 2024]  worker_attach_to_pool+0x1f/0xd0
[Fri Oct 25 18:26:31 2024]  rescuer_thread+0x111/0x3b0
[Fri Oct 25 18:26:31 2024]  ? __pfx_rescuer_thread+0x10/0x10
[Fri Oct 25 18:26:31 2024]  ? __pfx_rescuer_thread+0x10/0x10
[Fri Oct 25 18:26:31 2024]  kthread+0xcf/0x100
[Fri Oct 25 18:26:31 2024]  ? __pfx_kthread+0x10/0x10
[Fri Oct 25 18:26:31 2024]  ret_from_fork+0x31/0x40
[Fri Oct 25 18:26:31 2024]  ? __pfx_kthread+0x10/0x10
[Fri Oct 25 18:26:31 2024]  ret_from_fork_asm+0x1a/0x30
[Fri Oct 25 18:26:31 2024]  </TASK>
[Fri Oct 25 18:26:31 2024] INFO: task kworker/46:1H:3592 blocked for more than 122 seconds.
[Fri Oct 25 18:26:31 2024]       Tainted: G S         OE      6.12.0-rc4-dis_fgs #1
[Fri Oct 25 18:26:31 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Fri Oct 25 18:26:31 2024] task:kworker/46:1H   state:D stack:0     pid:3592  tgid:3592  ppid:2      flags:0x00004000
[Fri Oct 25 18:26:31 2024] Workqueue: kblockd blk_mq_timeout_work
[Fri Oct 25 18:26:31 2024] Call Trace:
[Fri Oct 25 18:26:31 2024]  <TASK>
[Fri Oct 25 18:26:31 2024]  __schedule+0x347/0xd70
[Fri Oct 25 18:26:31 2024]  schedule+0x36/0xc0
[Fri Oct 25 18:26:31 2024]  schedule_timeout+0x283/0x2c0
[Fri Oct 25 18:26:31 2024]  ? sched_balance_rq+0xe5/0xd90
[Fri Oct 25 18:26:31 2024]  ? __prepare_to_swait+0x52/0x80
[Fri Oct 25 18:26:31 2024]  wait_for_completion_state+0x173/0x1d0
[Fri Oct 25 18:26:31 2024]  __wait_rcu_gp+0x121/0x150
[Fri Oct 25 18:26:31 2024]  synchronize_rcu_normal.part.63+0x3a/0x60
[Fri Oct 25 18:26:31 2024]  ? __pfx_call_rcu_hurry+0x10/0x10
[Fri Oct 25 18:26:31 2024]  ? __pfx_wakeme_after_rcu+0x10/0x10
[Fri Oct 25 18:26:31 2024]  synchronize_rcu_normal+0x9a/0xb0
[Fri Oct 25 18:26:31 2024]  blk_mq_timeout_work+0x142/0x1a0
[Fri Oct 25 18:26:31 2024]  process_scheduled_works+0xa3/0x3e0
[Fri Oct 25 18:26:31 2024]  worker_thread+0x117/0x240
[Fri Oct 25 18:26:31 2024]  ? __pfx_worker_thread+0x10/0x10
[Fri Oct 25 18:26:31 2024]  kthread+0xcf/0x100
[Fri Oct 25 18:26:31 2024]  ? __pfx_kthread+0x10/0x10
[Fri Oct 25 18:26:31 2024]  ret_from_fork+0x31/0x40
[Fri Oct 25 18:26:31 2024]  ? __pfx_kthread+0x10/0x10
[Fri Oct 25 18:26:31 2024]  ret_from_fork_asm+0x1a/0x30
[Fri Oct 25 18:26:31 2024]  </TASK>
[Fri Oct 25 18:26:31 2024] INFO: task kworker/u1377:1:38000 blocked for more than 122 seconds.
[Fri Oct 25 18:26:31 2024]       Tainted: G S         OE      6.12.0-rc4-dis_fgs #1
[Fri Oct 25 18:26:31 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Fri Oct 25 18:26:31 2024] task:kworker/u1377:1 state:D stack:0     pid:38000 tgid:38000 ppid:2      flags:0x00004000
[Fri Oct 25 18:26:31 2024] Workqueue: events_unbound idle_cull_fn
[Fri Oct 25 18:26:31 2024] Call Trace:
[Fri Oct 25 18:26:31 2024]  <TASK>
[Fri Oct 25 18:26:31 2024]  __schedule+0x347/0xd70
[Fri Oct 25 18:26:31 2024]  schedule+0x36/0xc0
[Fri Oct 25 18:26:31 2024]  schedule_preempt_disabled+0x15/0x30
[Fri Oct 25 18:26:31 2024]  __mutex_lock.isra.14+0x431/0x690
[Fri Oct 25 18:26:31 2024]  idle_cull_fn+0x3b/0xe0
[Fri Oct 25 18:26:31 2024]  process_scheduled_works+0xa3/0x3e0
[Fri Oct 25 18:26:31 2024]  worker_thread+0x117/0x240
[Fri Oct 25 18:26:31 2024]  ? __pfx_worker_thread+0x10/0x10
[Fri Oct 25 18:26:31 2024]  kthread+0xcf/0x100
[Fri Oct 25 18:26:31 2024]  ? __pfx_kthread+0x10/0x10
[Fri Oct 25 18:26:31 2024]  ret_from_fork+0x31/0x40
[Fri Oct 25 18:26:31 2024]  ? __pfx_kthread+0x10/0x10
[Fri Oct 25 18:26:31 2024]  ret_from_fork_asm+0x1a/0x30
[Fri Oct 25 18:26:31 2024]  </TASK>
[Fri Oct 25 18:26:31 2024] INFO: task kworker/u1377:3:46111 blocked for more than 122 seconds.
[Fri Oct 25 18:26:31 2024]       Tainted: G S         OE      6.12.0-rc4-dis_fgs #1
[Fri Oct 25 18:26:31 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Fri Oct 25 18:26:31 2024] task:kworker/u1377:3 state:D stack:0     pid:46111 tgid:46111 ppid:2      flags:0x00004000
[Fri Oct 25 18:26:31 2024] Workqueue: events_unbound idle_cull_fn
[Fri Oct 25 18:26:31 2024] Call Trace:
[Fri Oct 25 18:26:31 2024]  <TASK>
[Fri Oct 25 18:26:31 2024]  __schedule+0x347/0xd70
[Fri Oct 25 18:26:31 2024]  schedule+0x36/0xc0
[Fri Oct 25 18:26:31 2024]  schedule_preempt_disabled+0x15/0x30
[Fri Oct 25 18:26:31 2024]  __mutex_lock.isra.14+0x431/0x690
[Fri Oct 25 18:26:31 2024]  ? ttwu_do_activate+0x6a/0x210
[Fri Oct 25 18:26:31 2024]  idle_cull_fn+0x3b/0xe0
[Fri Oct 25 18:26:31 2024]  process_scheduled_works+0xa3/0x3e0
[Fri Oct 25 18:26:31 2024]  worker_thread+0x117/0x240
[Fri Oct 25 18:26:31 2024]  ? __pfx_worker_thread+0x10/0x10
[Fri Oct 25 18:26:31 2024]  kthread+0xcf/0x100
[Fri Oct 25 18:26:31 2024]  ? __pfx_kthread+0x10/0x10
[Fri Oct 25 18:26:31 2024]  ret_from_fork+0x31/0x40
[Fri Oct 25 18:26:31 2024]  ? __pfx_kthread+0x10/0x10
[Fri Oct 25 18:26:31 2024]  ret_from_fork_asm+0x1a/0x30
[Fri Oct 25 18:26:31 2024]  </TASK>
[Fri Oct 25 18:26:31 2024] INFO: task kworker/u1377:5:46411 blocked for more than 122 seconds.
[Fri Oct 25 18:26:31 2024]       Tainted: G S         OE      6.12.0-rc4-dis_fgs #1
[Fri Oct 25 18:26:31 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Fri Oct 25 18:26:31 2024] task:kworker/u1377:5 state:D stack:0     pid:46411 tgid:46411 ppid:2      flags:0x00004000
[Fri Oct 25 18:26:31 2024] Workqueue: events_unbound idle_cull_fn
[Fri Oct 25 18:26:31 2024] Call Trace:
[Fri Oct 25 18:26:31 2024]  <TASK>
[Fri Oct 25 18:26:31 2024]  __schedule+0x347/0xd70
[Fri Oct 25 18:26:31 2024]  schedule+0x36/0xc0
[Fri Oct 25 18:26:31 2024]  schedule_preempt_disabled+0x15/0x30
[Fri Oct 25 18:26:31 2024]  __mutex_lock.isra.14+0x431/0x690
[Fri Oct 25 18:26:31 2024]  ? try_to_wake_up+0x22e/0x690
[Fri Oct 25 18:26:31 2024]  idle_cull_fn+0x3b/0xe0
[Fri Oct 25 18:26:31 2024]  process_scheduled_works+0xa3/0x3e0
[Fri Oct 25 18:26:31 2024]  worker_thread+0x117/0x240
[Fri Oct 25 18:26:31 2024]  ? __pfx_worker_thread+0x10/0x10
[Fri Oct 25 18:26:31 2024]  kthread+0xcf/0x100
[Fri Oct 25 18:26:31 2024]  ? __pfx_kthread+0x10/0x10
[Fri Oct 25 18:26:31 2024]  ret_from_fork+0x31/0x40
[Fri Oct 25 18:26:31 2024]  ? __pfx_kthread+0x10/0x10
[Fri Oct 25 18:26:31 2024]  ret_from_fork_asm+0x1a/0x30
[Fri Oct 25 18:26:31 2024]  </TASK>
[Fri Oct 25 18:26:31 2024] INFO: task kworker/u1379:6:53043 blocked for more than 122 seconds.
[Fri Oct 25 18:26:31 2024]       Tainted: G S         OE      6.12.0-rc4-dis_fgs #1
[Fri Oct 25 18:26:31 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Fri Oct 25 18:26:31 2024] task:kworker/u1379:6 state:D stack:0     pid:53043 tgid:53043 ppid:2      flags:0x00004000
[Fri Oct 25 18:26:31 2024] Workqueue: events_unbound idle_cull_fn
[Fri Oct 25 18:26:31 2024] Call Trace:
[Fri Oct 25 18:26:31 2024]  <TASK>
[Fri Oct 25 18:26:31 2024]  __schedule+0x347/0xd70
[Fri Oct 25 18:26:31 2024]  schedule+0x36/0xc0
[Fri Oct 25 18:26:31 2024]  schedule_preempt_disabled+0x15/0x30
[Fri Oct 25 18:26:31 2024]  __mutex_lock.isra.14+0x431/0x690
[Fri Oct 25 18:26:31 2024]  idle_cull_fn+0x3b/0xe0
[Fri Oct 25 18:26:31 2024]  process_scheduled_works+0xa3/0x3e0
[Fri Oct 25 18:26:31 2024]  worker_thread+0x117/0x240
[Fri Oct 25 18:26:31 2024]  ? __pfx_worker_thread+0x10/0x10
[Fri Oct 25 18:26:31 2024]  kthread+0xcf/0x100
[Fri Oct 25 18:26:31 2024]  ? __pfx_kthread+0x10/0x10
[Fri Oct 25 18:26:31 2024]  ret_from_fork+0x31/0x40
[Fri Oct 25 18:26:31 2024]  ? __pfx_kthread+0x10/0x10
[Fri Oct 25 18:26:31 2024]  ret_from_fork_asm+0x1a/0x30
[Fri Oct 25 18:26:31 2024]  </TASK>
[Fri Oct 25 18:26:31 2024] INFO: task kworker/u1377:7:53460 blocked for more than 122 seconds.
[Fri Oct 25 18:26:31 2024]       Tainted: G S         OE      6.12.0-rc4-dis_fgs #1
[Fri Oct 25 18:26:31 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Fri Oct 25 18:26:31 2024] task:kworker/u1377:7 state:D stack:0     pid:53460 tgid:53460 ppid:2      flags:0x00004000
[Fri Oct 25 18:26:31 2024] Workqueue: events_unbound idle_cull_fn
[Fri Oct 25 18:26:31 2024] Call Trace:
[Fri Oct 25 18:26:31 2024]  <TASK>
[Fri Oct 25 18:26:31 2024]  __schedule+0x347/0xd70
[Fri Oct 25 18:26:31 2024]  schedule+0x36/0xc0
[Fri Oct 25 18:26:31 2024]  schedule_preempt_disabled+0x15/0x30
[Fri Oct 25 18:26:31 2024]  __mutex_lock.isra.14+0x431/0x690
[Fri Oct 25 18:26:31 2024]  idle_cull_fn+0x3b/0xe0
[Fri Oct 25 18:26:31 2024]  process_scheduled_works+0xa3/0x3e0
[Fri Oct 25 18:26:31 2024]  worker_thread+0x117/0x240
[Fri Oct 25 18:26:31 2024]  ? __pfx_worker_thread+0x10/0x10
[Fri Oct 25 18:26:31 2024]  kthread+0xcf/0x100
[Fri Oct 25 18:26:31 2024]  ? __pfx_kthread+0x10/0x10
[Fri Oct 25 18:26:31 2024]  ret_from_fork+0x31/0x40
[Fri Oct 25 18:26:31 2024]  ? __pfx_kthread+0x10/0x10
[Fri Oct 25 18:26:31 2024]  ret_from_fork_asm+0x1a/0x30
[Fri Oct 25 18:26:31 2024]  </TASK>
[Fri Oct 25 18:26:31 2024] INFO: task kworker/u1377:6:67016 blocked for more than 122 seconds.
[Fri Oct 25 18:26:31 2024]       Tainted: G S         OE      6.12.0-rc4-dis_fgs #1
[Fri Oct 25 18:26:31 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Fri Oct 25 18:26:31 2024] task:kworker/u1377:6 state:D stack:0     pid:67016 tgid:67016 ppid:2      flags:0x00004000
[Fri Oct 25 18:26:31 2024] Workqueue:  0x0 (events_unbound)
[Fri Oct 25 18:26:31 2024] Call Trace:
[Fri Oct 25 18:26:31 2024]  <TASK>
[Fri Oct 25 18:26:31 2024]  __schedule+0x347/0xd70
[Fri Oct 25 18:26:31 2024]  schedule+0x36/0xc0
[Fri Oct 25 18:26:31 2024]  schedule_preempt_disabled+0x15/0x30
[Fri Oct 25 18:26:31 2024]  __mutex_lock.isra.14+0x431/0x690
[Fri Oct 25 18:26:31 2024]  worker_attach_to_pool+0x1f/0xd0
[Fri Oct 25 18:26:31 2024]  create_worker+0xfa/0x1f0
[Fri Oct 25 18:26:31 2024]  worker_thread+0x19c/0x240
[Fri Oct 25 18:26:31 2024]  ? __pfx_worker_thread+0x10/0x10
[Fri Oct 25 18:26:31 2024]  kthread+0xcf/0x100
[Fri Oct 25 18:26:31 2024]  ? __pfx_kthread+0x10/0x10
[Fri Oct 25 18:26:31 2024]  ret_from_fork+0x31/0x40
[Fri Oct 25 18:26:31 2024]  ? __pfx_kthread+0x10/0x10
[Fri Oct 25 18:26:31 2024]  ret_from_fork_asm+0x1a/0x30
[Fri Oct 25 18:26:31 2024]  </TASK>
[Fri Oct 25 18:26:31 2024] INFO: task kworker/u1379:4:67681 blocked for more than 122 seconds.
[Fri Oct 25 18:26:31 2024]       Tainted: G S         OE      6.12.0-rc4-dis_fgs #1
[Fri Oct 25 18:26:31 2024] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Fri Oct 25 18:26:31 2024] task:kworker/u1379:4 state:D stack:0     pid:67681 tgid:67681 ppid:2      flags:0x00004000
[Fri Oct 25 18:26:31 2024] Workqueue: events_unbound idle_cull_fn
[Fri Oct 25 18:26:31 2024] Call Trace:
[Fri Oct 25 18:26:31 2024]  <TASK>
[Fri Oct 25 18:26:31 2024]  __schedule+0x347/0xd70
[Fri Oct 25 18:26:31 2024]  schedule+0x36/0xc0
[Fri Oct 25 18:26:31 2024]  schedule_preempt_disabled+0x15/0x30
[Fri Oct 25 18:26:31 2024]  __mutex_lock.isra.14+0x431/0x690
[Fri Oct 25 18:26:31 2024]  ? try_to_wake_up+0x22e/0x690
[Fri Oct 25 18:26:31 2024]  idle_cull_fn+0x3b/0xe0
[Fri Oct 25 18:26:31 2024]  process_scheduled_works+0xa3/0x3e0
[Fri Oct 25 18:26:31 2024]  worker_thread+0x117/0x240
[Fri Oct 25 18:26:31 2024]  ? __pfx_worker_thread+0x10/0x10
[Fri Oct 25 18:26:31 2024]  kthread+0xcf/0x100
[Fri Oct 25 18:26:31 2024]  ? __pfx_kthread+0x10/0x10
[Fri Oct 25 18:26:31 2024]  ret_from_fork+0x31/0x40
[Fri Oct 25 18:26:31 2024]  ? __pfx_kthread+0x10/0x10
[Fri Oct 25 18:26:31 2024]  ret_from_fork_asm+0x1a/0x30
[Fri Oct 25 18:26:31 2024]  </TASK>
[Fri Oct 25 18:26:31 2024] Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings
[oracle@...-1 GNR startup_scripts]$ uname -a
Linux bhs-1 6.12.0-rc4-dis_fgs #1 SMP PREEMPT_DYNAMIC Fri Oct 25 07:26:06 PDT 2024 x86_64 x86_64 x86_64 GNU/Linux

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ