Message-Id: <20260211-wqstall_start-at-v1-0-bd9499a18c19@debian.org>
Date: Wed, 11 Feb 2026 04:29:14 -0800
From: Breno Leitao <leitao@...ian.org>
To: Tejun Heo <tj@...nel.org>, Lai Jiangshan <jiangshanlai@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>
Cc: linux-kernel@...r.kernel.org, Omar Sandoval <osandov@...ndov.com>,
kernel-team@...a.com, Breno Leitao <leitao@...ian.org>
Subject: [PATCH 0/4] workqueue: Detect stalled in-flight workers
The workqueue watchdog detects pools that haven't made forward progress
by checking whether pending work items on the worklist have been waiting
too long. However, this approach has a blind spot: if a pool has only
one work item and that item has already been dequeued and is executing on
a worker, the worklist is empty and the watchdog skips the pool entirely.
This means a single hogged worker with no other pending work is invisible
to the stall detector.
The following work function demonstrates this blind spot:
static void stall_work_fn(struct work_struct *work)
{
	for (;;) {
		mdelay(1000);
		cond_resched();
	}
}
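For completeness, a work item like this could be queued from a small test module along these lines. This is only a sketch: the module boilerplate and the use of system_wq are my assumptions here, not necessarily the exact reproducer.

```c
#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/delay.h>

static struct work_struct stall_work;

/* Never returns: hogs its worker while yielding the CPU politely,
 * so nothing else looks wrong except the stuck work item. */
static void stall_work_fn(struct work_struct *work)
{
	for (;;) {
		mdelay(1000);
		cond_resched();
	}
}

static int __init wq_stall_init(void)
{
	INIT_WORK(&stall_work, stall_work_fn);
	/* Once dequeued, the worklist is empty and the old watchdog
	 * never looks at this pool again. */
	queue_work(system_wq, &stall_work);
	return 0;
}
module_init(wq_stall_init);

MODULE_LICENSE("GPL");
```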
Additionally, when the watchdog does report stalled pools, the output
doesn't show how long each in-flight work item has been running, making
it harder to identify which specific worker is stuck.
This series addresses both issues:
Patch 1 fixes a minor semantic inconsistency where pool flags were
checked against a workqueue-level constant (WQ_BH instead of POOL_BH).
No behavioral change since both constants have the same value.
Patch 2 renames pool->watchdog_ts to pool->last_progress_ts to better
describe what the timestamp actually tracks.
Patch 3 adds a current_start timestamp to struct worker, recording when
a work item began executing. This is printed in show_pwq() as elapsed
wall-clock time (e.g., "in-flight: 165:stall_work_fn [wq_stall] for
100s"), giving immediate visibility into how long each worker has been
busy.
Patch 4 introduces pool_has_stalled_worker(), which scans all workers in
a pool's busy_hash for any whose current_start timestamp exceeds the
watchdog threshold. This is called unconditionally for every pool,
independent of worklist state, so a stuck worker is always detected. The
feature is gated behind a new CONFIG_WQ_WATCHDOG_WORKERS option
(disabled by default) under CONFIG_WQ_WATCHDOG.
An option is to drop CONFIG_WQ_WATCHDOG_WORKERS entirely. I've been
running this change on several hosts under workloads (mainly stress-ng)
and have not seen any false positives.
With this series applied, the stall produced by the example above is
reported like this:
BUG: workqueue lockup - worker365:stall_work_fn [wq_stall] stuck in pool cpus=9 node=0 flags=0x0 nice=0 for 2570s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x100
pwq 38: cpus=9 node=0 flags=0x0 nice=0 active=2 refcnt=3
workqueue stall_wq: flags=0x0
---
Breno Leitao (4):
workqueue: Use POOL_BH instead of WQ_BH when checking pool flags
workqueue: Rename pool->watchdog_ts to pool->last_progress_ts
workqueue: Show in-flight work item duration in stall diagnostics
workqueue: Detect stalled in-flight work items with empty worklist
kernel/workqueue.c | 71 ++++++++++++++++++++++++++++++++++++++-------
kernel/workqueue_internal.h | 1 +
lib/Kconfig.debug | 12 ++++++++
3 files changed, 74 insertions(+), 10 deletions(-)
---
base-commit: 9cb8b0f289560728dbb8b88158e7a957e2e90a14
change-id: 20260210-wqstall_start-at-e7319a005ab4
Best regards,
--
Breno Leitao <leitao@...ian.org>