Message-ID: <20260112111804.3773280-1-matt@readmodwrite.com>
Date: Mon, 12 Jan 2026 11:18:04 +0000
From: Matt Fleming <matt@...dmodwrite.com>
To: Jan Kara <jack@...e.cz>
Cc: cgroups@...r.kernel.org,
linux-kernel@...r.kernel.org,
Tejun Heo <tj@...nel.org>,
Christian Brauner <brauner@...nel.org>,
linux-fsdevel@...r.kernel.org,
kernel-team@...udflare.com
Subject: [REGRESSION] 6.12: Workqueue lockups in inode_switch_wbs_work_fn (suspect commit 66c14dccd810)
Hi Jan, it's me again :)
I’m writing to report a regression we are observing in our production
environment running kernel 6.12. We are seeing severe workqueue lockups that
appear to be triggered by high-volume cgroup destruction. We have isolated the
issue to 66c14dccd810 ("writeback: Avoid softlockup when switching many
inodes").
We're seeing stalled tasks in the inode_switch_wbs workqueue. The worker
appears to be CPU-bound within inode_switch_wbs_work_fn, leading to RCU stalls
and eventual system lockups.
Here is a representative trace from a stalled CPU-bound worker pool:
[1437023.584832][ C0] Showing backtraces of running workers in stalled CPU-bound worker pools:
[1437023.733923][ C0] pool 358:
[1437023.733924][ C0] task:kworker/89:0 state:R running task stack:0 pid:3136989 tgid:3136989 ppid:2 task_flags:0x4208060 flags:0x00004000
[1437023.733929][ C0] Workqueue: inode_switch_wbs inode_switch_wbs_work_fn
[1437023.733933][ C0] Call Trace:
[1437023.733934][ C0] <TASK>
[1437023.733937][ C0] __schedule+0x4fb/0xbf0
[1437023.733942][ C0] __cond_resched+0x33/0x60
[1437023.733944][ C0] inode_switch_wbs_work_fn+0x481/0x710
[1437023.733948][ C0] process_one_work+0x17b/0x330
[1437023.733950][ C0] worker_thread+0x2ce/0x3f0
Our environment makes heavy use of cgroup-based services. When these services
-- specifically our caching layer -- are shut down, the resulting cgroup
destruction can require switching a massive number of inodes (approx.
200k-250k+ inodes per service) away from the dying cgroup's writeback context.
We have verified that reverting 66c14dccd810 completely eliminates these
lockups in our production environment.
I am currently working on creating a synthetic reproduction case in the lab to
replicate the inode/cgroup density required to trigger this on demand. In the
meantime, I wanted to share these findings to see if you have any insights.
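For reference, a synthetic reproducer along the lines described above could start from something like the rough sketch below. It has not been shown to trigger this lockup. It also rests on several assumptions: cgroup v2 is mounted at the given root with the memory and io controllers enabled for child cgroups, the script runs as root, and the target filesystem supports cgroup writeback (e.g. ext4). The helper names dirty_files() and reproduce(), the file naming scheme, and the 4 KiB payload are hypothetical choices made for illustration.

```python
import os


def dirty_files(directory: str, count: int, payload: bytes = b"x" * 4096) -> int:
    """Create `count` files under `directory` and write `payload` to each.

    Hypothetical helper: the writes dirty page cache from the calling task,
    so the data is charged to whatever cgroup that task belongs to.
    """
    os.makedirs(directory, exist_ok=True)
    written = 0
    for i in range(count):
        path = os.path.join(directory, f"f{i:07d}")
        with open(path, "wb") as f:
            f.write(payload)
        written += 1
    return written


def reproduce(cgroup_root: str, name: str, target_dir: str, count: int) -> None:
    """Dirty `count` inodes from inside a fresh cgroup, then remove the cgroup.

    Assumes cgroup v2 at `cgroup_root` with memory and io enabled in its
    cgroup.subtree_control, root privileges, and a cgroup-writeback-capable
    filesystem backing `target_dir`. All names here are hypothetical.
    """
    cg = os.path.join(cgroup_root, name)
    os.mkdir(cg)
    pid = os.fork()
    if pid == 0:
        # Child: move itself into the new cgroup, then dirty the files.
        try:
            with open(os.path.join(cg, "cgroup.procs"), "w") as f:
                f.write(str(os.getpid()))
            dirty_files(target_dir, count)
            os._exit(0)
        except BaseException:
            os._exit(1)
    _, status = os.waitpid(pid, 0)
    if os.waitstatus_to_exitcode(status) != 0:
        os.rmdir(cg)
        raise RuntimeError("child failed to populate the cgroup")
    # Write back the dirty data first (assumption: the inodes stay associated
    # with this cgroup's writeback context until the cgroup is offlined).
    os.sync()
    # Removing the now-empty cgroup offlines it, which is the point where
    # its attached inodes would need to be switched to another wb.
    os.rmdir(cg)
```

Example with hypothetical paths: reproduce("/sys/fs/cgroup", "wbswitch-test", "/mnt/scratch/wbswitch", 250_000) dirties roughly the per-service inode count mentioned above and then removes the cgroup. Running several of these concurrently would more closely approximate shutting down many services at once.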
Thanks,
Matt