Message-ID: <xyivos2a76rpgmyp6kvvpskmuhheo2wtaqs5s4qvvbn6p3f3lb@3sc7xufujt57>
Date: Tue, 13 Jan 2026 13:02:53 +0100
From: Jan Kara <jack@...e.cz>
To: Matt Fleming <matt@...dmodwrite.com>
Cc: Jan Kara <jack@...e.cz>, cgroups@...r.kernel.org,
linux-kernel@...r.kernel.org, Tejun Heo <tj@...nel.org>,
Christian Brauner <brauner@...nel.org>, linux-fsdevel@...r.kernel.org, kernel-team@...udflare.com
Subject: Re: [REGRESSION] 6.12: Workqueue lockups in inode_switch_wbs_work_fn
(suspect commit 66c14dccd810)
On Tue 13-01-26 11:46:35, Matt Fleming wrote:
> On Mon, Jan 12, 2026 at 06:04:50PM +0100, Jan Kara wrote:
> >
> > I agree we are CPU bound in inode_switch_wbs_work_fn() but I don't think we
> > are really hogging the CPU. The backtrace below indicates the worker just
> > got rescheduled in cond_resched() to give other tasks a chance to run. Is
> > the machine dying completely or does it eventually finish the cgroup
> > teardown?
>
> Yeah you're right, the CPU isn't hogged but the interaction with the
> workqueue subsystem leads to the machine choking. I've seen 150+
> instances of inode_switch_wbs_work_fn() queued up in the workqueue
> subsystem:
>
> [1437017.446174][ C0] in-flight: 3139338:inode_switch_wbs_work_fn ,2420392:inode_switch_wbs_work_fn ,2914179:inode_switch_wbs_work_fn
> [1437017.446181][ C0] pending: 11*inode_switch_wbs_work_fn
> [1437017.446185][ C0] pwq 6: cpus=1 node=0 flags=0x2 nice=0 active=23 refcnt=24
> [1437017.446186][ C0] in-flight: 2723771:inode_switch_wbs_work_fn ,1710617:inode_switch_wbs_work_fn ,3228683:inode_switch_wbs_work_fn ,3149692:inode_switch_wbs_work_fn ,3224195:inode_switch_wbs_work_fn
> [1437017.446193][ C0] pending: 18*inode_switch_wbs_work_fn
> [1437017.446195][ C0] pwq 10: cpus=2 node=0 flags=0x2 nice=0 active=17 refcnt=18
> [1437017.446196][ C0] in-flight: 3224135:inode_switch_wbs_work_fn ,3193118:inode_switch_wbs_work_fn ,3224106:inode_switch_wbs_work_fn ,3228725:inode_switch_wbs_work_fn ,3087195:inode_switch_wbs_work_fn ,1853835:inode_switch_wbs_work_fn
> [1437017.446204][ C0] pending: 11*inode_switch_wbs_work_fn
>
> It sometimes finishes the cgroup teardown and sometimes hard locks up.
> When workqueue items aren't completing things get really bad :)
>
> > Well, these changes were introduced because some services are switching
> > over 1m inodes on their exit and they were softlocking up the machine :).
> > So there's some commonality, just something in that setup behaves
> > differently from your setup. Are the inodes clean, dirty, or only with
> > dirty timestamps?
>
> Good question. I don't know but I'll get back to you.
>
> > Also since you mention 6.12 kernel but this series was
> > only merged in 6.18, do you carry full series ending with merge commit
> > 9426414f0d42f?
>
> We always run the latest 6.12 LTS release and it looks like only these
> two commits got backported:
>
> 9a6ebbdbd412 ("writeback: Avoid excessively long inode switching times")
> 66c14dccd810 ("writeback: Avoid softlockup when switching many inodes")
Ah, OK. Then you're missing e1b849cfa6b61f ("writeback: Avoid contention on
wb->list_lock when switching inodes"), which might explain why my system
behaves differently from yours. That commit *heavily* reduces contention on
wb->list_lock when switching inodes. It also avoids tying up multiple
workers with switching work when only one of them can proceed at a time
(the others just spin on the list_lock). So I'd suggest you backport that
commit and check whether it fixes your issues.
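
To compare the workqueue state before and after the backport, something like
the following could tally the in-flight and pending inode_switch_wbs_work_fn
items from a workqueue dump. This is only a rough sketch: tally() and SAMPLE
are made-up names, and the line format is assumed from the dump lines quoted
above (in-flight entries as "<pid>:<function>", pending entries as
"<function>" or "<n>*<function>"), so other kernels may print it differently.

```python
import re
from collections import Counter

# Four lines copied from the workqueue dump quoted earlier in this thread.
SAMPLE = """\
> [1437017.446174][ C0] in-flight: 3139338:inode_switch_wbs_work_fn ,2420392:inode_switch_wbs_work_fn ,2914179:inode_switch_wbs_work_fn
> [1437017.446181][ C0] pending: 11*inode_switch_wbs_work_fn
> [1437017.446186][ C0] in-flight: 2723771:inode_switch_wbs_work_fn ,1710617:inode_switch_wbs_work_fn ,3228683:inode_switch_wbs_work_fn ,3149692:inode_switch_wbs_work_fn ,3224195:inode_switch_wbs_work_fn
> [1437017.446193][ C0] pending: 18*inode_switch_wbs_work_fn
"""


def tally(dump, func="inode_switch_wbs_work_fn"):
    """Count in-flight and pending work items for `func` in a dump.

    Hypothetical helper; the parsing rules are inferred from the quoted
    dump lines only, not from the kernel source.
    """
    counts = Counter()
    for line in dump.splitlines():
        m = re.search(r"\b(in-flight|pending):\s*(.*)$", line)
        if not m:
            continue  # e.g. the "pwq N: cpus=..." header lines
        kind, items = m.groups()
        for item in items.split(","):
            item = item.strip()
            if kind == "in-flight":
                # Assumed shape: "<pid>:<function>", one entry per worker.
                mm = re.fullmatch(r"\d+:(\S+)", item)
                if mm and mm.group(1) == func:
                    counts[kind] += 1
            else:
                # Assumed shape: "<function>" or "<n>*<function>".
                mm = re.fullmatch(r"(?:(\d+)\*)?(\S+)", item)
                if mm and mm.group(2) == func:
                    counts[kind] += int(mm.group(1) or 1)
    return counts


if __name__ == "__main__":
    result = tally(SAMPLE)
    print("in-flight:", result["in-flight"], "pending:", result["pending"])
```

If the backport helps, I would expect far fewer in-flight entries per pool,
since the workers should no longer pile up spinning on the list_lock.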
Honza
--
Jan Kara <jack@...e.com>
SUSE Labs, CR