linux-kernel - Re: [External] Re: [PATCH 0/3] Suppress undesirable hung task warnings.

Message-ID: <ceqwycvzll3mocwuzsjb77by5ajuqt3zqrwx3gb67pe6idupqz@ktty5g2jaww4>
Date: Thu, 25 Sep 2025 18:30:18 +0200
From: Jan Kara <jack@...e.cz>
To: Julian Sun <sunjunchao@...edance.com>
Cc: Jan Kara <jack@...e.cz>, Peter Zijlstra <peterz@...radead.org>, 
	Andrew Morton <akpm@...ux-foundation.org>, Christoph Hellwig <hch@...radead.org>, cgroups@...r.kernel.org, 
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org, viro@...iv.linux.org.uk, 
	brauner@...nel.org, mingo@...hat.com, juri.lelli@...hat.com, 
	vincent.guittot@...aro.org, dietmar.eggemann@....com, rostedt@...dmis.org, 
	bsegall@...gle.com, mgorman@...e.de, vschneid@...hat.com, lance.yang@...ux.dev, 
	mhiramat@...nel.org, agruenba@...hat.com, hannes@...xchg.org, mhocko@...nel.org, 
	roman.gushchin@...ux.dev, shakeel.butt@...ux.dev, muchun.song@...ux.dev
Subject: Re: [External] Re: [PATCH 0/3] Suppress undesirable hung task
 warnings.

On Thu 25-09-25 23:07:24, Julian Sun wrote:
> On Wed, Sep 24, 2025 at 6:34 PM Jan Kara <jack@...e.cz> wrote:
> >
> > On Tue 23-09-25 09:16:07, Peter Zijlstra wrote:
> > > On Mon, Sep 22, 2025 at 02:50:45PM -0700, Andrew Morton wrote:
> > > > On Mon, 22 Sep 2025 11:08:32 -0700 Christoph Hellwig <hch@...radead.org> wrote:
> > > >
> > > > > On Mon, Sep 22, 2025 at 03:27:18PM +0200, Peter Zijlstra wrote:
> > > > > > > Julian Sun (3):
> > > > > > >   sched: Introduce a new flag PF_DONT_HUNG.
> > > > > > >   writeback: Introduce wb_wait_for_completion_no_hung().
> > > > > > >   memcg: Don't trigger hung task when memcg is releasing.
> > > > > >
> > > > > > This is all quite terrible. I'm not at all sure why a task that is
> > > > > > genuinely not making progress and isn't killable should not be reported.
> > > > >
> > > > > > The hung device detector is way too aggressive for very slow I/O.
> > > > > See blk_wait_io, which has been around for a long time to work
> > > > > around just that.  Given that this series targets writeback I suspect
> > > > > it is about an overloaded device as well.
> > > >
> > > > Yup, it's writeback - the bug report is in
> > > > https://lkml.kernel.org/r/20250917212959.355656-1-sunjunchao@bytedance.com
> > > >
> > > > Memory is big and storage is slow, there's nothing wrong if a task
> > > > which is designed to wait for writeback waits for a long time.
> > > >
> > > > Of course, there's something wrong if some other task which isn't
> > > > designed to wait for writeback gets stuck waiting for the task which
> > > > *is* designed to wait for writeback, but we'll still warn about that.
> > > >
> > > >
> > > > Regarding an implementation, I'm wondering if we can put a flag in
> > > > `struct completion' telling the hung task detector that this one is
> > > > expected to wait for long periods sometimes.  Probably messy and it
> > > > only works for completions (not semaphores, mutexes, etc).  Just
> > > > putting it out there ;)
> > >
> > > So the problem is that there *is* progress (albeit rather slowly), the
> > > watchdog just doesn't see that. Perhaps that is the thing we should look
> > > at fixing.
> > >
> > > How about something like the below? That will 'spuriously' wake up the
> > > waiters as long as there is some progress being made. Thereby increasing
> > > the context switch counters of the tasks and thus the hung_task watchdog
> > > sees progress.
> > >
> > > This approach should be safer than the blk_wait_io() hack, which has a
> > > timer ticking, regardless of actual completions happening or not.
> >
> > I like the idea. The problem with your patch is that the progress is not
> > visible with high enough granularity in wb_writeback_work->done completion.
> > That is only incremented by 1, when say a request to writeout 1GB is queued
> > and decremented by 1 when that 1GB is written. The progress can be observed
> > with higher granularity by wb_writeback_work->nr_pages getting decremented
> > as we submit pages for writeback but this counter still gets updated only
> > once we are done with a particular inode so if all those 1GB of data are in
> > one inode there wouldn't be much to observe. So we might need to observe
> > how struct writeback_control member nr_to_write gets updated. That is
> > really updated frequently on IO submission but each filesystem updates it
> > in their writepages() function so implementing that gets messy pretty
> > quickly.
> >
> > But maybe a good place to hook into for registering progress would be
> > wbc_init_bio()? Filesystems call that whenever we create new bio for writeback
> > purposes. We do have struct writeback_control available there so through
> > that we could propagate information that forward progress is being made.
> >
> > What do people think?
> 
> Sorry for the late reply. Yes, Jan, I agree — your proposal sounds
> both fine-grained and elegant. But do we really have a strong need for
> such detailed progress tracking?
> 
> In background writeback, for example, if the bandwidth is very low
> (e.g. avg_write_bandwidth=24), writeback_chunk_size() already splits
> pages into chunks of MIN_WRITEBACK_PAGES (1024). This is usually
> enough to avoid hung task warnings, so reporting progress there might
> be sufficient.

Right.
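
To illustrate why a wakeup per completed chunk is enough, here is a minimal
userspace model of the detector's check. It is a sketch under assumptions, not
kernel code: the field names mirror task_struct, but struct task_model,
check_hung() and simulate_wait() are hypothetical, time is counted in whole
seconds, and the detector is assumed to run once per timeout period.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-task state the hung task detector looks at. */
struct task_model {
	unsigned long nvcsw;             /* voluntary context switches */
	unsigned long nivcsw;            /* involuntary context switches */
	unsigned long last_switch_count; /* count seen at the previous check */
	unsigned long last_switch_time;  /* time the count was last seen to change */
};

/* Returns true if the task would be reported as hung at time 'now'. */
static bool check_hung(struct task_model *t, unsigned long now,
		       unsigned long timeout)
{
	unsigned long switch_count = t->nvcsw + t->nivcsw;

	if (switch_count != t->last_switch_count) {
		/* The task was scheduled since the last check: progress. */
		t->last_switch_count = switch_count;
		t->last_switch_time = now;
		return false;
	}
	return now - t->last_switch_time >= timeout;
}

/*
 * A waiter sleeps for total_secs. Every wake_interval seconds (0 = never)
 * it is woken, sees the work is not done and sleeps again, which bumps
 * nvcsw. Returns how many times the detector would have warned.
 */
static unsigned int simulate_wait(unsigned long total_secs,
				  unsigned long wake_interval,
				  unsigned long timeout)
{
	/* nvcsw starts at 1: the task switched out once when it went to sleep. */
	struct task_model t = { .nvcsw = 1 };
	unsigned int warnings = 0;

	assert(timeout > 0);

	for (unsigned long now = 1; now <= total_secs; now++) {
		if (wake_interval && now % wake_interval == 0)
			t.nvcsw++;
		if (now % timeout == 0 && check_hung(&t, now, timeout))
			warnings++;
	}
	return warnings;
}
```

In this model a waiter that is never woken during a 600 second wait with a
120 second timeout is flagged at every check after the first one, while a
waiter woken every 30 seconds is never flagged.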

> I’m also a bit concerned that reporting progress on every
> wbc_init_bio() could lead to excessive wakeups in normal or
> high-throughput cases, which might have side effects. Please correct
> me if I’m missing something.

Hum, fair, we'd have to somehow ratelimit it which adds even more
complexity. If the waking on completion is enough to silence the hung task
detector in your cases, I'm all for a simple solution. We can always
reconsider if it proves to not be good enough.
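
For reference, the ratelimiting mentioned above could look roughly like the
following. This is only a userspace sketch of the idea, assuming a fixed
minimum interval between wakeups; struct progress_limiter, report_progress()
and count_wakeups() are hypothetical names, and nothing here is an existing
kernel interface.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical limiter: at most one waiter wakeup per 'interval' ticks. */
struct progress_limiter {
	unsigned long interval;  /* minimum ticks between two wakeups */
	unsigned long last_wake; /* tick of the most recent wakeup */
	bool woke_once;          /* false until the first wakeup happened */
	unsigned long wakeups;   /* wakeups actually issued */
};

/* Called on every progress event; returns true if waiters should be woken. */
static bool report_progress(struct progress_limiter *pl, unsigned long now)
{
	assert(pl->interval > 0);

	if (pl->woke_once && now - pl->last_wake < pl->interval)
		return false; /* too soon since the last wakeup, drop it */

	pl->woke_once = true;
	pl->last_wake = now;
	pl->wakeups++;
	return true;
}

/*
 * Feed events_per_tick progress events per tick for 'ticks' ticks
 * (ticks 0 .. ticks - 1) and return the number of wakeups issued.
 */
static unsigned long count_wakeups(unsigned long ticks,
				   unsigned long events_per_tick,
				   unsigned long interval)
{
	struct progress_limiter pl = { .interval = interval };

	for (unsigned long now = 0; now < ticks; now++)
		for (unsigned long i = 0; i < events_per_tick; i++)
			report_progress(&pl, now);

	return pl.wakeups;
}
```

With 1000 events per tick over 100 ticks and an interval of 10 ticks, the
100000 events collapse into 10 wakeups. The cost is the extra state and the
time comparison on every event, which is the added complexity in question.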

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR