linux-kernel - Re: [PATCH 0/3] Suppress undesirable hung task warnings.

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <dndr5xdp3bweqtwlyixtzajxgkhxbt2qb2fzg6o2wy5msrhzi4@h3klek5hff5i>
Date: Wed, 24 Sep 2025 12:34:28 +0200
From: Jan Kara <jack@...e.cz>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Andrew Morton <akpm@...ux-foundation.org>, 
	Christoph Hellwig <hch@...radead.org>, Julian Sun <sunjunchao@...edance.com>, cgroups@...r.kernel.org, 
	linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org, viro@...iv.linux.org.uk, 
	brauner@...nel.org, jack@...e.cz, mingo@...hat.com, juri.lelli@...hat.com, 
	vincent.guittot@...aro.org, dietmar.eggemann@....com, rostedt@...dmis.org, 
	bsegall@...gle.com, mgorman@...e.de, vschneid@...hat.com, lance.yang@...ux.dev, 
	mhiramat@...nel.org, agruenba@...hat.com, hannes@...xchg.org, mhocko@...nel.org, 
	roman.gushchin@...ux.dev, shakeel.butt@...ux.dev, muchun.song@...ux.dev
Subject: Re: [PATCH 0/3] Suppress undesirable hung task warnings.

On Tue 23-09-25 09:16:07, Peter Zijlstra wrote:
> On Mon, Sep 22, 2025 at 02:50:45PM -0700, Andrew Morton wrote:
> > On Mon, 22 Sep 2025 11:08:32 -0700 Christoph Hellwig <hch@...radead.org> wrote:
> > 
> > > On Mon, Sep 22, 2025 at 03:27:18PM +0200, Peter Zijlstra wrote:
> > > > > Julian Sun (3):
> > > > >   sched: Introduce a new flag PF_DONT_HUNG.
> > > > >   writeback: Introduce wb_wait_for_completion_no_hung().
> > > > >   memcg: Don't trigger hung task when memcg is releasing.
> > > > 
> > > > This is all quite terrible. I'm not at all sure why a task that is
> > > > genuinely not making progress and isn't killable should not be reported.
> > > 
> > > The hung device detector is way to aggressive for very slow I/O.
> > > See blk_wait_io, which has been around for a long time to work
> > > around just that.  Given that this series targets writeback I suspect
> > > it is about an overloaded device as well.
> > 
> > Yup, it's writeback - the bug report is in
> > https://lkml.kernel.org/r/20250917212959.355656-1-sunjunchao@bytedance.com
> > 
> > Memory is big and storage is slow, there's nothing wrong if a task
> > which is designed to wait for writeback waits for a long time.
> > 
> > Of course, there's something wrong if some other task which isn't
> > designed to wait for writeback gets stuck waiting for the task which
> > *is* designed to wait for writeback, but we'll still warn about that.
> > 
> > 
> > Regarding an implementation, I'm wondering if we can put a flag in
> > `struct completion' telling the hung task detector that this one is
> > expected to wait for long periods sometimes.  Probably messy and it
> > only works for completions (not semaphores, mutexes, etc).  Just
> > putting it out there ;)
> 
> So the problem is that there *is* progress (albeit rather slowly), the
> watchdog just doesn't see that. Perhaps that is the thing we should look
> at fixing.
> 
> How about something like the below? That will 'spuriously' wake up the
> waiters as long as there is some progress being made. Thereby increasing
> the context switch counters of the tasks and thus the hung_task watchdog
> sees progress.
> 
> This approach should be safer than the blk_wait_io() hack, which has a
> timer ticking, regardless of actual completions happening or not.

I like the idea. The problem with your patch is that the progress is not
visible with high enough granularity in wb_writeback_work->done completion.
That is only incremented by 1, when say a request to writeout 1GB is queued
and decremented by 1 when that 1GB is written. The progress can be observed
with higher granularity by wb_writeback_work->nr_pages getting decremented
as we submit pages for writeback but this counter still gets updated only
once we are done with a particular inode so if all those 1GB of data are in
one inode there wouldn't be much to observe. So we might need to observe
how struct writeback_control member nr_to_write gets updated. That is
really updated frequently on IO submission but each filesystem updates it
in their writepages() function so implementing that gets messy pretty
quickly.

But maybe a good place to hook into for registering progress would be
wbc_init_bio()? Filesystems call that whenever we create new bio for writeback
purposes. We do have struct writeback_control available there so through
that we could propagate information that forward progress is being made.

What do people think?

								Honza

> ---
> 
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index a07b8cf73ae2..1326193b4d95 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -174,9 +174,10 @@ static void finish_writeback_work(struct wb_writeback_work *work)
>  		kfree(work);
>  	if (done) {
>  		wait_queue_head_t *waitq = done->waitq;
> +		bool force_wake = (jiffies - done->stamp) > HZ/2;
>  
>  		/* @done can't be accessed after the following dec */
> -		if (atomic_dec_and_test(&done->cnt))
> +		if (atomic_dec_and_test(&done->cnt) || force_wake)
>  			wake_up_all(waitq);
>  	}
>  }
> @@ -213,7 +214,7 @@ static void wb_queue_work(struct bdi_writeback *wb,
>  void wb_wait_for_completion(struct wb_completion *done)
>  {
>  	atomic_dec(&done->cnt);		/* put down the initial count */
> -	wait_event(*done->waitq, !atomic_read(&done->cnt));
> +	wait_event(*done->waitq, ({ done->stamp = jiffies; !atomic_read(&done->cnt); }));
>  }
>  
>  #ifdef CONFIG_CGROUP_WRITEBACK
> diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
> index 2ad261082bba..197593193ce3 100644
> --- a/include/linux/backing-dev-defs.h
> +++ b/include/linux/backing-dev-defs.h
> @@ -63,6 +63,7 @@ enum wb_reason {
>  struct wb_completion {
>  	atomic_t		cnt;
>  	wait_queue_head_t	*waitq;
> +	unsigned long		stamp;
>  };
>  
>  #define __WB_COMPLETION_INIT(_waitq)	\
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR