Message-ID: <gbzf3rqo6edvan3lzdx7udc3hzfwfjh3c74lm6ruegu5cwg64q@lt4ghwp5y33f>
Date: Thu, 15 Feb 2024 18:26:59 -0500
From: Kent Overstreet <kent.overstreet@...ux.dev>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: Christoph Hellwig <hch@...radead.org>, linux-kernel@...r.kernel.org,
fuyuanli <fuyuanli@...iglobal.com>
Subject: Re: [PATCH] kernel/hung_task.c: export sysctl_hung_task_timeout_secs
On Thu, Feb 15, 2024 at 10:55:09AM -0800, Andrew Morton wrote:
> On Wed, 14 Feb 2024 21:26:34 -0800 Christoph Hellwig <hch@...radead.org> wrote:
>
> > On Fri, Feb 09, 2024 at 02:09:35AM -0500, Kent Overstreet wrote:
> > > needed for thread_with_file; also rare but not unheard of to need this
> > > in module code, when blocking on user input.
> > >
> > > one workaround used by some code is wait_event_interruptible() - but
> > > that can be buggy if the outer context isn't expecting unwinding.
> >
> > I don't think just exporting the variable and thus allowing write
> > access is a good idea. If we want to keep going down the route of
> > this hack we should add an accessor function that returns the value.
> >
> > The cleaner solution would be a new task state that explicitly
> > marks code that can sleep forever without triggering the hang
> > check. Although this might be a bit invasive and take a while.

I had the same thought.

> A new PF_whatever flag would solve that simply?

TASK_* flags are separate from PF_* flags, fortunately, and it doesn't
look like anything but TASK_* flags go in task_struct->__state, so
this shouldn't be a difficult change.
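
As a rough illustration of the idea, here is a minimal userspace model, not kernel code. Every name in it (the DEMO_* bits, struct demo_task, demo_should_report_hang()) is hypothetical, and the bit values are invented for the sketch; it only shows how a dedicated state bit could let a hang check skip tasks that are expected to sleep indefinitely.

```c
#include <stdbool.h>

/* Illustrative state bits; names and values are made up for this sketch. */
#define DEMO_TASK_INTERRUPTIBLE   0x1u
#define DEMO_TASK_UNINTERRUPTIBLE 0x2u
/* Hypothetical new bit: "may sleep forever, do not report as hung". */
#define DEMO_TASK_NOHANG          0x4u

/* Composite state a caller could sleep in while blocking on user input. */
#define DEMO_TASK_UNINTERRUPTIBLE_NOHANG \
	(DEMO_TASK_UNINTERRUPTIBLE | DEMO_TASK_NOHANG)

struct demo_task {
	unsigned int state;         /* stand-in for task_struct->__state */
	unsigned long blocked_secs; /* how long the task has been asleep */
};

/* Model of a hang check: only plain uninterruptible sleepers are reported. */
static bool demo_should_report_hang(const struct demo_task *t,
				    unsigned long timeout_secs)
{
	if (!timeout_secs)
		return false;	/* in this model, 0 disables the check */
	if (!(t->state & DEMO_TASK_UNINTERRUPTIBLE))
		return false;	/* interruptible sleeps are never reported */
	if (t->state & DEMO_TASK_NOHANG)
		return false;	/* explicitly marked as allowed to sleep forever */
	return t->blocked_secs >= timeout_secs;
}
```

With a bit like this, a caller would not need to read the timeout at all, and would not need to fall back to an interruptible sleep just to stay out of the detector's way.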

> Which are the potential use sites for such a thing?

There's a few places in the block layer that are using the sysctl value;
those will be easy to fix. There's definitely more places abusing
TASK_INTERRUPTIBLE, but aside from the ones in my code I can't think of
a way to search for them.

But the block layer ones look a little suspect to me: the commit message
indicates they were added because discards on some devices can take more
than 100 seconds. That is true, but it is a more general problem: there
are other places where we block on IO.
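
For reference, the kind of workaround in question can be modelled in userspace as below. This is a sketch under assumptions, not the block layer's code: it assumes the pattern is to read the hung task timeout and sleep in slices shorter than it (half the timeout here), so that no single sleep is long enough to be reported. The names demo_io, demo_wait_timeout() and demo_wait_io() are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical I/O that completes after remaining_secs of waiting. */
struct demo_io {
	unsigned long remaining_secs;
};

/* Wait for at most max_secs; returns true once the I/O has finished. */
static bool demo_wait_timeout(struct demo_io *io, unsigned long max_secs)
{
	if (io->remaining_secs <= max_secs) {
		io->remaining_secs = 0;
		return true;
	}
	io->remaining_secs -= max_secs;
	return false;
}

/*
 * Wait for the I/O without ever sleeping longer than half the hung task
 * timeout in one go. Returns the number of separate waits performed.
 */
static unsigned long demo_wait_io(struct demo_io *io,
				  unsigned long hung_timeout_secs)
{
	unsigned long waits = 1;
	unsigned long slice;

	if (!hung_timeout_secs) {
		/* Detector disabled in this model: one unbounded wait. */
		io->remaining_secs = 0;
		return waits;
	}

	slice = hung_timeout_secs / 2;
	if (!slice)
		slice = 1;	/* avoid a zero-length slice for a 1s timeout */

	while (!demo_wait_timeout(io, slice))
		waits++;
	return waits;
}
```

Every blocking site that can legitimately exceed the timeout would need the same loop, which is why handling this per call site does not scale to the general case.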
Might want to give this some more thought.