Message-ID: <b7e7670b-c665-4938-aa38-5813e8e85b00@linux.dev>
Date: Thu, 14 Aug 2025 12:30:44 +0800
From: Lance Yang <lance.yang@...ux.dev>
To: "Nanji Parmar (he/him)" <nparmar@...estorage.com>
Cc: mhiramat@...nel.org, linux-kernel@...r.kernel.org,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH] hung_task: Skip hung task detection during core dump operations
On 2025/8/14 11:31, Nanji Parmar (he/him) wrote:
> Hi Lance, Andrew,
>
> Thanks for looking into this.
> After checking further, we found that the following patch fixed that
> issue. Thank you once again.
Ah, I see. That's why I couldn't reproduce it on the 6.16 kernel: the
fix was already there ;)
Thanks for digging this up!
Lance
>
> commit b8e753128ed074fcb48e9ceded940752f6b1c19f
> Author: Paul E. McKenney <paulmck@...nel.org>
> Date: Wed Jul 24 17:51:52 2024
>
> exit: Sleep at TASK_IDLE when waiting for application core dump
>
> Currently, the coredump_task_exit() function sets the task state
> to TASK_UNINTERRUPTIBLE|TASK_FREEZABLE, which usually works well.
> But a combination of large memory and slow (and/or highly contended)
> mass storage can cause application core dumps to take more than
> two minutes, which can cause check_hung_task(), which is invoked by
> check_hung_uninterruptible_tasks(), to produce task-blocked splats.
> There does not seem to be any reasonable benefit to getting these
> splats.
>
> Furthermore, as Oleg Nesterov points out, TASK_UNINTERRUPTIBLE could
> be misleading because the task sleeping in coredump_task_exit() really
> is killable, albeit indirectly. See the check of signal->core_state
> in prepare_signal() and the check of fatal_signal_pending()
> in dump_interrupted(), which bypass the normal unkillability of
> TASK_UNINTERRUPTIBLE, resulting in coredump_finish() invoking
> wake_up_process() on any threads sleeping in coredump_task_exit().
>
> Therefore, change that TASK_UNINTERRUPTIBLE to TASK_IDLE.
>
> Reported-by: Anhad Jai Singh <ffledgling@...a.com>
> Signed-off-by: Paul E. McKenney <paulmck@...nel.org>
> Acked-by: Oleg Nesterov <oleg@...hat.com>
> Cc: Jens Axboe <axboe@...nel.dk>
> Cc: Christian Brauner <brauner@...nel.org>
> Cc: Andrew Morton <akpm@...ux-foundation.org>
> Cc: "Matthew Wilcox (Oracle)" <willy@...radead.org>
> Cc: Chris Mason <clm@...com>
> Cc: Rik van Riel <riel@...riel.com>
>
> diff --git a/kernel/exit.c b/kernel/exit.c
> index 7430852a8571..0d62a53605df 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -428,7 +428,7 @@ static void coredump_task_exit(struct task_struct *tsk)
> complete(&core_state->startup);
>
> for (;;) {
> - set_current_state(TASK_UNINTERRUPTIBLE|TASK_FREEZABLE);
> + set_current_state(TASK_IDLE|TASK_FREEZABLE);
> if (!self.task) /* see coredump_finish() */
> break;
> schedule();
>
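For anyone reading along, here is a minimal userspace sketch of why that
one-line change silences the splat. It assumes the detector's scan loop only
considers tasks whose state has TASK_UNINTERRUPTIBLE set and neither
TASK_WAKEKILL nor TASK_NOLOAD set, and that TASK_IDLE is
TASK_UNINTERRUPTIBLE|TASK_NOLOAD. The flag values are meant to mirror
include/linux/sched.h in recent kernels but should be treated as illustrative,
and hung_task_candidate() is a hypothetical name, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values, assumed to match include/linux/sched.h. */
#define TASK_UNINTERRUPTIBLE 0x0002u
#define TASK_WAKEKILL        0x0100u
#define TASK_NOLOAD          0x0400u
#define TASK_FREEZABLE       0x2000u
#define TASK_IDLE            (TASK_UNINTERRUPTIBLE | TASK_NOLOAD)

/* The whole argument rests on TASK_IDLE carrying TASK_NOLOAD. */
static_assert((TASK_IDLE & TASK_NOLOAD) != 0,
              "TASK_IDLE must include TASK_NOLOAD");

/*
 * Hypothetical helper modelling the state filter applied before
 * check_hung_task() is called: only plain D-state sleepers qualify.
 */
static bool hung_task_candidate(unsigned int state)
{
    return (state & TASK_UNINTERRUPTIBLE) &&
           !(state & TASK_WAKEKILL) &&
           !(state & TASK_NOLOAD);
}
```

Under these assumptions, hung_task_candidate(TASK_UNINTERRUPTIBLE|TASK_FREEZABLE)
is true (the pre-fix sleep is scanned) while
hung_task_candidate(TASK_IDLE|TASK_FREEZABLE) is false (the post-fix sleep is
skipped), so no coredump special case is needed in the detector itself.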
> Thanks,
> Nanji
>
> On Wed, Aug 13, 2025 at 8:12 PM Lance Yang <lance.yang@...ux.dev> wrote:
>
> Hi Nanji,
>
> Thanks for your patch!
>
> On 2025/8/14 06:01, Andrew Morton wrote:
> > On Wed, 13 Aug 2025 11:30:36 -0700 "Nanji Parmar (he/him)" <nparmar@...estorage.com> wrote:
> >
> >> Tasks involved in core dump operations can legitimately block for
> >> extended periods, especially for large memory processes. The hung
> >> task detector should skip tasks with PF_DUMPCORE (main dumping
> >> thread) or PF_POSTCOREDUMP (other threads in the group) flags to
> >> avoid false positive warnings.
> >>
> >> This prevents incorrect hung task reports during legitimate core
> >> dump generation that can take xx minutes for large processes.
> >
> > It isn't pleasing to be putting coredump special cases into the core of
> > the hung-task detector. Perhaps the hung task detector should get an
>
> Yeah, adding a special case for coredumps is not a good design ;)
>
> > equivalent to touch_softlockup_watchdog(). I'm surprised it doesn't
> > already have such a thing. Maybe it does and I've forgotten where it is.
> >
> > Please provide a full description of the problem, mainly the relevant
> > dmesg output. Please always provide this full description when
> > addressing kernel issues, thanks.
>
> Interestingly, I wasn't able to reproduce the hung task warning on my
> machine with an SSD, even when generating a 100 GiB coredump. The process
> switches between R and D states so fast that it never hits the timeout,
> even with hung_task_timeout_secs set as low as 5s ;)
>
> So it seems this isn't a general problem for all coredumps. It looks like
> it only happens on systems with slow I/O, which can cause a process to
> stay in a D-state for a long time.
>
> Anyway, any task *actually* blocked on I/O for that long should be flagged;
> that is the hung task detector's job, IMHO.
>
> Thanks,
> Lance
>
>
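The observation quoted above, that a task bouncing between R and D never trips
the timeout even at 5s, can be sketched as a small userspace model. It assumes
the detector compares a per-task context-switch count against the value it saw
on the previous scan and restarts the clock whenever the count has moved.
struct sim_task and sim_check_hung_task() are hypothetical names for
illustration, time is in plain seconds rather than jiffies, and everything
else the real check does is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, pared-down stand-in for the fields the detector tracks. */
struct sim_task {
    unsigned long switch_count;      /* models voluntary + involuntary switches */
    unsigned long last_switch_count; /* count observed at the previous scan */
    unsigned long last_switch_time;  /* time (s) when the count last changed */
};

/* Returns true when the task would be reported as hung at time `now`. */
static bool sim_check_hung_task(struct sim_task *t, unsigned long now,
                                unsigned long timeout)
{
    assert(timeout > 0);

    /* Any context switch since the last scan restarts the clock. */
    if (t->switch_count != t->last_switch_count) {
        t->last_switch_count = t->switch_count;
        t->last_switch_time = now;
        return false;
    }

    /* No progress: report once a full timeout has elapsed. */
    return now - t->last_switch_time >= timeout;
}
```

With a 5s timeout, a task whose switch_count advances between scans is never
reported, however long the dump runs. Once the count stops moving, the task is
reported on the first scan that is at least 5s after the last observed change,
which is the slow-I/O case.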