linux-kernel - Re: [PATCH] hung_task: Skip hung task detection during core dump operations

Message-ID: <b7e7670b-c665-4938-aa38-5813e8e85b00@linux.dev>
Date: Thu, 14 Aug 2025 12:30:44 +0800
From: Lance Yang <lance.yang@...ux.dev>
To: "Nanji Parmar (he/him)" <nparmar@...estorage.com>
Cc: mhiramat@...nel.org, linux-kernel@...r.kernel.org,
 Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH] hung_task: Skip hung task detection during core dump
 operations



On 2025/8/14 11:31, Nanji Parmar (he/him) wrote:
> Hi Lance, Andrew,
> 
> Thanks for looking into this.
> After checking further, we found that the following patch fixed that 
> issue. Thank you once again.

Ah, I see. That's why I couldn't reproduce it on the 6.16 kernel: the
fix was already there ;)

Thanks for digging this up!
Lance


> 
> commit b8e753128ed074fcb48e9ceded940752f6b1c19f
> Author: Paul E. McKenney <paulmck@...nel.org>
> Date:   Wed Jul 24 17:51:52 2024
> 
>      exit: Sleep at TASK_IDLE when waiting for application core dump
> 
>      Currently, the coredump_task_exit() function sets the task state
>      to TASK_UNINTERRUPTIBLE|TASK_FREEZABLE, which usually works well.
>      But a combination of large memory and slow (and/or highly contended)
>      mass storage can cause application core dumps to take more than
>      two minutes, which can cause check_hung_task(), which is invoked by
>      check_hung_uninterruptible_tasks(), to produce task-blocked splats.
>      There does not seem to be any reasonable benefit to getting these
>      splats.
> 
>      Furthermore, as Oleg Nesterov points out, TASK_UNINTERRUPTIBLE could
>      be misleading because the task sleeping in coredump_task_exit() really
>      is killable, albeit indirectly.  See the check of signal->core_state
>      in prepare_signal() and the check of fatal_signal_pending()
>      in dump_interrupted(), which bypass the normal unkillability of
>      TASK_UNINTERRUPTIBLE, resulting in coredump_finish() invoking
>      wake_up_process() on any threads sleeping in coredump_task_exit().
> 
>      Therefore, change that TASK_UNINTERRUPTIBLE to TASK_IDLE.
> 
>      Reported-by: Anhad Jai Singh <ffledgling@...a.com>
>      Signed-off-by: Paul E. McKenney <paulmck@...nel.org>
>      Acked-by: Oleg Nesterov <oleg@...hat.com>
>      Cc: Jens Axboe <axboe@...nel.dk>
>      Cc: Christian Brauner <brauner@...nel.org>
>      Cc: Andrew Morton <akpm@...ux-foundation.org>
>      Cc: "Matthew Wilcox (Oracle)" <willy@...radead.org>
>      Cc: Chris Mason <clm@...com>
>      Cc: Rik van Riel <riel@...riel.com>
> 
> diff --git a/kernel/exit.c b/kernel/exit.c
> index 7430852a8571..0d62a53605df 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -428,7 +428,7 @@ static void coredump_task_exit(struct task_struct *tsk)
>                          complete(&core_state->startup);
> 
>                  for (;;) {
> -                       set_current_state(TASK_UNINTERRUPTIBLE|TASK_FREEZABLE);
> +                       set_current_state(TASK_IDLE|TASK_FREEZABLE);
>                          if (!self.task) /* see coredump_finish() */
>                                  break;
>                          schedule();
> 
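
For readers following along, here is a minimal userspace sketch of why the
one-line change above silences the splats. It is not kernel code. The bit
values and the filter condition are assumptions based on a reading of
include/linux/sched.h and check_hung_uninterruptible_tasks() in recent
kernels, and hung_task_candidate() and demo_commit_effect() are hypothetical
names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumed task-state bits; check include/linux/sched.h for your kernel. */
#define TASK_UNINTERRUPTIBLE 0x00000002u
#define TASK_WAKEKILL        0x00000100u
#define TASK_NOLOAD          0x00000400u
#define TASK_FREEZABLE       0x00002000u
#define TASK_IDLE            (TASK_UNINTERRUPTIBLE | TASK_NOLOAD)

/*
 * Hypothetical model of the detector's filter: only plain uninterruptible
 * sleepers are examined. Killable (TASK_WAKEKILL) and idle (TASK_NOLOAD)
 * sleepers are skipped.
 */
bool hung_task_candidate(unsigned int state)
{
	return (state & TASK_UNINTERRUPTIBLE) &&
	       !(state & TASK_WAKEKILL) &&
	       !(state & TASK_NOLOAD);
}

/* Compares the state set before and after the commit. */
void demo_commit_effect(void)
{
	unsigned int before = TASK_UNINTERRUPTIBLE | TASK_FREEZABLE;
	unsigned int after = TASK_IDLE | TASK_FREEZABLE;

	assert(hung_task_candidate(before));   /* old state is examined */
	assert(!hung_task_candidate(after));   /* new state is skipped */
	printf("before=%d after=%d\n",
	       hung_task_candidate(before), hung_task_candidate(after));
}
```

Under these assumptions, a thread sleeping in coredump_task_exit() is still
in an uninterruptible sleep, but the extra TASK_NOLOAD bit carried by
TASK_IDLE keeps it out of the detector's scan.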
> Thanks,
> Nanji
> 
> On Wed, Aug 13, 2025 at 8:12 PM Lance Yang <lance.yang@...ux.dev> wrote:
> 
>     Hi Nanji,
> 
>     Thanks for your patch!
> 
>     On 2025/8/14 06:01, Andrew Morton wrote:
>      > On Wed, 13 Aug 2025 11:30:36 -0700 "Nanji Parmar (he/him)"
>      > <nparmar@...estorage.com> wrote:
>      >
>      >> Tasks involved in core dump operations can legitimately block for
>      >> extended periods, especially for large memory processes. The hung
>      >> task detector should skip tasks with PF_DUMPCORE (main dumping
>      >> thread) or PF_POSTCOREDUMP (other threads in the group) flags to
>      >> avoid false positive warnings.
>      >>
>      >> This prevents incorrect hung task reports during legitimate core
>      >> dump generation that can take xx minutes for large processes.
>      >
>      > It isn't pleasing to be putting coredump special cases into the
>      > core of the hung-task detector.  Perhaps the hung task detector
>      > should get an
> 
>     Yeah, adding a special case for coredumps is not a good design ;)
> 
>      > equivalent to touch_softlockup_watchdog().  I'm surprised it doesn't
>      > already have such a thing.  Maybe it does and I've forgotten
>      > where it is.
>      >
>      > Please provide a full description of the problem, mainly the relevant
>      > dmesg output.  Please always provide this full description when
>      > addressing kernel issues, thanks.
> 
>     Interestingly, I wasn't able to reproduce the hung task warning on my
>     machine with an SSD, even when generating a 100 GiB coredump. The process
>     switches between R and D states so fast that it never hits the timeout,
>     even with hung_task_timeout_secs set as low as 5s ;)
> 
>     So it seems this isn't a general problem for all coredumps. It looks like
>     it only happens on systems with slow I/O, which can cause a process to
>     stay in a D-state for a long time.
> 
>     Anyway, any task *actually* blocked on I/O for that long should be
>     flagged; that is the hung task detector's job, IMHO.
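
The observation above, that a task flipping between R and D states never
trips the timeout, can be illustrated with a small userspace model. This is
a sketch, assuming the detector compares a task's context-switch count
between scans the way check_hung_task() appears to in kernel/hung_task.c.
struct task_model and scan_task() are hypothetical names, and time is
counted in plain seconds rather than jiffies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the few task fields the check reads. */
struct task_model {
	unsigned long nvcsw;             /* voluntary context switches */
	unsigned long nivcsw;            /* involuntary context switches */
	unsigned long last_switch_count; /* count recorded by the previous scan */
	unsigned long last_switch_time;  /* time (s) when the count last changed */
};

/* Returns true if the task would be reported as hung at time `now`. */
bool scan_task(struct task_model *t, unsigned long now, unsigned long timeout)
{
	unsigned long switch_count = t->nvcsw + t->nivcsw;

	if (switch_count != t->last_switch_count) {
		/* The task ran since the last scan, so restart its clock. */
		t->last_switch_count = switch_count;
		t->last_switch_time = now;
		return false;
	}
	/* No context switch at all for a full timeout period. */
	return now - t->last_switch_time >= timeout;
}

/* A task that keeps getting scheduled is never reported. */
void demo_fast_storage(void)
{
	struct task_model t = {0};
	unsigned long now;

	for (now = 1; now <= 60; now++) {
		t.nvcsw++;                       /* woke up and slept again */
		assert(!scan_task(&t, now, 5));
	}
	printf("task with steady context switches was never reported\n");
}
```

In this model, only a task that stays blocked without a single context
switch for the whole timeout is reported, which matches the slow-I/O case
described above.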
> 
>     Thanks,
>     Lance
> 
>