Message-ID: <33f995c6-4db7-4e4c-ba12-eb5d05e8521c@linux.dev>
Date: Thu, 14 Aug 2025 11:12:52 +0800
From: Lance Yang <lance.yang@...ux.dev>
To: "Nanji Parmar (he/him)" <nparmar@...estorage.com>
Cc: mhiramat@...nel.org, linux-kernel@...r.kernel.org,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH] hung_task: Skip hung task detection during core dump
operations
Hi Nanji,
Thanks for your patch!
On 2025/8/14 06:01, Andrew Morton wrote:
> On Wed, 13 Aug 2025 11:30:36 -0700 "Nanji Parmar (he/him)" <nparmar@...estorage.com> wrote:
>
>> Tasks involved in core dump operations can legitimately block for
>> extended periods, especially for large memory processes. The hung
>> task detector should skip tasks with PF_DUMPCORE (main dumping
>> thread) or PF_POSTCOREDUMP (other threads in the group) flags to
>> avoid false positive warnings.
>>
>> This prevents incorrect hung task reports during legitimate core
>> dump generation that can take xx minutes for large processes.
>
> It isn't pleasing to be putting coredump special cases into the core of
> the hung-task detector.
Yeah, adding a special case for coredumps is not a good design ;)
> Perhaps the hung task detector should get an
> equivalent to touch_softlockup_watchdog(). I'm surprised it doesn't
> already have such a thing. Maybe it does and I've forgotten where it is.
>
> Please provide a full description of the problem, mainly the relevant
> dmesg output. Please always provide this full description when
> addressing kernel issues, thanks.
Interestingly, I wasn't able to reproduce the hung task warning on my
machine with an SSD, even when generating a 100 GiB coredump. The process
switches between the R (running) and D (uninterruptible sleep) states so
fast that it never hits the timeout, even with hung_task_timeout_secs set
as low as 5s ;)
So it seems this isn't a general problem for all coredumps. It looks like
it only happens on systems with slow I/O, which can keep a process in the
D state for a long time.
Anyway, any task *actually* blocked on I/O for that long should be flagged;
that is the hung task detector's job, IMHO.
Thanks,
Lance