Message-ID: <9665ff9f-3e1d-4c39-8c8f-b9e12fb4d5f4@linux.dev>
Date: Tue, 23 Sep 2025 10:30:18 +0800
From: Lance Yang <lance.yang@...ux.dev>
To: Julian Sun <sunjunchao@...edance.com>,
Andrew Morton <akpm@...ux-foundation.org>
Cc: mhiramat@...nel.org, viro@...iv.linux.org.uk, brauner@...nel.org,
jack@...e.cz, mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
vincent.guittot@...aro.org, dietmar.eggemann@....com, rostedt@...dmis.org,
bsegall@...gle.com, mgorman@...e.de, vschneid@...hat.com,
agruenba@...hat.com, hannes@...xchg.org, mhocko@...nel.org,
roman.gushchin@...ux.dev, shakeel.butt@...ux.dev, muchun.song@...ux.dev,
linux-kernel@...r.kernel.org, cgroups@...r.kernel.org,
linux-fsdevel@...r.kernel.org
Subject: Re: [PATCH 0/3] Suppress undesirable hung task warnings.
On 2025/9/23 05:57, Andrew Morton wrote:
> On Mon, 22 Sep 2025 19:38:21 +0800 Lance Yang <lance.yang@...ux.dev> wrote:
>
>> On 2025/9/22 17:41, Julian Sun wrote:
>>> As suggested by Andrew Morton in [1], we need a general mechanism
>>> that allows the hung task detector to ignore unnecessary hung
>>
>> Yep, I understand the goal is to suppress what can be a benign hung task
>> warning during memcg teardown.
>>
>>> tasks. This patch set implements this functionality.
>>>
>>> Patch 1 introduces a PF_DONT_HUNG flag. The hung task detector will
>>> ignore all tasks that have the PF_DONT_HUNG flag set.
>>
>> However, I'm concerned that the PF_DONT_HUNG flag is a bit too powerful
>> and might mask real, underlying hangs.
>
> I think that's OK if the calling task is discriminating about it. Just
> set PF_DONT_HUNG (unpleasing name!) around those bits of code where
> it's needed, clear it otherwise.
Makes sense to me :)
>
> Julian, did you take a look at what a touch_hung_task_detector() would
> involve? It's a bit of an interface inconsistency - our various other
> timeout detectors (softlockup, NMI, rcu) each have a touch_ function.

On second thought, I agree that a touch_hung_task_detector() would be a
much better approach for interface consistency.

We could implement touch_hung_task_detector() to grant the task temporary
immunity from hung task checks for as long as it remains uninterruptible.
Once the task becomes runnable again, the immunity is automatically revoked.

Something like this:

---
diff --git a/include/linux/hung_task.h b/include/linux/hung_task.h
index c4403eeb7144..fac92039dce0 100644
--- a/include/linux/hung_task.h
+++ b/include/linux/hung_task.h
@@ -98,4 +98,9 @@ static inline void *hung_task_blocker_to_lock(unsigned long blocker)
 }
 #endif

+static inline void touch_hung_task_detector(struct task_struct *t)
+{
+	t->last_switch_count = ULONG_MAX;
+}
+
 #endif /* __LINUX_HUNG_TASK_H */
diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 8708a1205f82..094a277b3b39 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -203,6 +203,9 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
 	if (unlikely(!switch_count))
 		return;

+	if (t->last_switch_count == ULONG_MAX)
+		return;
+
 	if (switch_count != t->last_switch_count) {
 		t->last_switch_count = switch_count;
 		t->last_switch_time = jiffies;
@@ -317,6 +320,9 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
 		    !(state & TASK_WAKEKILL) &&
 		    !(state & TASK_NOLOAD))
 			check_hung_task(t, timeout);
+		else if (t->last_switch_count == ULONG_MAX)
+			t->last_switch_count = t->nvcsw + t->nivcsw;
+
 	}
  unlock:
 	rcu_read_unlock();
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 8dc470aa6c3c..3d5f36455b74 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3910,8 +3910,10 @@ static void mem_cgroup_css_free(struct cgroup_subsys_state *css)
 	int __maybe_unused i;

 #ifdef CONFIG_CGROUP_WRITEBACK
-	for (i = 0; i < MEMCG_CGWB_FRN_CNT; i++)
+	for (i = 0; i < MEMCG_CGWB_FRN_CNT; i++) {
+		touch_hung_task_detector(current);
 		wb_wait_for_completion(&memcg->cgwb_frn[i].done);
+	}
 #endif
 	if (cgroup_subsys_on_dfl(memory_cgrp_subsys) && !cgroup_memory_nosocket)
 		static_branch_dec(&memcg_sockets_enabled_key);
---
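
This uses ULONG_MAX in last_switch_count as a marker to grant the
immunity. As long as the task remains in state D, check_hung_task() sees
the marker and bails out. Once a scan finds the task no longer in that
state, check_hung_uninterruptible_tasks() resets last_switch_count to the
task's current switch count (nvcsw + nivcsw), which revokes the immunity.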
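
To make the marker lifecycle easier to follow, here is a minimal userspace
sketch of the same logic. It is only a model, not kernel code: struct
sim_task, scan_task(), demo() and the tick-based timeout are hypothetical
stand-ins I made up for illustration, and the state check is reduced to a
single "uninterruptible" flag.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few task_struct fields involved. */
struct sim_task {
	unsigned long nvcsw;             /* voluntary context switches */
	unsigned long nivcsw;            /* involuntary context switches */
	unsigned long last_switch_count; /* ULONG_MAX means "touched" */
	unsigned long last_switch_time;  /* in simulated ticks */
	bool uninterruptible;            /* true while in state D */
};

/* Mark the task as immune until it is seen outside state D. */
static void touch_hung_task_detector(struct sim_task *t)
{
	t->last_switch_count = ULONG_MAX;
}

/* Returns true if a hung task warning would be reported. */
static bool check_hung_task(struct sim_task *t, unsigned long now,
			    unsigned long timeout)
{
	unsigned long switch_count = t->nvcsw + t->nivcsw;

	if (!switch_count)
		return false;

	/* Touched: skip the check without updating any bookkeeping. */
	if (t->last_switch_count == ULONG_MAX)
		return false;

	if (switch_count != t->last_switch_count) {
		t->last_switch_count = switch_count;
		t->last_switch_time = now;
		return false;
	}

	return now - t->last_switch_time >= timeout;
}

/* One detector pass over a single task (models the scan loop body). */
static bool scan_task(struct sim_task *t, unsigned long now,
		      unsigned long timeout)
{
	if (t->uninterruptible)
		return check_hung_task(t, now, timeout);

	/* Task left state D: revoke the immunity. */
	if (t->last_switch_count == ULONG_MAX)
		t->last_switch_count = t->nvcsw + t->nivcsw;
	return false;
}

/* Walks one task through touch, revoke, and a later genuine hang. */
int demo(void)
{
	struct sim_task t = {
		.nvcsw = 5, .last_switch_count = 5, .uninterruptible = true,
	};

	assert(scan_task(&t, 200, 120));   /* untouched and stuck: warns */

	touch_hung_task_detector(&t);
	assert(!scan_task(&t, 400, 120));  /* touched: suppressed */

	t.uninterruptible = false;
	assert(!scan_task(&t, 500, 120));  /* runnable: marker cleared */
	assert(t.last_switch_count == 5);

	t.nvcsw++;                         /* blocks again in state D */
	t.uninterruptible = true;
	assert(!scan_task(&t, 600, 120));  /* new switch count recorded */
	assert(scan_task(&t, 720, 120));   /* stuck for a full timeout: warns */
	return 0;
}
```

Note that in this model the immunity only ends when a scan actually
observes the task outside state D, so a task that wakes up and blocks again
between two scans would stay immune until a later scan catches it runnable.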