Message-ID: <CAHSKhtee3amv12XdBu0Wbfde_27pSm7WdRtifGhpfycLwmov0A@mail.gmail.com>
Date: Tue, 23 Sep 2025 10:45:19 +0800
From: Julian Sun <sunjunchao@...edance.com>
To: Lance Yang <lance.yang@...ux.dev>
Cc: Andrew Morton <akpm@...ux-foundation.org>, mhiramat@...nel.org, viro@...iv.linux.org.uk, 
	brauner@...nel.org, jack@...e.cz, mingo@...hat.com, peterz@...radead.org, 
	juri.lelli@...hat.com, vincent.guittot@...aro.org, dietmar.eggemann@....com, 
	rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de, vschneid@...hat.com, 
	agruenba@...hat.com, hannes@...xchg.org, mhocko@...nel.org, 
	roman.gushchin@...ux.dev, shakeel.butt@...ux.dev, muchun.song@...ux.dev, 
	linux-kernel@...r.kernel.org, cgroups@...r.kernel.org, 
	linux-fsdevel@...r.kernel.org
Subject: Re: [External] Re: [PATCH 0/3] Suppress undesirable hung task warnings.

On Tue, Sep 23, 2025 at 10:30 AM Lance Yang <lance.yang@...ux.dev> wrote:
>
>
>
> On 2025/9/23 05:57, Andrew Morton wrote:
> > On Mon, 22 Sep 2025 19:38:21 +0800 Lance Yang <lance.yang@...ux.dev> wrote:
> >
> >> On 2025/9/22 17:41, Julian Sun wrote:
> >>> As suggested by Andrew Morton in [1], we need a general mechanism
> >>> that allows the hung task detector to ignore unnecessary hung
> >>
> >> Yep, I understand the goal is to suppress what can be a benign hung task
> >> warning during memcg teardown.
> >>
> >>> tasks. This patch set implements this functionality.
> >>>
> >>> Patch 1 introduces a PF_DONT_HUNG flag. The hung task detector will
> >>> ignore all tasks that have the PF_DONT_HUNG flag set.
> >>
> >> However, I'm concerned that the PF_DONT_HUNG flag is a bit too powerful
> >> and might mask real, underlying hangs.
> >
> > I think that's OK if the calling task is discriminating about it.  Just
> > set PF_DONT_HUNG (unpleasing name!) around those bits of code where
> > it's needed, clear it otherwise.
>
> Makes sense to me :)
>
> >
> > Julian, did you take a look at what a touch_hung_task_detector() would
> > involve?  It's a bit of an interface inconsistency - our various other
> > timeout detectors (softlockup, NMI, rcu) each have a touch_ function.
>
> On second thought, I agree that a touch_hung_task_detector() would be a
> much better approach for interface consistency.
>
> We could implement touch_hung_task_detector() to grant the task temporary
> immunity from hung task checks for as long as it remains uninterruptible.
> Once the task becomes runnable again, the immunity is automatically revoked.

Yes, this looks much cleaner. I hadn't thought of implementing it
this way :)
>
> Something like this:
>
> ---
> diff --git a/include/linux/hung_task.h b/include/linux/hung_task.h
> index c4403eeb7144..fac92039dce0 100644
> --- a/include/linux/hung_task.h
> +++ b/include/linux/hung_task.h
> @@ -98,4 +98,9 @@ static inline void *hung_task_blocker_to_lock(unsigned long blocker)
>   }
>   #endif
>
> +static inline void touch_hung_task_detector(struct task_struct *t)
> +{
> +       t->last_switch_count = ULONG_MAX;
> +}
> +
>   #endif /* __LINUX_HUNG_TASK_H */
> diff --git a/kernel/hung_task.c b/kernel/hung_task.c
> index 8708a1205f82..094a277b3b39 100644
> --- a/kernel/hung_task.c
> +++ b/kernel/hung_task.c
> @@ -203,6 +203,9 @@ static void check_hung_task(struct task_struct *t, unsigned long timeout)
>         if (unlikely(!switch_count))
>                 return;
>
> +       if (t->last_switch_count == ULONG_MAX)
> +               return;
> +
>         if (switch_count != t->last_switch_count) {
>                 t->last_switch_count = switch_count;
>                 t->last_switch_time = jiffies;
> @@ -317,6 +320,9 @@ static void check_hung_uninterruptible_tasks(unsigned long timeout)
>                     !(state & TASK_WAKEKILL) &&
>                     !(state & TASK_NOLOAD))
>                         check_hung_task(t, timeout);
> +               else if (t->last_switch_count == ULONG_MAX)
> +                       t->last_switch_count = t->nvcsw + t->nivcsw;

Maybe we don't need this statement here: the
if (switch_count != t->last_switch_count) check inside check_hung_task()
will do it automatically. Or am I missing something?
> +
>         }
>    unlock:
>         rcu_read_unlock();
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 8dc470aa6c3c..3d5f36455b74 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3910,8 +3910,10 @@ static void mem_cgroup_css_free(struct cgroup_subsys_state *css)
>         int __maybe_unused i;
>
>   #ifdef CONFIG_CGROUP_WRITEBACK
> -       for (i = 0; i < MEMCG_CGWB_FRN_CNT; i++)
> +       for (i = 0; i < MEMCG_CGWB_FRN_CNT; i++) {
> +               touch_hung_task_detector(current);
>                 wb_wait_for_completion(&memcg->cgwb_frn[i].done);
> +       }
>   #endif
>         if (cgroup_subsys_on_dfl(memory_cgrp_subsys) && !cgroup_memory_nosocket)
>                 static_branch_dec(&memcg_sockets_enabled_key);
> ---
>
> This uses ULONG_MAX as a marker to grant the immunity. As long as the task
> remains in state D, check_hung_task() sees the marker and bails out.
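
To make the marker semantics concrete, here is a minimal user-space
sketch of the logic in the quoted diff. It is only a model under stated
assumptions, not kernel code: struct model_task, scan_task() and
run_scenario() are hypothetical names, each scan_task() call stands in
for one full timeout period (the jiffies comparison is left out), and a
single bool replaces the real task state checks.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few task_struct fields the diff touches. */
struct model_task {
	unsigned long nvcsw;             /* voluntary context switches */
	unsigned long nivcsw;            /* involuntary context switches */
	unsigned long last_switch_count; /* ULONG_MAX doubles as the marker */
	bool uninterruptible;            /* assumption: models "state D" */
};

/* Same body as in the quoted diff: set the immunity marker. */
static void touch_hung_task_detector(struct model_task *t)
{
	t->last_switch_count = ULONG_MAX;
}

/* Returns true when the model would report the task as hung. */
static bool check_hung_task(struct model_task *t)
{
	unsigned long switch_count = t->nvcsw + t->nivcsw;

	if (!switch_count)
		return false;

	/* Early return while the marker is set, as in the quoted hunk. */
	if (t->last_switch_count == ULONG_MAX)
		return false;

	if (switch_count != t->last_switch_count) {
		t->last_switch_count = switch_count;
		return false;
	}

	/* Assumption: one scan equals one timeout, so no time check here. */
	return true;
}

/* Models one iteration of the per-task loop in the scanning function. */
static bool scan_task(struct model_task *t)
{
	if (t->uninterruptible)
		return check_hung_task(t);

	/* Task left state D: drop the marker so later blocking is checked. */
	if (t->last_switch_count == ULONG_MAX)
		t->last_switch_count = t->nvcsw + t->nivcsw;
	return false;
}

/* Touched and blocked for 3 scans, wakes up, then blocks again untouched. */
static int run_scenario(void)
{
	struct model_task t = { .nvcsw = 5, .nivcsw = 0, .last_switch_count = 5 };
	int reports = 0;

	touch_hung_task_detector(&t);
	t.uninterruptible = true;
	for (int i = 0; i < 3; i++)
		reports += scan_task(&t);        /* immune while in state D */
	assert(t.last_switch_count == ULONG_MAX);

	t.uninterruptible = false;
	t.nvcsw++;                               /* woke up and ran */
	reports += scan_task(&t);                /* marker is dropped here */

	t.uninterruptible = true;
	t.nvcsw++;                               /* blocked again, no touch */
	reports += scan_task(&t);                /* records the new count */
	reports += scan_task(&t);                /* count unchanged: report */
	return reports;
}
```

With these assumptions run_scenario() returns 1: nothing is reported
while the touched task stays blocked, and the single report comes from
the second, untouched wait.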

Thanks for your review, I will send a v2 with this approach.


-- 
Julian Sun <sunjunchao@...edance.com>
