Message-ID: <CAENh_SRj9pMyMLZAM0WVr3tuD5ogMQySzkPoiHu4SRoGFkmnZw@mail.gmail.com>
Date: Fri, 26 Sep 2025 16:34:28 +0100
From: Matt Fleming <matt@...dmodwrite.com>
To: K Prateek Nayak <kprateek.nayak@....com>
Cc: John Stultz <jstultz@...gle.com>, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>, Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>, Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>, linux-kernel@...r.kernel.org,
kernel-team@...udflare.com, Matt Fleming <mfleming@...udflare.com>,
Oleg Nesterov <oleg@...hat.com>, Chris Arges <carges@...udflare.com>, stable@...r.kernel.org
Subject: Re: [PATCH] Revert "sched/core: Tweak wait_task_inactive() to force
dequeue sched_delayed tasks"

On Fri, Sep 26, 2025 at 3:43 AM K Prateek Nayak <kprateek.nayak@....com> wrote:
>
> Hello John, Matt,
>
> On 9/26/2025 5:35 AM, John Stultz wrote:
> >
> > However, there are two spots where we might exit dequeue_entities()
> > early when cfs_rq_throttled(rq), so maybe that's what's catching us
> > here?
>
> That could very likely be it.

That tracks -- we're heavy users of cgroups, and this particular issue
only appeared on our Kubernetes nodes.

> Matt, if possible can you try the patch attached below to check if the
> bailout for throttled hierarchy is indeed the root cause. Thanks in
> advance.

I've been running our reproducer with this patch for the last few
hours without any issues, so the fix looks good to me.

Tested-by: Matt Fleming <mfleming@...udflare.com>