linux-kernel - Re: [PATCH v4 3/5] sched/fair: Switch to task based throttle model

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20250903202703.GP4067720@noisy.programming.kicks-ass.net>
Date: Wed, 3 Sep 2025 22:27:03 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: K Prateek Nayak <kprateek.nayak@....com>
Cc: Aaron Lu <ziqianlu@...edance.com>,
	Valentin Schneider <vschneid@...hat.com>,
	Ben Segall <bsegall@...gle.com>,
	Chengming Zhou <chengming.zhou@...ux.dev>,
	Josh Don <joshdon@...gle.com>, Ingo Molnar <mingo@...hat.com>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Xi Wang <xii@...gle.com>, linux-kernel@...r.kernel.org,
	Juri Lelli <juri.lelli@...hat.com>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Steven Rostedt <rostedt@...dmis.org>, Mel Gorman <mgorman@...e.de>,
	Chuyi Zhou <zhouchuyi@...edance.com>,
	Jan Kiszka <jan.kiszka@...mens.com>,
	Florian Bezdeka <florian.bezdeka@...mens.com>,
	Songtang Liu <liusongtang@...edance.com>,
	Chen Yu <yu.c.chen@...el.com>,
	Matteo Martelli <matteo.martelli@...ethink.co.uk>,
	Michal Koutn?? <mkoutny@...e.com>,
	Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Subject: Re: [PATCH v4 3/5] sched/fair: Switch to task based throttle model

On Wed, Sep 03, 2025 at 10:42:01PM +0530, K Prateek Nayak wrote:
> Hello Peter,
> 
> On 9/3/2025 8:21 PM, Peter Zijlstra wrote:
> >>  static bool dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> >>  {
> >> +	if (task_is_throttled(p)) {
> >> +		dequeue_throttled_task(p, flags);
> >> +		return true;
> >> +	}
> >> +
> >>  	if (!p->se.sched_delayed)
> >>  		util_est_dequeue(&rq->cfs, p);
> >>  
> > 
> > OK, so this makes it so that either a task is fully enqueued (all
> > cfs_rq's) or full not. A group cfs_rq is only marked throttled when all
> > its tasks are gone, and unthrottled when a task gets added. Right?
> 
> cfs_rq (and the hierarchy below) is marked throttled when the quota
> has elapsed. Tasks on the throttled hierarchies will dequeue
> themselves completely via task work added during pick. When the last
> task leaves on a cfs_rq of throttled hierarchy, PELT is frozen for
> that cfs_rq.

Ah, right.

> When a new task is added on the hierarchy, the PELT is unfrozen and
> the task becomes runnable. The cfs_rq and the hierarchy is still
> marked throttled.
> 
> Unthrottling of hierarchy is only done at distribution.
> 
> > 
> > But propagate_entity_cfs_rq() is still doing the old thing, and has a
> > if (cfs_rq_throttled(cfs_rq)) break; inside the for_each_sched_entity()
> > iteration.
> > 
> > This seems somewhat inconsistent; or am I missing something ? 
> 
> Probably an oversight. But before that, what was the reason to have
> stopped this propagation at throttled_cfs_rq() before the changes?
> 
> Looking at commit 09a43ace1f98 ("sched/fair: Propagate load during
> synchronous attach/detach") was it because the update_load_avg() from
> enqueue_entity() loop previously in unthrottle_cfs_rq() would have
> propagated the PELT to root on unthrottle?
> 
> If that was the intention, perhaps something like:

So this is mostly tasks leaving/joining the class/cgroup. And its
purpose seems to be to remove/add the blocked load component.

Previously throttle/unthrottle would {de,en}queue the whole subtree from
PELT, see how {en,de}queue would also stop at throttle.

But now none of that is done; PELT is fully managed by the tasks
{de,en}queueing.

So I'm thinking that when a task joins fair (deboost from RT or
whatever), we add the blocking load and fully propagate it. If the task
is subject to throttling, that will then happen 'naturally' and it will
dequeue itself again.