Message-ID: <xm26plb6icip.fsf@google.com>
Date: Wed, 01 Oct 2025 13:37:50 -0700
From: Benjamin Segall <bsegall@...gle.com>
To: Aaron Lu <ziqianlu@...edance.com>
Cc: K Prateek Nayak <kprateek.nayak@....com>,  Ingo Molnar
 <mingo@...hat.com>,  Peter Zijlstra <peterz@...radead.org>,  Juri Lelli
 <juri.lelli@...hat.com>,  Vincent Guittot <vincent.guittot@...aro.org>,
  Matteo Martelli <matteo.martelli@...ethink.co.uk>,
  linux-kernel@...r.kernel.org,  Dietmar Eggemann
 <dietmar.eggemann@....com>,  Steven Rostedt <rostedt@...dmis.org>,  Mel
 Gorman <mgorman@...e.de>,  Valentin Schneider <vschneid@...hat.com>
Subject: Re: [PATCH] sched/fair: Start a cfs_rq on throttled hierarchy with
 PELT clock throttled

Aaron Lu <ziqianlu@...edance.com> writes:

> BTW, I'm thinking that in propagate_entity_cfs_rq() we shouldn't check the
> ancestor cfs_rqs' PELT clock throttled status but only the first level
> cfs_rq's, because the purpose is to have the first level cfs_rq stay
> on the leaf list, and all those list_add_leaf_cfs_rq() calls for its ancestors
> are just to make sure the list is fully connected. I mean something like
> this:
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 75c615f5ed640..6a6d9200ab93c 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -13170,6 +13170,7 @@ prio_changed_fair(struct rq *rq, struct task_struct *p, int oldprio)
>  static void propagate_entity_cfs_rq(struct sched_entity *se)
>  {
>  	struct cfs_rq *cfs_rq = cfs_rq_of(se);
> +	bool add = !cfs_rq_pelt_clock_throttled(cfs_rq);
>  
>  	/*
>  	 * If a task gets attached to this cfs_rq and before being queued,
> @@ -13177,7 +13178,7 @@ static void propagate_entity_cfs_rq(struct sched_entity *se)
>  	 * change, make sure this cfs_rq stays on leaf cfs_rq list to have
>  	 * that removed load decayed or it can cause faireness problem.
>  	 */
> -	if (!cfs_rq_pelt_clock_throttled(cfs_rq))
> +	if (add)
>  		list_add_leaf_cfs_rq(cfs_rq);
>  
>  	/* Start to propagate at parent */
> @@ -13188,7 +13189,7 @@ static void propagate_entity_cfs_rq(struct sched_entity *se)
>  
>  		update_load_avg(cfs_rq, se, UPDATE_TG);
>  
> -		if (!cfs_rq_pelt_clock_throttled(cfs_rq))
> +		if (add)
>  			list_add_leaf_cfs_rq(cfs_rq);
>  	}
>
> But this is a different thing and can be taken care of later if
> necessary. The current logic doesn't have a problem; it's just not as
> clear to me as the above diff.

So the question is whether we need to call list_add_leaf_cfs_rq() in
cases where the immediate cfs_rq's PELT clock was stopped but the
parents' might not have been. It certainly doesn't seem to me like it
should be necessary, but I'm insufficiently clear on the exact details
of the leaf_cfs_rq list to swear to it.
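
For reference, here is a toy, standalone sketch of the "fully
connected" requirement that those ancestor list_add_leaf_cfs_rq() calls
exist to preserve: a cfs_rq on the leaf list only helps the bottom-up
load propagation walk if its whole chain of ancestors is on the list
too. This is not the kernel's implementation; the types and helpers
below are invented purely to illustrate the invariant.

/*
 * Toy model of the "fully connected" requirement on the leaf cfs_rq
 * list: every cfs_rq on the list should have all of its ancestors on
 * the list too, so that a child-before-parent walk can propagate
 * blocked load all the way up to the root.  Not kernel code.
 */
#include <stdbool.h>
#include <stdio.h>

struct toy_cfs_rq {
	struct toy_cfs_rq *parent;	/* NULL for the root cfs_rq */
	bool on_list;			/* "is on the leaf cfs_rq list" */
};

/* A listed cfs_rq only helps the walk if its whole ancestor chain is listed. */
static bool branch_fully_connected(const struct toy_cfs_rq *cfs_rq)
{
	for (; cfs_rq; cfs_rq = cfs_rq->parent)
		if (!cfs_rq->on_list)
			return false;
	return true;
}

int main(void)
{
	struct toy_cfs_rq root   = { .parent = NULL,    .on_list = true  };
	struct toy_cfs_rq parent = { .parent = &root,   .on_list = false };
	struct toy_cfs_rq child  = { .parent = &parent, .on_list = false };

	/* Add only the first level cfs_rq and skip its ancestors... */
	child.on_list = true;
	printf("child only: connected = %d\n", branch_fully_connected(&child));

	/* ...vs. also adding the ancestor purely for connectivity. */
	parent.on_list = true;
	printf("with parent: connected = %d\n", branch_fully_connected(&child));
	return 0;
}

In this simplified model, if the first level cfs_rq is never added
(because its PELT clock is throttled), adding the ancestors buys
nothing for connectivity, which is the intuition behind the diff above.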
