Message-ID: <20250409142414.GA687147@bytedance>
Date: Wed, 9 Apr 2025 22:24:14 +0800
From: Aaron Lu <ziqianlu@...edance.com>
To: Valentin Schneider <vschneid@...hat.com>,
Ben Segall <bsegall@...gle.com>,
K Prateek Nayak <kprateek.nayak@....com>,
Peter Zijlstra <peterz@...radead.org>,
Josh Don <joshdon@...gle.com>, Ingo Molnar <mingo@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Xi Wang <xii@...gle.com>
Cc: linux-kernel@...r.kernel.org, Juri Lelli <juri.lelli@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Mel Gorman <mgorman@...e.de>,
Chengming Zhou <chengming.zhou@...ux.dev>,
Chuyi Zhou <zhouchuyi@...edance.com>,
Jan Kiszka <jan.kiszka@...mens.com>
Subject: Re: [RFC PATCH v2 7/7] sched/fair: alternative way of accounting
throttle time
On Wed, Apr 09, 2025 at 08:07:46PM +0800, Aaron Lu wrote:
> Implement an alternative way of accounting cfs_rq throttle time which:
> - starts accounting when a throttled cfs_rq has no tasks enqueued and its
> throttled list is not empty;
> - stops accounting when this cfs_rq gets unthrottled or a task gets
> enqueued.
>
> This way, the accounted throttle time is when the cfs_rq has absolutely
> no tasks enqueued and has tasks throttled.
>
> Signed-off-by: Aaron Lu <ziqianlu@...edance.com>
> ---
> kernel/sched/fair.c | 112 ++++++++++++++++++++++++++++++++-----------
> kernel/sched/sched.h | 4 ++
> 2 files changed, 89 insertions(+), 27 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 20471a3aa35e6..70f7de82d1d9d 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5300,6 +5300,7 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>
> static void check_enqueue_throttle(struct cfs_rq *cfs_rq);
> static inline int cfs_rq_throttled(struct cfs_rq *cfs_rq);
> +static void account_cfs_rq_throttle_self(struct cfs_rq *cfs_rq);
>
> static void
> requeue_delayed_entity(struct sched_entity *se);
> @@ -5362,10 +5363,14 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
> if (throttled_hierarchy(cfs_rq)) {
> struct rq *rq = rq_of(cfs_rq);
>
> - if (cfs_rq_throttled(cfs_rq) && !cfs_rq->throttled_clock)
> - cfs_rq->throttled_clock = rq_clock(rq);
> - if (!cfs_rq->throttled_clock_self)
> - cfs_rq->throttled_clock_self = rq_clock(rq);
> + if (cfs_rq->throttled_clock) {
> + cfs_rq->throttled_time +=
> + rq_clock(rq) - cfs_rq->throttled_clock;
> + cfs_rq->throttled_clock = 0;
> + }
This probably needs more explanation.

We could also take cfs_b->lock here and account the time directly into
cfs_b->throttled_time, but since enqueue can be frequent, to avoid
possible lock contention I chose to account this time to the CPU-local
cfs_rq and, on unthrottle, add the locally accounted time to
cfs_b->throttled_time.
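
Roughly, the unthrottle side of this scheme then looks something like
the below (just a sketch to illustrate the idea, not the exact hunk
from this patch):

	/*
	 * Close any still-open accounting window, then fold the locally
	 * accumulated time into cfs_b. cfs_b->lock is taken once per
	 * unthrottle instead of on every enqueue.
	 */
	if (cfs_rq->throttled_clock) {
		cfs_rq->throttled_time += rq_clock(rq) - cfs_rq->throttled_clock;
		cfs_rq->throttled_clock = 0;
	}

	raw_spin_lock(&cfs_b->lock);
	cfs_b->throttled_time += cfs_rq->throttled_time;
	raw_spin_unlock(&cfs_b->lock);
	cfs_rq->throttled_time = 0;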

This has a side effect though: when reading cpu.stat and cpu.stat.local
for a task group with a quota set, throttled_usec in cpu.stat can be
slightly smaller than throttled_usec in cpu.stat.local, since some
throttled time has not been folded into cfs_b yet...
> +
> + if (cfs_rq->throttled_clock_self)
> + account_cfs_rq_throttle_self(cfs_rq);
> }
> #endif
> }