Message-ID: <u2ri72fqvzlyvwxmaez3l6mbgtkvzmg36ylzc4k2qhvjcdiup5@7ogshyljqoot>
Date: Tue, 26 Aug 2025 16:10:37 +0200
From: Michal Koutný <mkoutny@...e.com>
To: Aaron Lu <ziqianlu@...edance.com>
Cc: Valentin Schneider <vschneid@...hat.com>,
Ben Segall <bsegall@...gle.com>, K Prateek Nayak <kprateek.nayak@....com>,
Peter Zijlstra <peterz@...radead.org>, Chengming Zhou <chengming.zhou@...ux.dev>,
Josh Don <joshdon@...gle.com>, Ingo Molnar <mingo@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>, Xi Wang <xii@...gle.com>, linux-kernel@...r.kernel.org,
Juri Lelli <juri.lelli@...hat.com>, Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Mel Gorman <mgorman@...e.de>,
Chuyi Zhou <zhouchuyi@...edance.com>, Jan Kiszka <jan.kiszka@...mens.com>,
Florian Bezdeka <florian.bezdeka@...mens.com>, Songtang Liu <liusongtang@...edance.com>,
Tejun Heo <tj@...nel.org>
Subject: Re: [PATCH v3 4/5] sched/fair: Task based throttle time accounting
Hello.
On Tue, Aug 19, 2025 at 05:34:27PM +0800, Aaron Lu <ziqianlu@...edance.com> wrote:
> Got it, does the below added words make this clear?
>
> With task based throttle model, the previous way to check cfs_rq's
> nr_queued to decide if throttled time should be accounted doesn't work
> as expected, e.g. when a cfs_rq which has a single task is throttled,
> that task could later block in kernel mode instead of being dequeued on
> the limbo list, and accounting this as throttled time is not accurate.
>
> Rework throttle time accounting for a cfs_rq as follows:
> - start accounting when the first task gets throttled in its hierarchy;
> - stop accounting on unthrottle.
>
> Note that there will be a time gap between when a cfs_rq is throttled
> and when a task in its hierarchy is actually throttled. This accounting
> mechanism only starts accounting in the latter case.
Do I understand correctly that this rework doesn't change the
cumulative amount of throttled_time in cpu.stat.local, but the value gets
updated only later?
I'd say such small shifts are OK [1]. What should be avoided is
changing the semantics so that throttled_time would scale with the
number of tasks inside the cgroup (assuming a single cfs_rq, i.e. the number
of tasks on the cfs_rq).
0.02€,
Michal
[1] Maybe not even shifts -- in the case of a cfs_rq with a single task,
the task can manage to run in the kernel for almost the whole period, so
it gets dequeued on return to userspace only to be re-enqueued when its
cfs_rq is unthrottled. It apparently escaped throttling, so the reported
throttled_time would rightfully be lower.