linux-kernel - Re: [PATCH v3 4/5] sched/fair: Task based throttle time accounting


Message-ID: <u2ri72fqvzlyvwxmaez3l6mbgtkvzmg36ylzc4k2qhvjcdiup5@7ogshyljqoot>
Date: Tue, 26 Aug 2025 16:10:37 +0200
From: Michal Koutný <mkoutny@...e.com>
To: Aaron Lu <ziqianlu@...edance.com>
Cc: Valentin Schneider <vschneid@...hat.com>, 
	Ben Segall <bsegall@...gle.com>, K Prateek Nayak <kprateek.nayak@....com>, 
	Peter Zijlstra <peterz@...radead.org>, Chengming Zhou <chengming.zhou@...ux.dev>, 
	Josh Don <joshdon@...gle.com>, Ingo Molnar <mingo@...hat.com>, 
	Vincent Guittot <vincent.guittot@...aro.org>, Xi Wang <xii@...gle.com>, linux-kernel@...r.kernel.org, 
	Juri Lelli <juri.lelli@...hat.com>, Dietmar Eggemann <dietmar.eggemann@....com>, 
	Steven Rostedt <rostedt@...dmis.org>, Mel Gorman <mgorman@...e.de>, 
	Chuyi Zhou <zhouchuyi@...edance.com>, Jan Kiszka <jan.kiszka@...mens.com>, 
	Florian Bezdeka <florian.bezdeka@...mens.com>, Songtang Liu <liusongtang@...edance.com>, 
	Tejun Heo <tj@...nel.org>
Subject: Re: [PATCH v3 4/5] sched/fair: Task based throttle time accounting

Hello.

On Tue, Aug 19, 2025 at 05:34:27PM +0800, Aaron Lu <ziqianlu@...edance.com> wrote:
> Got it, does the below added words make this clear?
> 
>     With task based throttle model, the previous way to check cfs_rq's
>     nr_queued to decide if throttled time should be accounted doesn't work
>     as expected, e.g. when a cfs_rq which has a single task is throttled,
>     that task could later block in kernel mode instead of being dequeued on
>     limbo list and account this as throttled time is not accurate.
> 
>     Rework throttle time accounting for a cfs_rq as follows:
>     - start accounting when the first task gets throttled in its hierarchy;
>     - stop accounting on unthrottle.
> 
>     Note that there will be a time gap between when a cfs_rq is throttled
>     and when a task in its hierarchy is actually throttled. This accounting
>     mechanism only started accounting in the latter case.

Do I understand it correctly that this rework doesn't change the
cumulative amount of throttled_time in cpu.stat.local but the value gets
updated only later?

I'd say such little shifts are OK [1]. What should be avoided is
changing the semantics so that throttled_time would scale with the
number of tasks inside the cgroup (assuming a single cfs_rq, i.e. number
of tasks on the cfs_rq).

0.02€,
Michal

[1] Maybe not even shifts -- in the case of a cfs_rq with a single task,
the task can manage to run in the kernel for almost the whole period, so
it gets dequeued on return to userspace only to be re-enqueued when its cfs_rq
is unthrottled. It apparently escaped throttling, so the reported
throttled_time would be rightfully lower.
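
To make the two points above concrete, here is a toy model of the accounting
rule as described in the quoted text (start when the first task in the
hierarchy is throttled, stop on unthrottle). It is only a sketch under stated
assumptions, not the patch's implementation: ToyCfsRq, task_throttled(),
unthrottle() and simulate() are hypothetical names, time is in abstract
ticks, and the cfs_rq is assumed to run out of quota at t=0.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ToyCfsRq:
    """Toy stand-in for a cfs_rq; field names are illustrative only."""
    throttled_time: int = 0                  # cumulative accounted time
    accounting_since: Optional[int] = None   # set while the clock is running

    def task_throttled(self, now: int) -> None:
        # Start the clock only for the first throttled task in the hierarchy.
        if self.accounting_since is None:
            self.accounting_since = now

    def unthrottle(self, now: int) -> None:
        # Stop the clock; a single interval is added regardless of task count.
        if self.accounting_since is not None:
            self.throttled_time += now - self.accounting_since
            self.accounting_since = None


def simulate(task_throttle_times: Iterable[int], unthrottle_at: int) -> int:
    """Return accounted throttled_time for one throttle/unthrottle cycle.

    task_throttle_times: ticks at which each task reaches the point where it
    is actually throttled (assumed here to be its return to userspace).
    """
    rq = ToyCfsRq()
    for t in sorted(task_throttle_times):
        if t < unthrottle_at:
            rq.task_throttled(t)
    rq.unthrottle(unthrottle_at)
    return rq.throttled_time


print(simulate([10], 100))          # one task throttled at t=10 -> 90
print(simulate([10, 20, 30], 100))  # three tasks, same result   -> 90
print(simulate([95], 100))          # task mostly in kernel      -> 5
print(simulate([], 100))            # task never throttled       -> 0
```

In this model the accounted time depends only on when the first task is
throttled, so it does not grow with the number of tasks, and a task that
spends nearly the whole period in the kernel yields a correspondingly small
value.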
