Message-ID: <20250828060620.GB35@bytedance>
Date: Thu, 28 Aug 2025 14:06:20 +0800
From: Aaron Lu <ziqianlu@...edance.com>
To: Michal Koutný <mkoutny@...e.com>
Cc: Valentin Schneider <vschneid@...hat.com>,
Ben Segall <bsegall@...gle.com>,
K Prateek Nayak <kprateek.nayak@....com>,
Peter Zijlstra <peterz@...radead.org>,
Chengming Zhou <chengming.zhou@...ux.dev>,
Josh Don <joshdon@...gle.com>, Ingo Molnar <mingo@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Xi Wang <xii@...gle.com>, linux-kernel@...r.kernel.org,
Juri Lelli <juri.lelli@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>, Mel Gorman <mgorman@...e.de>,
Chuyi Zhou <zhouchuyi@...edance.com>,
Jan Kiszka <jan.kiszka@...mens.com>,
Florian Bezdeka <florian.bezdeka@...mens.com>,
Songtang Liu <liusongtang@...edance.com>, Tejun Heo <tj@...nel.org>
Subject: Re: [PATCH v3 4/5] sched/fair: Task based throttle time accounting
Hi Michal,
Thanks for taking a look.
On Tue, Aug 26, 2025 at 04:10:37PM +0200, Michal Koutný wrote:
> Hello.
>
> On Tue, Aug 19, 2025 at 05:34:27PM +0800, Aaron Lu <ziqianlu@...edance.com> wrote:
> > Got it, do the words added below make this clear?
> >
> > With the task based throttle model, the previous way of checking a
> > cfs_rq's nr_queued to decide whether throttled time should be accounted
> > doesn't work as expected. E.g. when a cfs_rq which has a single task is
> > throttled, that task could later block in kernel mode instead of being
> > dequeued to the limbo list, and accounting this as throttled time is
> > not accurate.
> >
> > Rework throttle time accounting for a cfs_rq as follows:
> > - start accounting when the first task gets throttled in its hierarchy;
> > - stop accounting on unthrottle.
> >
> > Note that there will be a time gap between when a cfs_rq is throttled
> > and when a task in its hierarchy is actually throttled. This accounting
> > mechanism only starts accounting in the latter case.
>
> Do I understand it correctly that this rework doesn't change the
> cumulative amount of throttled_time in cpu.stat.local but the value gets
> updated only later?
>
> I'd say such little shifts are OK [1]. What should be avoided is
> changing the semantics so that throttled_time time would scale with the
> number of tasks inside the cgroup (assuming a single cfs_rq, i.e. number
> of tasks on the cfs_rq).
As Valentin explained, throttled_time does not scale with the number of
tasks inside the cgroup.
> [1] Maybe not even shifts -- in that case of a cfs_rq with a task, it
> can manage to run in kernel almost for the whole period, so it gets
> dequeued on return to userspace only to be re-enqueued when its cfs_rq
> is unthrottled. It apparently escaped throttling, so the reported
> throttled_time would be rightfully lower.
Right, in this case, the throttled_time would be very small.
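
To make both points concrete, here is a toy userspace model of the
accounting rule described above (start when the first task in the
hierarchy is throttled, stop on unthrottle). It is only a sketch and not
the code in the patch: struct toy_cfs_rq, its fields and the toy_*()
helpers are all made up for illustration, and time is in arbitrary units.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only; these names are hypothetical, not the kernel's. */
struct toy_cfs_rq {
	bool accounting;          /* true once a task has been throttled */
	uint64_t throttled_clock; /* time the first task was throttled */
	uint64_t throttled_time;  /* cumulative throttled time */
};

/* A task in the hierarchy is actually throttled at time @now. */
static void toy_task_throttled(struct toy_cfs_rq *rq, uint64_t now)
{
	if (rq->accounting)
		return; /* later tasks neither restart nor add to the clock */
	rq->accounting = true;
	rq->throttled_clock = now;
}

/* The cfs_rq is unthrottled at time @now. */
static void toy_unthrottle(struct toy_cfs_rq *rq, uint64_t now)
{
	if (!rq->accounting)
		return; /* no task was ever throttled: nothing to account */
	assert(now >= rq->throttled_clock);
	rq->throttled_time += now - rq->throttled_clock;
	rq->accounting = false;
}

/*
 * One throttle period: @ntasks tasks get throttled one time unit apart,
 * starting at @first_throttle, then the cfs_rq is unthrottled at
 * @unthrottle. Returns the throttled time accounted for the period.
 */
static uint64_t toy_run(unsigned int ntasks, uint64_t first_throttle,
			uint64_t unthrottle)
{
	struct toy_cfs_rq rq = { 0 };

	for (unsigned int i = 0; i < ntasks; i++)
		toy_task_throttled(&rq, first_throttle + i);
	toy_unthrottle(&rq, unthrottle);
	return rq.throttled_time;
}
```

In this model toy_run(1, 10, 100) and toy_run(8, 10, 100) account the
same amount, since only the first throttled task starts the clock. A
task that stays in the kernel until just before unthrottle, e.g.
toy_run(1, 99, 100), accounts a very small amount.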