Message-ID: <CABk29Nv=J_ZUnDTkRhwdQop=REr_XDGjJxn_zVy4kBqwx8K57w@mail.gmail.com>
Date: Fri, 13 May 2022 12:23:16 -0700
From: Josh Don <joshdon@...gle.com>
To: Tejun Heo <tj@...nel.org>
Cc: Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Valentin Schneider <vschneid@...hat.com>,
linux-kernel <linux-kernel@...r.kernel.org>,
Cruz Zhao <CruzZhao@...ux.alibaba.com>
Subject: Re: [PATCH] sched/core: add forced idle accounting for cgroups

Thanks Tejun,

On Thu, May 12, 2022 at 7:58 PM Tejun Heo <tj@...nel.org> wrote:
>
> On Thu, May 12, 2022 at 05:54:27PM -0700, Josh Don wrote:
> > 4feee7d1260 previously added per-task forced idle accounting. This patch
> > extends this to also include cgroups.
> >
> > rstat is used for cgroup accounting, except for the root, which uses
> > kcpustat in order to bypass the need for doing an rstat flush when
> > reading root stats.
> >
> > Only cgroup v2 is supported. Similar to the task accounting, the cgroup
> > accounting requires that schedstats is enabled.
>
> We've been collecting scheduler stats in cgroup core so that we always have
> them available whether cpu controller is enabled or not. There's nothing
> actually specific to cpu controller, right? Would it make sense to collect
> the cpu core stats the same way as the rest of scheduler stats?

Yeah, that's right: this doesn't require the cpu controller to be
enabled. Are you suggesting adding a new field to cgroup_base_stat?

One other weird artifact of collecting forceidle time is that a cpu
may account it on behalf of its hyperthread sibling. Currently, the
core rstat code always accounts to the current cpu's percpu rstat
field. I can add an accounting function that supports writes to a
different cpu's field, so that the per-cpu totals stay correct (the
forceidle accounting code holds rq->__lock, which protects all HT
siblings of a core). Per-cpu totals aren't currently exported in
cgroup v2, but this is useful information that we'll consume, so it
would be nice to keep it accurate.

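To make the idea concrete, here is a rough user-space sketch of
accounting into an explicitly chosen cpu's slot rather than the
current cpu's. This is not the kernel rstat API: NR_CPUS, struct
percpu_stat, struct cgroup_stat, account_forceidle() and
total_forceidle() are all hypothetical names made up for
illustration, and locking is not modeled (the caller is assumed to
hold a lock that covers every sibling, as rq->__lock does above).

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 4	/* hypothetical cpu count for this sketch */

/* Hypothetical stand-in for one cpu's slice of the stats. */
struct percpu_stat {
	uint64_t forceidle_ns;
};

/* Hypothetical stand-in for a cgroup's per-cpu stat storage. */
struct cgroup_stat {
	struct percpu_stat pcpu[NR_CPUS];
};

/*
 * Charge forced idle time to target_cpu, which may differ from the
 * cpu running this code (e.g. its hyperthread sibling). Assumes the
 * caller already holds a lock protecting all siblings of the core.
 */
static void account_forceidle(struct cgroup_stat *cs, int target_cpu,
			      uint64_t delta_ns)
{
	assert(target_cpu >= 0 && target_cpu < NR_CPUS);
	cs->pcpu[target_cpu].forceidle_ns += delta_ns;
}

/* Sum the per-cpu slots to get the cgroup-wide total. */
static uint64_t total_forceidle(const struct cgroup_stat *cs)
{
	uint64_t sum = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		sum += cs->pcpu[cpu].forceidle_ns;
	return sum;
}
```

With this shape, the cgroup-wide total is the same no matter which
cpu does the write, but each per-cpu slot reflects the cpu that was
actually forced idle.
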
Best,
Josh