linux-kernel - Re: [PATCH 2/2] sched/psi: iterate through cgroups directly

Message-ID: <CAMgjq7Bem+8g8A_OR26PHhYYx-A7LHHO3tyQNR_tMnaaKNxkug@mail.gmail.com>
Date:   Fri, 10 Feb 2023 00:08:40 +0800
From:   Kairui Song <ryncsn@...il.com>
To:     Johannes Weiner <hannes@...xchg.org>
Cc:     Michal Koutný <mkoutny@...e.com>,
        Suren Baghdasaryan <surenb@...gle.com>,
        Chengming Zhou <zhouchengming@...edance.com>,
        Tejun Heo <tj@...nel.org>, Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>, cgroups@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/2] sched/psi: iterate through cgroups directly

Johannes Weiner <hannes@...xchg.org> wrote on Thu, Feb 9, 2023 at 03:20:
> On Wed, Feb 08, 2023 at 06:29:56PM +0100, Michal Koutný wrote:
> > On Thu, Feb 09, 2023 at 12:16:54AM +0800, Kairui Song <ryncsn@...il.com> wrote:
> > > Signed-off-by: Kairui Song <kasong@...cent.com>
> > > Signed-off-by: Kairui Song <ryncsn@...il.com>
> >
> > Typo?
> >
> > > -static inline struct psi_group *task_psi_group(struct task_struct *task)
> > > +static inline struct psi_group *psi_iter_first(struct task_struct *task, void **iter)
> > >  {
> > >  #ifdef CONFIG_CGROUPS
> > > -   if (static_branch_likely(&psi_cgroups_enabled))
> > > -           return cgroup_psi(task_dfl_cgroup(task));
> > > +   if (static_branch_likely(&psi_cgroups_enabled)) {
> > > +           struct cgroup *cgroup = task_dfl_cgroup(task);
> > > +
> > > +           *iter = cgroup_parent(cgroup);
> >
> > This seems to skip a cgroup level -- maybe that's the observed
> > performance gain?
>
> Hm, I don't think it does. It sets up *iter to point to the parent for
> the _next() call, but it returns task_dfl_cgroup()->psi. The next call
> does the same: cgroup = *iter, *iter = parent, return cgroup->psi.
>
> It could be a bit more readable to have *iter always point to the
> current cgroup - but no strong preference either way from me:
>
> psi_groups_first(task, iter)
> {
> #ifdef CONFIG_CGROUPS
>         if (static_branch_likely(&psi_cgroups_enabled)) {
>                 struct cgroup *cgroup = task_dfl_cgroup(task);
>
>                 *iter = cgroup;
>                 return cgroup_psi(cgroup);
>         }
> #endif
>         return &psi_system;
> }
>
> psi_groups_next(iter)
> {
> #ifdef CONFIG_CGROUPS
>         if (static_branch_likely(&psi_cgroups_enabled)) {
>                 struct cgroup *cgroup = *iter;
>
>                 if (cgroup) {
>                         *iter = cgroup_parent(cgroup);
>                         return cgroup_psi(cgroup);
>                 }
>         }
>         return NULL;
> #endif
> }

It should be like this, right? In psi_groups_next(), the cgroup's
parent should be retrieved before the "if (cgroup)" check; otherwise
the first _next() call returns the task's own cgroup a second time:
+ psi_groups_next(iter)
+ {
+ #ifdef CONFIG_CGROUPS
+         if (static_branch_likely(&psi_cgroups_enabled)) {
+                 struct cgroup *cgroup = *iter;
+
+                 cgroup = cgroup_parent(cgroup);
+                 if (cgroup) {
+                         *iter = cgroup;
+                         return cgroup_psi(cgroup);
+                 }
+         }
+ #endif
+         return NULL;
+ }
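
To make the difference concrete, here is a minimal userspace sketch of
both _next() variants. It is only an illustration under assumptions:
struct cgroup and struct psi_group are hypothetical stand-ins with just
the fields needed, psi_groups_first() takes the cgroup directly instead
of a task, and the CONFIG_CGROUPS / static branch handling is left out.
psi_groups_next_quoted() and count_visits() are names made up for this
sketch.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types (illustration only). */
struct psi_group { int id; };

struct cgroup {
	struct cgroup *parent;		/* NULL for the root cgroup */
	struct psi_group psi;
};

/* Mock helpers mirroring the names used in the thread. */
static struct cgroup *cgroup_parent(struct cgroup *cgrp)
{
	return cgrp->parent;
}

static struct psi_group *cgroup_psi(struct cgroup *cgrp)
{
	return &cgrp->psi;
}

/* *iter points at the cgroup whose psi_group is being returned. */
static struct psi_group *psi_groups_first(struct cgroup *cgrp, void **iter)
{
	*iter = cgrp;
	return cgroup_psi(cgrp);
}

/* Quoted variant: checks *iter itself, so the first call repeats the leaf. */
static struct psi_group *psi_groups_next_quoted(void **iter)
{
	struct cgroup *cgroup = *iter;

	if (cgroup) {
		*iter = cgroup_parent(cgroup);
		return cgroup_psi(cgroup);
	}
	return NULL;
}

/* Variant with the parent lookup moved ahead of the NULL check. */
static struct psi_group *psi_groups_next(void **iter)
{
	struct cgroup *cgroup = cgroup_parent(*iter);

	if (cgroup) {
		*iter = cgroup;
		return cgroup_psi(cgroup);
	}
	return NULL;
}

typedef struct psi_group *(*next_fn)(void **iter);

/* Count how many psi_groups a full walk from @leaf to the root yields. */
static int count_visits(struct cgroup *leaf, next_fn next)
{
	struct psi_group *group;
	void *iter;
	int n = 0;

	for (group = psi_groups_first(leaf, &iter); group; group = next(&iter))
		n++;
	return n;
}
```

With a three-level chain (root, mid, leaf), a walk from the leaf using
psi_groups_next() yields each level once, while the quoted variant
yields the leaf twice before moving up.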

Thanks for the suggestion!

I think your style is better indeed.

I re-ran the benchmark just in case, and it seems my previous results
were not accurate enough: some numbers changed after I rebased onto
the latest master, or maybe 100 runs of perfpipe are simply not enough
to get a stable result.

After a few retests, the conclusion in the commit message still holds.
I will post V2 after some more testing.