linux-kernel - Re: [PATCH v2 for-4.12-fixes 2/2] sched/fair: Fix O(# total cgroups) in load balance path


Message-ID: <20170510144414.GA32165@htj.duckdns.org>
Date:   Wed, 10 May 2017 10:44:14 -0400
From:   Tejun Heo <tj@...nel.org>
To:     Vincent Guittot <vincent.guittot@...aro.org>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Mike Galbraith <efault@....de>, Paul Turner <pjt@...gle.com>,
        Chris Mason <clm@...com>, kernel-team@...com
Subject: Re: [PATCH v2 for-4.12-fixes 2/2] sched/fair: Fix O(# total cgroups)
 in load balance path

Hello,

On Wed, May 10, 2017 at 08:50:14AM +0200, Vincent Guittot wrote:
> On 9 May 2017 at 18:18, Tejun Heo <tj@...nel.org> wrote:
> > Currently, rq->leaf_cfs_rq_list is a traversal ordered list of all
> > live cfs_rqs which have ever been active on the CPU; unfortunately,
> > this makes update_blocked_averages() O(# total cgroups) which isn't
> > scalable at all.
> 
> Dietmar raised similar optimization in the past. The only question was
> : what is the impact of  re-adding the cfs_rq in leaf_cfs_rq_list on
> the wake up path ? Have you done some measurements ?

I haven't done a perf test yet, but the added cost is several more
branches and a local list operation on enqueue, which is already a
fairly expensive path.  That is small compared with load balance being
O(total number of cgroups on the system).

Anyway, I'll run some hackbench tests with several levels of cgroup
nesting.

> > @@ -7008,6 +7009,14 @@ static void update_blocked_averages(int
> >                 se = cfs_rq->tg->se[cpu];
> >                 if (se && !skip_blocked_update(se))
> >                         update_load_avg(se, 0);
> > +
> > +               /*
> > +                * There can be a lot of idle CPU cgroups.  Don't let fully
> > +                * decayed cfs_rqs linger on the list.
> > +                */
> > +               if (!cfs_rq->load.weight && !cfs_rq->avg.load_sum &&
> > +                   !cfs_rq->avg.util_sum && !cfs_rq->runnable_load_sum)
> > +                       list_del_leaf_cfs_rq(cfs_rq);
> 
> list_add_leaf_cfs_rq() assumes that we always enqueue cfs_rq bottom-up.
> By removing  cfs_rq, can't we break this assumption in some cases ?

We queue a cfs_rq on the leaf list when an se is queued on that
cfs_rq for the first time, so queueing can happen in any order;
otherwise, we'd simply be doing list_add_tail().  AFAICS, removing and
re-adding shouldn't break anything if the code wasn't broken before.
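
To make the lifecycle concrete, here is a minimal user-space sketch of
the idea, not the kernel code.  Every toy_* name is hypothetical, the
halving "decay" is a placeholder for the real PELT math, and the sketch
deliberately ignores the parent/child ordering that
list_add_leaf_cfs_rq() maintains (it just appends at the tail).  It only
shows that a fully decayed entry can be unlinked during the walk and
put back by the next enqueue, so the walk visits listed entries rather
than every cgroup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-CPU cfs_rq; not the kernel's struct. */
struct toy_cfs_rq {
	unsigned long weight;    /* stands in for load.weight  */
	unsigned long load_sum;  /* stands in for avg.load_sum */
	unsigned long util_sum;  /* stands in for avg.util_sum */
	bool on_list;
	struct toy_cfs_rq *prev, *next;
};

/* Hypothetical stand-in for a runqueue owning the leaf list. */
struct toy_rq {
	struct toy_cfs_rq head;  /* sentinel of a circular list */
	size_t nr_listed;
};

static void toy_rq_init(struct toy_rq *rq)
{
	rq->head.prev = rq->head.next = &rq->head;
	rq->nr_listed = 0;
}

/* Enqueue side: one branch plus a local list insertion (tail only). */
static void toy_list_add_leaf(struct toy_rq *rq, struct toy_cfs_rq *c)
{
	if (c->on_list)
		return;
	c->prev = rq->head.prev;
	c->next = &rq->head;
	rq->head.prev->next = c;
	rq->head.prev = c;
	c->on_list = true;
	rq->nr_listed++;
}

static void toy_list_del_leaf(struct toy_rq *rq, struct toy_cfs_rq *c)
{
	if (!c->on_list)
		return;
	assert(rq->nr_listed > 0);
	c->prev->next = c->next;
	c->next->prev = c->prev;
	c->on_list = false;
	rq->nr_listed--;
}

/* Same shape as the condition in the patch, minus runnable_load_sum. */
static bool toy_fully_decayed(const struct toy_cfs_rq *c)
{
	return !c->weight && !c->load_sum && !c->util_sum;
}

/* Walk only the listed entries; returns how many were visited. */
static size_t toy_update_blocked(struct toy_rq *rq)
{
	size_t visited = 0;
	struct toy_cfs_rq *c = rq->head.next, *next;

	while (c != &rq->head) {
		next = c->next;      /* saved first: c may be unlinked */
		visited++;
		c->load_sum /= 2;    /* placeholder decay */
		c->util_sum /= 2;
		if (toy_fully_decayed(c))
			toy_list_del_leaf(rq, c);
		c = next;
	}
	return visited;
}
```

Note that the walk saves the next pointer before a possible unlink;
any loop that deletes entries while iterating needs that kind of care.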

Thanks.

-- 
tejun