Message-ID: <CAKfTPtC8wS5ATRN1vLa2ERT7oF9MUcBu3=MsiQ5PiP16nYXRgw@mail.gmail.com>
Date: Wed, 3 May 2017 12:37:37 +0200
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Tejun Heo <tj@...nel.org>, Ingo Molnar <mingo@...hat.com>,
linux-kernel <linux-kernel@...r.kernel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Mike Galbraith <efault@....de>, Paul Turner <pjt@...gle.com>,
Chris Mason <clm@...com>, kernel-team@...com
Subject: Re: [PATCH 2/2] sched/fair: Always propagate runnable_load_avg
On 3 May 2017 at 11:37, Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Wed, May 03, 2017 at 09:34:51AM +0200, Vincent Guittot wrote:
>
> > We use load_avg for calculating a stable share and we want to use it
> > more and more. So breaking it because it's easier doesn't seem to be
> > the right way to do it, IMHO.
>
> So afaict we calculate group se->load.weight (aka shares, see
> calc_cfs_shares), using cfs_rq->avg.load_avg, which feeds into
> tg->load_avg through cfs_rq->tg_load_avg_contrib and
> cfs_rq->load.weight.
>
> And cfs_rq->avg.load_avg is computed from cfs_rq->load.weight, which
> is \Sum se->load.weight.
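
(As a side note, just to have that relationship in front of us, here is a
toy user-space model of the chain; it is only a sketch with made-up names
and numbers, not the actual kernel code in calc_cfs_shares():)

#include <stdio.h>

/* Toy model of the chain above: each per-cpu cfs_rq's load feeds
 * tg->load_avg, and the group se's weight (its shares) is carved out
 * of tg->shares proportionally to this cfs_rq's contribution.
 * NOT the kernel code, only a sketch of the relationship. */
struct toy_cfs_rq {
        long load;        /* ~ cfs_rq->load.weight / cfs_rq->avg.load_avg */
        long tg_contrib;  /* ~ cfs_rq->tg_load_avg_contrib */
};

struct toy_tg {
        long shares;      /* configured group weight (cpu.shares) */
        long load_avg;    /* ~ tg->load_avg: sum of all cfs_rq contributions */
};

static long toy_calc_shares(struct toy_tg *tg, struct toy_cfs_rq *cfs_rq)
{
        /* replace this cfs_rq's stale contribution with its current load */
        long tg_load = tg->load_avg - cfs_rq->tg_contrib + cfs_rq->load;
        long shares = tg->shares * cfs_rq->load;

        if (tg_load)
                shares /= tg_load;
        if (shares > tg->shares)
                shares = tg->shares;
        return shares;
}

int main(void)
{
        struct toy_tg tg = { .shares = 1024, .load_avg = 2048 };
        struct toy_cfs_rq rq = { .load = 1024, .tg_contrib = 1024 };

        /* this CPU holds half of the group load -> about half the shares */
        printf("group se weight ~ %ld\n", toy_calc_shares(&tg, &rq));
        return 0;
}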
>
> OTOH group se->avg.load_avg isn't used much, which is TJ's point.
>
> The only cases where the group se->avg.load_avg is relevant to
> cfs_rq->avg.load_avg are the ones I listed yesterday: group creation and
> group destruction. There we use the group se->avg.load_avg to manage the
> boundary conditions.
>
> So with the proposed change to se->avg.load_avg we can have some
> (temporary) boundary effect when you destroy a lot of (previously
> active) cgroups.
>
>
> Of course, it could be I overlooked something, in which case, please
> tell :-)
That's mainly based on the regression I see on my platform. I haven't
found the root cause of the regression yet, but it's there, which means
that using the group entity's load_avg to propagate the child cfs_rq's
runnable_load_avg breaks something.
>
>
> That said, I agree it would be nice to entirely get rid of runnable_avg,
> but that is a much larger change and would require a lot more work. I
> don't immediately see why we can't fix the thing now and then work on
> removing runnable_load_avg later.
What Tejun proposes is to break the group's load_avg and make it
follow the child cfs_rq's runnable_load_avg instead of the child cfs_rq's
load_avg. It will then be difficult, if not impossible, to move load
balancing onto load_avg and remove runnable_load_avg later on, if
load_avg no longer works as expected. So keeping the group's load_avg
working correctly seems a key point.
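
As a toy illustration of the difference (made-up numbers only, just to
show what would stop flowing up into tg->load_avg):

#include <stdio.h>

/* Made-up numbers: a child cfs_rq with one runnable task (weight 1024)
 * and one recently blocked task whose decayed contribution is ~512. */
int main(void)
{
        long child_load_avg     = 1024 + 512; /* runnable + blocked load */
        long child_runnable_avg = 1024;       /* runnable load only */

        /* today the group entity (and so tg->load_avg) follows load_avg */
        printf("group se follows load_avg:          %ld\n", child_load_avg);

        /* with the proposed change it would follow runnable_load_avg,
         * so the blocked 512 no longer shows up in tg->load_avg */
        printf("group se follows runnable_load_avg: %ld\n", child_runnable_avg);
        return 0;
}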
Then, we know that we still have wrong behavior with runnable_load_avg
when the running tasks' loads are really different, so it only fixes part
of a wider problem IMO.
>
> Of course, we should not regress either, I'll go read up on that part.
>