Date:   Fri, 21 Feb 2020 09:56:16 +0100
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Valentin Schneider <valentin.schneider@....com>
Cc:     Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Phil Auld <pauld@...hat.com>, Parth Shah <parth@...ux.ibm.com>,
        Hillf Danton <hdanton@...a.com>
Subject: Re: [PATCH v3 4/5] sched/pelt: Add a new runnable average signal

On Thu, 20 Feb 2020 at 17:11, Valentin Schneider
<valentin.schneider@....com> wrote:
>
> On 20/02/2020 14:36, Vincent Guittot wrote:
> > I agree that setting by default to SCHED_CAPACITY_SCALE is too much
> > for little core.
> > The problem for little core can be fixed by using the cpu capacity instead
> >
>
> So that's indeed better for big.LITTLE & co. Any reason however for not
> aligning with the initialization of util_avg ?

runnable_avg is the unweighted version of load_avg, so the two should
be in sync at init, and SCHED_CAPACITY_SCALE is in fact the right
value. Using cpu_scale changes nothing for SMP systems and for big
cores (where it equals SCHED_CAPACITY_SCALE), so we can use it
instead.
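
For reference, this is roughly what the init looks like with the cpu
capacity (a trimmed sketch of post_init_entity_util_avg(), not the
actual hunk - that one is quoted further down; the capacity is assumed
to come from arch_scale_cpu_capacity()):

/*
 * Sketch: seed runnable_avg with the capacity of the CPU the task is
 * attached to instead of SCHED_CAPACITY_SCALE, so a task forked on a
 * little core does not start out filling a big core's worth of
 * runnable. Trimmed to the relevant lines.
 */
void post_init_entity_util_avg(struct task_struct *p)
{
        struct sched_entity *se = &p->se;
        struct cfs_rq *cfs_rq = cfs_rq_of(se);
        struct sched_avg *sa = &se->avg;
        /* 1024 on SMP and on big cores, smaller on little cores */
        long cpu_scale = arch_scale_cpu_capacity(cpu_of(rq_of(cfs_rq)));

        /* ... existing util_avg initialization ... */

        sa->runnable_avg = cpu_scale;

        /* ... */
}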

Also, the initial value of util_avg has never reflected a realistic
utilization for a new task, especially if that task turns out to be a
big one. runnable_avg now balances this effect: it says that we don't
know how the new task will behave, and that it might end up using all
the spare capacity even though the current utilization is low and the
CPU is not "fully used". In fact, that is exactly the purpose of
runnable: to highlight that there may be no spare capacity left even
when the CPU's utilization is low, because of external events like
task migration, or because new tasks arrive with a most probably wrong
utilization.
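
To make that concrete, here is a small standalone toy model (not
kernel code) of the kind of comparison the load balancer ends up doing
with the new signal; the helper name and the numbers are made up for
illustration, only the shape of the two checks (util vs capacity and
runnable vs capacity, both scaled by imbalance_pct) matters:

#include <stdbool.h>
#include <stdio.h>

#define SCHED_CAPACITY_SCALE    1024

/* A util-only check misses a CPU that is full of "young" tasks. */
static bool no_spare_capacity(unsigned long capacity, unsigned long util,
                              unsigned long runnable,
                              unsigned int imbalance_pct)
{
        if (capacity * 100 < util * imbalance_pct)
                return true;    /* utilization already fills the CPU */
        if (capacity * imbalance_pct < runnable * 100)
                return true;    /* runnable fills it even if util is low */
        return false;
}

int main(void)
{
        unsigned long capacity = SCHED_CAPACITY_SCALE;
        unsigned int imbalance_pct = 117;       /* default MC value */
        unsigned long util = 300;               /* looks mostly idle */
        /* ...but two freshly seeded tasks are runnable on it */
        unsigned long runnable = 2 * SCHED_CAPACITY_SCALE;

        printf("no spare capacity: %s\n",
               no_spare_capacity(capacity, util, runnable,
                                 imbalance_pct) ? "yes" : "no");
        return 0;
}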

That being said, there is a bigger problem with the current version of
this patch, which is that I forgot to use runnable in
update_sg_wakeup_stats(). I have a patch that fixes this problem.
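
The missing piece is essentially to accumulate the per-CPU runnable
next to util when building the wakeup stats, so find_idlest_group()
sees the new signal too. A sketch (not the actual fix patch), assuming
a cpu_runnable_without() helper mirroring cpu_util_without() and a
group_runnable field in struct sg_lb_stats:

static inline void update_sg_wakeup_stats(struct sched_domain *sd,
                                          struct sg_lb_stats *sgs,
                                          struct sched_group *group,
                                          struct task_struct *p)
{
        int i;

        memset(sgs, 0, sizeof(*sgs));

        for_each_cpu(i, sched_group_span(group)) {
                struct rq *rq = cpu_rq(i);

                sgs->group_load += cpu_load_without(rq, p);
                sgs->group_util += cpu_util_without(i, p);
                /* the accumulation that was missing in v3 */
                sgs->group_runnable += cpu_runnable_without(rq, p);

                /* ... rest of the loop unchanged ... */
        }

        /* ... group capacity / group type classification unchanged ... */
}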

Also, I have tested both proposals with hackbench on my octo-core
system, and using cpu_scale gives slightly better results than
util_avg, which probably reflects the case I mentioned above.

grp     cpu_scale             util_avg              improvement
1       1.191 (+/-0.77 %)     1.204 (+/-1.16 %)     -1.07 %
4       1.147 (+/-1.14 %)     1.195 (+/-0.52 %)     -4.21 %
8       1.112 (+/-1.52 %)     1.124 (+/-1.45 %)     -1.12 %
16      1.163 (+/-1.72 %)     1.169 (+/-1.58 %)     -0.45 %

>
> With the default MC imbalance_pct (117), it takes 875 utilization to make
> a single CPU group (with 1024 capacity) overloaded (group_is_overloaded()).
> For a completely idle CPU, that means forking at least 3 tasks (512 + 256 +
> 128 util_avg)
>
> With your change, it only takes 2 tasks. I know I'm being nitpicky here, but
> I feel like those should be aligned, unless we have a proper argument against

I don't see any reason for keeping them aligned.

> it - in which case this should also appear in the changelog, which so
> far only mentions issues with util_avg migration, not the fork-time
> initialization.
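
(For the record, the arithmetic behind those numbers, assuming the
checks sketched earlier: the util threshold is 1024 * 100 / 117 ~= 875,
and since fork-time util_avg hands each new task half of the remaining
spare capacity, it takes 512 + 256 + 128 = 896 > 875, i.e. three forks,
to cross it. The runnable threshold is 1024 * 117 / 100 ~= 1198, and
with runnable_avg seeded at cpu_scale, two forks already give
2 * 1024 = 2048 > 1198.)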
>
> > @@ -796,6 +794,8 @@ void post_init_entity_util_avg(struct task_struct *p)
> >                 }
> >         }
> >
> > +       sa->runnable_avg = cpu_scale;
> > +
> >         if (p->sched_class != &fair_sched_class) {
> >                 /*
> >                  * For !fair tasks do:
> >>
> >>>               sa->load_avg = scale_load_down(se->load.weight);
> >>> +     }
> >>>
> >>>       /* when this task enqueue'ed, it will contribute to its cfs_rq's load_avg */
> >>>  }
