linux-kernel - Re: [PATCH v2] sched/fair: Age the average idle time

Message-ID: <CAKfTPtC8d37ZrXfDF2jkgg=tDPb1qAvFQQGXHhTf9LLR59hd8Q@mail.gmail.com>
Date:   Thu, 17 Jun 2021 10:30:09 +0200
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Mel Gorman <mgorman@...hsingularity.net>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Valentin Schneider <valentin.schneider@....com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] sched/fair: Age the average idle time

On Thu, 17 Jun 2021 at 09:44, Mel Gorman <mgorman@...hsingularity.net> wrote:
>
> On Wed, Jun 16, 2021 at 05:52:25PM +0200, Vincent Guittot wrote:
> > On Tue, 15 Jun 2021 at 22:43, Peter Zijlstra <peterz@...radead.org> wrote:
> > >
> > > On Tue, Jun 15, 2021 at 12:16:11PM +0100, Mel Gorman wrote:
> > > > From: Peter Zijlstra (Intel) <peterz@...radead.org>
> > > >
> > > > This is a partial forward-port of Peter Zijlstra's work first posted
> > > > at https://lore.kernel.org/lkml/20180530142236.667774973@infradead.org/.
> > >
> > > It's patches 2 and 3 together, right?
> > >
> > > > His Signed-off has been removed because it is modified but will be restored
> > > > if he says it's still ok.
> > >
> > > I suppose the SoB will auto-magically re-appear if I apply it :-)
> > >
> > > > The patch potentially matters when a socket has multiple LLCs as the
> > > > maximum search depth is lower. However, some of the test results were
> > > > suspiciously good (e.g. specjbb2005 gaining 50% on a Zen1 machine) and
> > > > other results were not dramatically different to other machines.
> > > >
> > > > Given the nature of the patch, Peter's full series is not being forward
> > > > ported as each part should stand on its own. Preferably they would be
> > > > merged at different times to reduce the risk of false bisections.
> > >
> > > I'm tempted to give it a go.. anyone object?
> >
> > Just finished running some tests on my large arm64 system.
> > Tbench tests are a mix of small gains and losses
> >
>
> Same for tbench on three x86 machines I reran tests for
>
> https://beta.suse.com/private/mgorman/melt/v5.13-rc5/3-perf-test/sched/sched-avgidle-v1r6/html/network-tbench/bing2/index.html#tbench4
> Small gains and losses, gains at higher client counts where search depth
>         should be reduced
>
> https://beta.suse.com/private/mgorman/melt/v5.13-rc5/3-perf-test/sched/sched-avgidle-v1r6/html/network-tbench/hardy2/index.html#tbench4
> Mostly gains, one counter-example at 4 clients
>
> https://beta.suse.com/private/mgorman/melt/v5.13-rc5/3-perf-test/sched/sched-avgidle-v1r6/html/network-tbench/marvin2/index.html#tbench4
> Worst by far, 1 client took a major hit for unknown reasons, otherwise
>         mix of gains and losses. I'm not confident that the 1 client
>         results are meaningful because for this machine, there should
>         have been idle cores so the code the patch adjusts should not
>         even be executed.
>
> > hackbench shows significant changes in both directions
> > hackbench -g $group
> >
> > group  tip/sched/core      + this patch
> > 1      13.358(+/- 1.82%)   12.850(+/- 2.21%) +4%
> > 4      4.286(+/- 2.77%)    4.114(+/- 2.25%)  +4%
> > 16     3.175(+/- 0.55%)    3.559(+/- 0.43%)  -12%
> > 32     2.912(+/- 0.79%)    3.165(+/- 0.95%)  -8%
> > 64     2.859(+/- 1.12%)    2.937(+/- 0.91%)  -3%
> > 128    3.092(+/- 4.75%)    3.003(+/- 5.18%)  +3%
> > 256    3.233(+/- 3.03%)    2.973(+/- 0.80%)  +8%
>
> Think this is processes and sockets. Of the hackbench results I had,
> this one performed the worst
>
> https://beta.suse.com/private/mgorman/melt/v5.13-rc5/3-perf-test/sched/sched-avgidle-v1r6/html/scheduler-unbound/bing2/index.html#hackbench-process-sockets
> Small gains and losses
>
> https://beta.suse.com/private/mgorman/melt/v5.13-rc5/3-perf-test/sched/sched-avgidle-v1r6/html/scheduler-unbound/hardy2/index.html#hackbench-process-sockets
> Small gains and losses
>
> https://beta.suse.com/private/mgorman/melt/v5.13-rc5/3-perf-test/sched/sched-avgidle-v1r6/html/scheduler-unbound/marvin2/index.html#hackbench-process-sockets
> Small gains and losses
>
> One of the better results for hackbench was processes and pipes
> https://beta.suse.com/private/mgorman/melt/v5.13-rc5/3-perf-test/sched/sched-avgidle-v1r6/html/scheduler-unbound/bing2/index.html#hackbench-process-pipes
> 1-12% gains
>
> For your arm machine, how many logical CPUs are online, what is the level
> of SMT if any and is the machine NUMA?

It's a SMT4 x 28 cores x 2 NUMA nodes = 224 CPUs

>
> Fundamentally though, as the changelog notes "due to the nature of the
> patch, this is a regression magnet". There are going to be examples
> where a deep search is better even if a machine is fully busy or
> overloaded and examples where cutting off the search is better. I think
> it's better to have an idle estimate that gets updated if CPUs are fully
> busy even if it's not a universal win.

Although I agree that using a stale average idle time value for the
local CPU is not good, I'm not sure this proposal is better. The main
problem is that we use the avg_idle of the local CPU to estimate how
many times we should loop and try to find another idle CPU. But there
is no direct relation between the two. Typically, a short average idle
time on the local CPU doesn't mean that there are fewer idle CPUs, and
that's why we have a mix of gains and losses.
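
To make the concern concrete, here is a rough userspace sketch of the kind
of heuristic being discussed. It is loosely modelled on the SIS_PROP logic
in select_idle_cpu(), but it is not the kernel code: the struct, the
function name, the field names and the constants (the divisor of 512 and
the floor of 4) are assumptions chosen for illustration.

```c
#include <stdint.h>

/*
 * Simplified model of a scan-depth heuristic. All names and constants
 * here are hypothetical and only illustrate the shape of the calculation.
 */
struct scan_inputs {
	uint64_t local_avg_idle_ns;	/* average idle time of the local CPU */
	uint64_t avg_scan_cost_ns;	/* average cost of one idle-CPU scan */
	unsigned int llc_cpus;		/* number of CPUs in the LLC domain */
};

/* Return how many CPUs the search is allowed to visit. */
static unsigned int scan_depth(const struct scan_inputs *in)
{
	/* Assumed fuzz factor: scale the local idle estimate down. */
	uint64_t avg_idle = in->local_avg_idle_ns / 512;
	/* +1 avoids a division by zero when no cost has been recorded. */
	uint64_t avg_cost = in->avg_scan_cost_ns + 1;
	uint64_t span_avg = (uint64_t)in->llc_cpus * avg_idle;

	/* Scale the depth with the idle/cost ratio, with an assumed floor of 4. */
	if (span_avg > 4 * avg_cost)
		return (unsigned int)(span_avg / avg_cost);
	return 4;
}
```

Nothing in the inputs says how many other CPUs are actually idle. With a
scan cost of 99ns and 8 CPUs in the LLC, a local average idle time of
512000ns allows a depth of 80, while 512ns collapses it to the floor of 4,
whatever the state of the other CPUs is.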

>
> --
> Mel Gorman
> SUSE Labs