linux-kernel - Re: [PATCH 0/4] Reduce scanning of runqueues in select_idle

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKfTPtCoTD84kWhj5S-2LokcTLanewX8BvjHCN1qucutDOTuzg@mail.gmail.com>
Date:   Fri, 11 Dec 2020 10:51:17 +0100
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Mel Gorman <mgorman@...hsingularity.net>
Cc:     Peter Ziljstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Aubrey Li <aubrey.li@...ux.intel.com>,
        Barry Song <song.bao.hua@...ilicon.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Valentin Schneider <valentin.schneider@....com>,
        Linux-ARM <linux-arm-kernel@...ts.infradead.org>
Subject: Re: [PATCH 0/4] Reduce scanning of runqueues in select_idle_sibling

On Thu, 10 Dec 2020 at 12:04, Mel Gorman <mgorman@...hsingularity.net> wrote:
>
> On Thu, Dec 10, 2020 at 10:38:37AM +0100, Vincent Guittot wrote:
> > > while testing your patchset and Aubrey one on top of tip, I'm facing
> > > some perf regression on my arm64 numa system on hackbench and reaim.
> > > The regression seems to comes from your patchset but i don't know
> > > which patch in particular yet
> > >
> > > hackbench -l 256000 -g 1
> > >
> > > v5.10-rc7 + tip/sched/core 13,255(+/- 3.22%)
> > > with your patchset         15.368(+/- 2.74)  -15.9%
> > >
> > > I'm also seeing perf regression on reaim but this one needs more
> > > investigation before confirming
> > >
> > > TBH, I was not expecting regressions. I'm running more test to find
> > > which patch is the culprit
> >
> > The regression comes from patch 3: sched/fair: Do not replace
> > recent_used_cpu with the new target
> >
>
> That's not entirely surprising. The intent of the patch is to increase the
> hit rate of p->recent_used_cpu but it's not a guaranteed win due to two
> corner cases. If multiple tasks have the same p->recent_used_cpu, they can
> race to use that CPU and stack as a result instead of searching the domain.
> If SMT is enabled then p->recent_used_cpu can point to an idle CPU that has
> a busy sibling which the search would have avoided in select_idle_core().
>
> I think you are using processes and sockets for hackbench but as you'll
> see later, hackbench can be used both to show losses and gains.

I run more hackbench tests with pipe and socket and both show
regression with patch 3 whereas this is significant improvement with
other patches and Aubrey's one

>
> I originally tested with 6 machines covering Broadwell (2 socket), Haswell
> (2 socket), Skylake (1 socket), Cascadelake (2 socket), EPYC (2 socket)
> and EPYC 2 (2 socket) with a range of workloads including hackbench. Of
> those, just one reported a major problem with 1 group -- the EPYC 1 machine
>
> EPYC hackbench process-sockets
>                           5.10.0-rc6             5.10.0-rc6
>                        baseline-v2r2         altrecent-v2r5
> Amean     1        1.0607 (   0.00%)      1.1480 (  -8.23%)
> Amean     4        1.3277 (   0.00%)      1.3117 (   1.21%)
> Amean     7        1.6940 (   0.00%)      1.6950 (  -0.06%)
> Amean     12       2.1600 (   0.00%)      2.1367 (   1.08%)
> Amean     21       3.2450 (   0.00%)      3.5883 ( -10.58%)
> Amean     30       4.1673 (   0.00%)      3.9653 (   4.85%)
> Amean     48       4.9257 (   0.00%)      5.0000 (  -1.51%)
> Amean     79       7.4950 (   0.00%)      7.4563 (   0.52%)
> Amean     110     10.4233 (   0.00%)     10.4727 (  -0.47%)
> Amean     141     13.4690 (   0.00%)     13.4563 (   0.09%)
> Amean     172     16.6450 (   0.00%)     16.6033 (   0.25%)
> Amean     203     19.4873 (   0.00%)     19.7893 *  -1.55%*
> Amean     234     22.5507 (   0.00%)     22.8033 (  -1.12%)
> Amean     265     25.3380 (   0.00%)     25.6490 (  -1.23%)
> Amean     296     28.0070 (   0.00%)     28.1270 (  -0.43%)
>
> That's showing an 8% loss for 1 group and also a problem with 21 groups.
> Otherwise, it was more or less flat. EPYC 2 also showed a 2% loss for 1
> group and 9% loss for 21 groups (probably related to the size of the LLC
> domain as there are many LLCs per socket on EPYC*).
>
> For the *same* machine running hackbench using pipes instead of sockets
> we get
>
> EPYC hackbench process-pipes
> Amean     1        0.9497 (   0.00%)      0.9517 (  -0.21%)
> Amean     4        1.2253 (   0.00%)      1.1387 (   7.07%)
> Amean     7        2.0677 (   0.00%)      1.7460 *  15.56%*
> Amean     12       2.8717 (   0.00%)      2.4797 *  13.65%*
> Amean     21       4.4053 (   0.00%)      3.7463 *  14.96%*
> Amean     30       5.3983 (   0.00%)      4.1097 *  23.87%*
> Amean     48       6.1050 (   0.00%)      4.6873 *  23.22%*
> Amean     79       7.5640 (   0.00%)      6.8493 (   9.45%)
> Amean     110     12.2627 (   0.00%)      9.4613 *  22.84%*
> Amean     141     16.9980 (   0.00%)     13.8327 *  18.62%*
> Amean     172     21.5280 (   0.00%)     17.3693 *  19.32%*
> Amean     203     25.4480 (   0.00%)     20.9947 *  17.50%*
> Amean     234     29.6570 (   0.00%)     24.9613 *  15.83%*
> Amean     265     33.0713 (   0.00%)     28.1103 *  15.00%*
> Amean     296     37.4443 (   0.00%)     31.8757 *  14.87%*
>
> So even on the *same hardware*, hackbench can show very different results
> depending on how it is run.
>
> The rest of the machines were more or less neutral for this patch. Once
> hackbench saturates the machine, the hit rate on recent_used_cpu is going
> to be low
>
> 1-socket skylake
> Amean     1       1.3183 (   0.00%)      1.2827 *   2.71%*
> Amean     3       3.6750 (   0.00%)      3.6610 (   0.38%)
> Amean     5       6.1003 (   0.00%)      6.0190 *   1.33%*
> Amean     7       8.6063 (   0.00%)      8.6047 (   0.02%)
> Amean     12     14.9480 (   0.00%)     15.0327 (  -0.57%)
> Amean     18     22.3430 (   0.00%)     22.6680 (  -1.45%)
> Amean     24     29.4970 (   0.00%)     29.6677 (  -0.58%)
> Amean     30     36.7373 (   0.00%)     36.3687 (   1.00%)
> Amean     32     39.0973 (   0.00%)     39.4417 (  -0.88%)
>
> Shows a 2.71% gain for one group, otherwise more or less neutral
>
> 2-socket CascadeLake
>
> Amean     1        0.3663 (   0.00%)      0.3657 (   0.18%)
> Amean     4        0.7510 (   0.00%)      0.7793 (  -3.77%)
> Amean     7        1.2650 (   0.00%)      1.2583 (   0.53%)
> Amean     12       1.9510 (   0.00%)      1.9450 (   0.31%)
> Amean     21       2.9677 (   0.00%)      3.0277 (  -2.02%)
> Amean     30       4.2993 (   0.00%)      4.0237 *   6.41%*
> Amean     48       6.5373 (   0.00%)      6.2987 *   3.65%*
> Amean     79      10.5513 (   0.00%)     10.3280 (   2.12%)
> Amean     110     15.8567 (   0.00%)     13.9817 (  11.82%)
> Amean     141     17.4243 (   0.00%)     17.3177 (   0.61%)
> Amean     172     21.0473 (   0.00%)     20.9760 (   0.34%)
> Amean     203     25.1070 (   0.00%)     25.1150 (  -0.03%)
> Amean     234     28.6753 (   0.00%)     28.9383 (  -0.92%)
> Amean     265     32.7970 (   0.00%)     32.9663 (  -0.52%)
> Amean     296     36.6510 (   0.00%)     36.6753 (  -0.07%)
>
> Neutral for 1 group, small regression for 4 groups, few gains around the
> middle, neutral when over-saturated.
>
> select_idle_sibling is a curse because it's very rare that a change to
> it is a universal win. On balance, I think it's better to avoid searching
> the domain at all where possible even if there are cases where searching
> can have a benefit such as finding an idle core instead of picking an
> idle CPU with a busy sibling via p->recent_used_cpu.
>
> --
> Mel Gorman
> SUSE Labs