linux-kernel - Re: [PATCH v2][RFC] sched/fair: Change SIS_PROP to search idle CPU based on sum of util

Message-ID: <CADjb_WTYJoAd0Ok+gxXUQQX0C2uN9-eKkm_tiybqO3hDnYjvFg@mail.gmail.com>
Date:   Sat, 9 Apr 2022 23:09:07 +0800
From:   Chen Yu <yu.chen.surf@...il.com>
To:     Yicong Yang <yangyicong@...wei.com>
Cc:     Chen Yu <yu.c.chen@...el.com>, Yicong Yang <yangyccccc@...il.com>,
        yangyicong@...ilicon.com,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Tim Chen <tim.c.chen@...el.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Mel Gorman <mgorman@...e.de>,
        Viresh Kumar <viresh.kumar@...aro.org>,
        Barry Song <21cnbao@...il.com>,
        Barry Song <song.bao.hua@...ilicon.com>,
        Srikar Dronamraju <srikar@...ux.vnet.ibm.com>,
        Len Brown <len.brown@...el.com>,
        Ben Segall <bsegall@...gle.com>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Aubrey Li <aubrey.li@...el.com>,
        K Prateek Nayak <kprateek.nayak@....com>,
        "shenyang (M)" <shenyang39@...wei.com>
Subject: Re: [PATCH v2][RFC] sched/fair: Change SIS_PROP to search idle CPU
 based on sum of util_avg

On Tue, Apr 5, 2022 at 9:05 AM Yicong Yang <yangyicong@...wei.com> wrote:
>
> FYI, shenyang has done some investigation on whether we can get an idle cpu if the nr is 4.
> For netperf running on node 0-1 (32 cores on each node) with 32, 64, 128 threads, the success
> rate of finding an idle cpu is about 61.8%, 7.4%, <0.1%, the CPU utilization is 70.7%, 87.4%
> and 99.9% respectively.
>
Thanks for this testing. This indicates that nr = 4 does not improve the
idle CPU search efficiency much when the load is extremely high. Stopping
the search entirely when utilization is nearly 100% may be more appropriate.
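
To make the idea concrete, here is a minimal sketch of a scan depth that
shrinks as utilization grows and drops to zero near saturation. It is not the
formula from the patch: the helper name scan_depth(), the linear scaling, the
CAPACITY_SCALE value of 1024 per CPU and the 95% cutoff STOP_PCT are all
assumptions chosen for illustration.

```c
#include <stdio.h>

/* Assumed value representing one fully utilized CPU (illustrative). */
#define CAPACITY_SCALE 1024UL
/* Hypothetical cutoff: at or above this utilization, do not scan at all. */
#define STOP_PCT 95UL

/*
 * Hypothetical helper: return how many CPUs to scan for an idle one,
 * given the summed utilization of the domain and its CPU count.
 * The depth scales linearly with the remaining (unused) capacity.
 */
static unsigned int scan_depth(unsigned long sum_util, unsigned int nr_cpus)
{
	unsigned long capacity = (unsigned long)nr_cpus * CAPACITY_SCALE;

	if (nr_cpus == 0)
		return 0;

	/* Nearly saturated: searching is unlikely to succeed, so skip it. */
	if (sum_util * 100 >= capacity * STOP_PCT)
		return 0;

	/* sum_util < capacity is guaranteed here, so no underflow. */
	return (unsigned int)((capacity - sum_util) * nr_cpus / capacity);
}
```

With 64 CPUs, this sketch scans all 64 CPUs on an idle domain, 32 at 50%
utilization, and none once utilization reaches the cutoff.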
> I have tested this patch based on 5.17-rc7 on Kunpeng 920. The benchmarks are bound to node 0
> or node 0-1. The tbench result has some oscillation so I need to have a further check.
> For netperf I see performance enhancement when the number of threads equals the cpu number.
>
The benefit might come from returning the previous CPU earlier when
nr_threads equals nr_cpu. When the number of threads exceeds the number of
CPUs, the previous CPU might already have been returned even without this
patch, so we did not see much improvement (in Shenyang's test, the success
rate is only 7.4% when the number of threads equals the number of CPUs).
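
For reference when reading the netperf tables quoted below, the pct column
there appears to be the relative change of patched over base. A small sketch
of that arithmetic follows; the helper name pct_change() is an assumption
for illustration, and the inputs are copied from the quoted rows.

```c
#include <stdio.h>

/*
 * Hypothetical helper: relative change of the patched result over the
 * baseline, in percent. Positive means higher throughput when patched.
 */
static double pct_change(double base, double patched)
{
	return (patched - base) / base * 100.0;
}

/* Print one table row next to its recomputed percentage. */
static void print_row(const char *name, double base, double patched)
{
	printf("%-24s %12.2f %12.2f %8.2f%%\n",
	       name, base, patched, pct_change(base, patched));
}
```

For example, print_row("TCP_RR 2 nodes, 64", 18907.7, 34263.63333) can be
compared against the 81.22% in the table, and
print_row("UDP_RR 1 node, 32", 22891.73333, 68854.33333) against the 200.78%.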
> For netperf:
> TCP_RR 2 nodes
> threads         base            patched         pct
> 16              50335.56667     49970.63333     -0.73%
> 32              47281.53333     48191.93333     1.93%
> 64              18907.7         34263.63333     81.22%
> 128             14391.1         14480.8         0.62%
> 256             6905.286667     6853.83         -0.75%
>
> TCP_RR 1 node
> threads         base            patched         pct
> 16              50086.06667     49648.13333     -0.87%
> 32              24983.3         39489.43333     58.06%
> 64              18340.03333     18399.56667     0.32%
> 128             7174.713333     7390.09         3.00%
> 256             3433.696667     3404.956667     -0.84%
>
> UDP_RR 2 nodes
> threads         base            patched         pct
> 16              81448.7         82659.43333     1.49%
> 32              75351.13333     76812.36667     1.94%
> 64              25539.46667     41835.96667     63.81%
> 128             25081.56667     23595.56667     -5.92%
> 256             11848.23333     11017.13333     -7.01%
>
> UDP_RR 1 node
> threads         base            patched         pct
> 16              87288.96667     88719.83333     1.64%
> 32              22891.73333     68854.33333     200.78%
> 64              33853.4         35891.6         6.02%
> 128             12108.4         11885.76667     -1.84%
> 256             5620.403333     5531.006667     -1.59%
>
> mysql on node 0-1
>                         base            patched         pct
> 16threads-TPS           7100.27         7224.31         1.75%
> 16threads-QPS           142005.45       144486.19       1.75%
> 16threads-avg lat       2.25            2.22            1.63%
> 16threads-99th lat      2.46            2.43            1.08%
> 24threads-TPS           10424.70        10312.20        -1.08%
> 24threads-QPS           208493.86       206243.93       -1.08%
> 24threads-avg lat       2.30            2.32            -0.87%
> 24threads-99th lat      2.52            2.57            -1.85%
> 32threads-TPS           12528.79        12228.88        -2.39%
> 32threads-QPS           250575.92       244577.59       -2.39%
> 32threads-avg lat       2.55            2.61            -2.35%
> 32threads-99th lat      2.88            2.99            -3.82%
> 64threads-TPS           21386.17        21789.99        1.89%
> 64threads-QPS           427723.41       435799.85       1.89%
> 64threads-avg lat       2.99            2.94            1.78%
> 64threads-99th lat      5.00            4.69            6.33%
> 128threads-TPS          20865.13        20781.24        -0.40%
> 128threads-QPS          417302.73       415624.83       -0.40%
> 128threads-avg lat      6.13            6.16            -0.38%
> 128threads-99th lat     8.90            8.95            -0.60%
> 256threads-TPS          19258.15        19295.11        0.19%
> 256threads-QPS          385162.92       385902.27       0.19%
> 256threads-avg lat      13.29           13.26           0.23%
> 256threads-99th lat     20.12           20.12           0.00%
>
> I also had a look on a machine with 2-socket Xeon 6148 (80 threads in total)
> For TCP_RR, the best enhancement also happens when the threads equals to
> the cpu number.
>
May I know whether the test was run with turbo enabled or disabled? If turbo
is disabled, there might be some issues when calculating util_avg. I have a
workaround at
https://lore.kernel.org/all/20220407234258.569681-1-yu.c.chen@intel.com/
I'm working on the v3 patch, which will include the above workaround, and
will send it out later.

-- 
Thanks,
Chenyu