[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CADjb_WTYJoAd0Ok+gxXUQQX0C2uN9-eKkm_tiybqO3hDnYjvFg@mail.gmail.com>
Date: Sat, 9 Apr 2022 23:09:07 +0800
From: Chen Yu <yu.chen.surf@...il.com>
To: Yicong Yang <yangyicong@...wei.com>
Cc: Chen Yu <yu.c.chen@...el.com>, Yicong Yang <yangyccccc@...il.com>,
yangyicong@...ilicon.com,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Tim Chen <tim.c.chen@...el.com>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Mel Gorman <mgorman@...e.de>,
Viresh Kumar <viresh.kumar@...aro.org>,
Barry Song <21cnbao@...il.com>,
Barry Song <song.bao.hua@...ilicon.com>,
Srikar Dronamraju <srikar@...ux.vnet.ibm.com>,
Len Brown <len.brown@...el.com>,
Ben Segall <bsegall@...gle.com>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Aubrey Li <aubrey.li@...el.com>,
K Prateek Nayak <kprateek.nayak@....com>,
"shenyang (M)" <shenyang39@...wei.com>
Subject: Re: [PATCH v2][RFC] sched/fair: Change SIS_PROP to search idle CPU
based on sum of util_avg
On Tue, Apr 5, 2022 at 9:05 AM Yicong Yang <yangyicong@...wei.com> wrote:
>
> FYI, shenyang has done some investigation on whether we can get an idle cpu if the nr is 4.
> For netperf running on node 0-1 (32 cores on each node) with 32, 64, 128 threads, the success
> rate of findindg an idle cpu is about 61.8%, 7.4%, <0.1%, the CPU utilization is 70.7%, 87.4%
> and 99.9% respectively.
>
Thanks for this testing. So this indicates that nr = 4 would not
improve the idle CPU search efficiency
much when the load is extremely high. Stop searching entirely when it
is nearly 100% may be
more appropriate.
> I have test this patch based on 5.17-rc7 on Kunpeng 920. The benchmarks are binding to node 0
> or node 0-1. The tbench result has some oscillation so I need to have a further check.
> For netperf I see performance enhancement when the threads equals to the cpu number.
>
The benefit might come from returning previous CPU earlier when
nr_threads equals to nr_cpu.
And when the threads number exceeds that of CPU, it might have already
returned previous CPU
without this patch, so we didn't see much improvements(in Shenyang's
test, the success rate is only
7.4% when threads number equals to CPU number)
> For netperf:
> TCP_RR 2 nodes
> threads base patched pct
> 16 50335.56667 49970.63333 -0.73%
> 32 47281.53333 48191.93333 1.93%
> 64 18907.7 34263.63333 81.22%
> 128 14391.1 14480.8 0.62%
> 256 6905.286667 6853.83 -0.75%
>
> TCP_RR 1 node
> threads base patched pct
> 16 50086.06667 49648.13333 -0.87%
> 32 24983.3 39489.43333 58.06%
> 64 18340.03333 18399.56667 0.32%
> 128 7174.713333 7390.09 3.00%
> 256 3433.696667 3404.956667 -0.84%
>
> UDP_RR 2 nodes
> threads base patched pct
> 16 81448.7 82659.43333 1.49%
> 32 75351.13333 76812.36667 1.94%
> 64 25539.46667 41835.96667 63.81%
> 128 25081.56667 23595.56667 -5.92%
> 256 11848.23333 11017.13333 -7.01%
>
> UDP_RR 1 node
> threads base patched pct
> 16 87288.96667 88719.83333 1.64%
> 32 22891.73333 68854.33333 200.78%
> 64 33853.4 35891.6 6.02%
> 128 12108.4 11885.76667 -1.84%
> 256 5620.403333 5531.006667 -1.59%
>
> mysql on node 0-1
> base patched pct
> 16threads-TPS 7100.27 7224.31 1.75%
> 16threads-QPS 142005.45 144486.19 1.75%
> 16threads-avg lat 2.25 2.22 1.63%
> 16threads-99th lat 2.46 2.43 1.08%
> 24threads-TPS 10424.70 10312.20 -1.08%
> 24threads-QPS 208493.86 206243.93 -1.08%
> 24threads-avg lat 2.30 2.32 -0.87%
> 24threads-99th lat 2.52 2.57 -1.85%
> 32threads-TPS 12528.79 12228.88 -2.39%
> 32threads-QPS 250575.92 244577.59 -2.39%
> 32threads-avg lat 2.55 2.61 -2.35%
> 32threads-99th lat 2.88 2.99 -3.82%
> 64threads-TPS 21386.17 21789.99 1.89%
> 64threads-QPS 427723.41 435799.85 1.89%
> 64threads-avg lat 2.99 2.94 1.78%
> 64threads-99th lat 5.00 4.69 6.33%
> 128threads-TPS 20865.13 20781.24 -0.40%
> 128threads-QPS 417302.73 415624.83 -0.40%
> 128threads-avg lat 6.13 6.16 -0.38%
> 128threads-99th lat 8.90 8.95 -0.60%
> 256threads-TPS 19258.15 19295.11 0.19%
> 256threads-QPS 385162.92 385902.27 0.19%
> 256threads-avg lat 13.29 13.26 0.23%
> 256threads-99th lat 20.12 20.12 0.00%
>
> I also had a look on a machine with 2-socket Xeon 6148 (80 threads in total)
> For TCP_RR, the best enhancement also happens when the threads equals to
> the cpu number.
>
May I know if the test is with turbo enabled or disabled? If the turbo
is disabled,
there might be some issues when calculating the util_avg. I had a workaround at
https://lore.kernel.org/all/20220407234258.569681-1-yu.c.chen@intel.com/
And I'm working on the v3 patch which would include above workaround,
will sent it
out later.
--
Thanks,
Chenyu
Powered by blists - more mailing lists