Message-ID: <246c377f-7095-2416-d068-2e7a531e843e@huawei.com>
Date: Mon, 11 Apr 2022 16:04:56 +0800
From: Yicong Yang <yangyicong@...wei.com>
To: Chen Yu <yu.chen.surf@...il.com>
CC: <yangyicong@...ilicon.com>, Chen Yu <yu.c.chen@...el.com>,
Yicong Yang <yangyccccc@...il.com>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Tim Chen <tim.c.chen@...el.com>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Mel Gorman <mgorman@...e.de>,
Viresh Kumar <viresh.kumar@...aro.org>,
Barry Song <21cnbao@...il.com>,
Barry Song <song.bao.hua@...ilicon.com>,
Srikar Dronamraju <srikar@...ux.vnet.ibm.com>,
Len Brown <len.brown@...el.com>,
Ben Segall <bsegall@...gle.com>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Aubrey Li <aubrey.li@...el.com>,
K Prateek Nayak <kprateek.nayak@....com>,
"shenyang (M)" <shenyang39@...wei.com>
Subject: Re: [PATCH v2][RFC] sched/fair: Change SIS_PROP to search idle CPU
based on sum of util_avg
On 2022/4/9 23:09, Chen Yu wrote:
> On Tue, Apr 5, 2022 at 9:05 AM Yicong Yang <yangyicong@...wei.com> wrote:
>>
>> FYI, shenyang has done some investigation on whether we can get an idle cpu when nr is 4.
>> For netperf running on nodes 0-1 (32 cores on each node) with 32, 64 and 128 threads, the
>> success rate of finding an idle cpu is about 61.8%, 7.4% and <0.1%, while the CPU utilization
>> is 70.7%, 87.4% and 99.9% respectively.
>>
> Thanks for this testing. So this indicates that nr = 4 would not improve
> the idle CPU search efficiency much when the load is extremely high.
> Stopping the search entirely when the load is nearly 100% may be more
> appropriate.
>> I have tested this patch based on 5.17-rc7 on Kunpeng 920. The benchmarks are bound to node 0
>> or nodes 0-1. The tbench results have some oscillation so I need to have a further check.
>> For netperf I see a performance enhancement when the number of threads equals the number of cpus.
>>
> The benefit might come from returning the previous CPU earlier when
> nr_threads equals nr_cpu. And when the number of threads exceeds the
> number of CPUs, it might have already returned the previous CPU even
> without this patch, so we didn't see much improvement (in Shenyang's
> test, the success rate is only 7.4% when the number of threads equals
> the number of CPUs).
Yes, I think that may be the case. When the system is fully loaded the behaviour may stay
the same as the previous approach: both scan 4 cpus.
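
To make that concrete, here is a minimal userspace model of the idea (purely
illustrative: scan_depth() and the 85%/98% cut-offs are my own assumptions,
not taken from the patch). It shrinks the number of CPUs scanned as the LLC
utilization grows, and stops scanning entirely near 100%, where the measured
success rate above drops below 0.1%:

#include <stdio.h>

/* Model: how many CPUs to scan, given the LLC size and utilization percent. */
static int scan_depth(int llc_weight, int util_pct)
{
	if (util_pct >= 98)	/* nearly fully loaded: skip the scan */
		return 0;
	if (util_pct >= 85)	/* heavily loaded: keep a small floor of 4 */
		return 4;
	/* otherwise scan a share of the LLC proportional to idleness */
	return llc_weight * (100 - util_pct) / 100;
}

int main(void)
{
	/* utilization points from Shenyang's measurements above */
	int utils[] = { 70, 87, 99 };
	unsigned int i;

	for (i = 0; i < sizeof(utils) / sizeof(utils[0]); i++)
		printf("util=%d%% -> scan %d of 64 CPUs\n",
		       utils[i], scan_depth(64, utils[i]));
	return 0;
}

Compiled and run, this prints scan depths of 19, 4 and 0 CPUs for the
70%/87%/99% utilization points measured above.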
>> For netperf:
>> TCP_RR 2 nodes
>> threads base patched pct
>> 16 50335.56667 49970.63333 -0.73%
>> 32 47281.53333 48191.93333 1.93%
>> 64 18907.7 34263.63333 81.22%
>> 128 14391.1 14480.8 0.62%
>> 256 6905.286667 6853.83 -0.75%
>>
>> TCP_RR 1 node
>> threads base patched pct
>> 16 50086.06667 49648.13333 -0.87%
>> 32 24983.3 39489.43333 58.06%
>> 64 18340.03333 18399.56667 0.32%
>> 128 7174.713333 7390.09 3.00%
>> 256 3433.696667 3404.956667 -0.84%
>>
>> UDP_RR 2 nodes
>> threads base patched pct
>> 16 81448.7 82659.43333 1.49%
>> 32 75351.13333 76812.36667 1.94%
>> 64 25539.46667 41835.96667 63.81%
>> 128 25081.56667 23595.56667 -5.92%
>> 256 11848.23333 11017.13333 -7.01%
>>
>> UDP_RR 1 node
>> threads base patched pct
>> 16 87288.96667 88719.83333 1.64%
>> 32 22891.73333 68854.33333 200.78%
>> 64 33853.4 35891.6 6.02%
>> 128 12108.4 11885.76667 -1.84%
>> 256 5620.403333 5531.006667 -1.59%
>>
>> mysql on node 0-1
>> base patched pct
>> 16threads-TPS 7100.27 7224.31 1.75%
>> 16threads-QPS 142005.45 144486.19 1.75%
>> 16threads-avg lat 2.25 2.22 1.63%
>> 16threads-99th lat 2.46 2.43 1.08%
>> 24threads-TPS 10424.70 10312.20 -1.08%
>> 24threads-QPS 208493.86 206243.93 -1.08%
>> 24threads-avg lat 2.30 2.32 -0.87%
>> 24threads-99th lat 2.52 2.57 -1.85%
>> 32threads-TPS 12528.79 12228.88 -2.39%
>> 32threads-QPS 250575.92 244577.59 -2.39%
>> 32threads-avg lat 2.55 2.61 -2.35%
>> 32threads-99th lat 2.88 2.99 -3.82%
>> 64threads-TPS 21386.17 21789.99 1.89%
>> 64threads-QPS 427723.41 435799.85 1.89%
>> 64threads-avg lat 2.99 2.94 1.78%
>> 64threads-99th lat 5.00 4.69 6.33%
>> 128threads-TPS 20865.13 20781.24 -0.40%
>> 128threads-QPS 417302.73 415624.83 -0.40%
>> 128threads-avg lat 6.13 6.16 -0.38%
>> 128threads-99th lat 8.90 8.95 -0.60%
>> 256threads-TPS 19258.15 19295.11 0.19%
>> 256threads-QPS 385162.92 385902.27 0.19%
>> 256threads-avg lat 13.29 13.26 0.23%
>> 256threads-99th lat 20.12 20.12 0.00%
>>
>> I also had a look at a machine with a 2-socket Xeon 6148 (80 threads in total).
>> For TCP_RR, the best enhancement also happens when the number of threads equals
>> the number of cpus.
>>
> May I know if the test was run with turbo enabled or disabled? If turbo
> is disabled,
The p-state is controlled by the platform on my machine. I assume turbo is on, as Avg_MHz
and Bzy_MHz vary according to the load. turbostat says:
# cpu 0 idle
Package Core CPU Avg_MHz Busy% Bzy_MHz TSC_MHz IPC IRQ SMI POLL C1 POLL% C1% CPU%c1 CPU%c6 CoreTmp PkgTmp PkgWatt RAMWatt PKG_% RAM_%
- - - 0 0.01 3100 2400 1.41 1935 0 0 1831 0.00 100.00 99.99 0.00 69 69 150.55 44.41 0.00 0.00
0 0 0 2 0.06 3100 2400 0.99 46 0 0 45 0.00 99.95 99.94 0.00 66 68 74.12 21.71 0.00 0.00
# cpu 0 partly loaded
Package Core CPU Avg_MHz Busy% Bzy_MHz TSC_MHz IPC IRQ SMI POLL C1 POLL% C1% CPU%c1 CPU%c6 CoreTmp PkgTmp PkgWatt RAMWatt PKG_% RAM_%
- - - 1559 50.34 3097 2400 1.44 56952 0 64 4729 0.00 49.67 49.66 0.00 83 84 291.81 43.79 0.00 0.00
0 0 0 1642 53.01 3098 2400 1.48 1610 0 0 620 0.00 47.51 46.99 0.00 75 78 142.62 21.44 0.00 0.00
# cpu 0 loaded by stress-ng
Package Core CPU Avg_MHz Busy% Bzy_MHz TSC_MHz IPC IRQ SMI POLL C1 POLL% C1% CPU%c1 CPU%c6 CoreTmp PkgTmp PkgWatt RAMWatt PKG_% RAM_%
- - - 2917 99.52 2931 2400 1.06 101718 0 0 0 0.00 0.00 0.48 0.00 86 86 299.87 43.84 0.00 0.00
0 0 0 2943 99.39 2961 2400 1.05 1278 0 0 0 0.00 0.00 0.61 0.00 81 83 149.90 21.38 0.00 0.00
> there might be some issues when calculating the util_avg. I had a workaround at
> https://lore.kernel.org/all/20220407234258.569681-1-yu.c.chen@intel.com/
> And I'm working on the v3 patch, which will include the above workaround;
> I will send it out later.
>
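
For anyone following along, below is a minimal standalone model (purely
illustrative: the per-CPU util_avg samples and the 85% threshold are my own
assumptions, not from the patch or the workaround above) of the
sum-of-util_avg comparison named in the subject: sum util_avg over the LLC
and compare it against the LLC's total capacity to decide whether a full
idle-CPU search is worthwhile.

#include <stdio.h>

#define LLC_CPUS	4
#define SCHED_CAPACITY	1024	/* per-CPU capacity scale used by CFS */

int main(void)
{
	/* example per-CPU util_avg samples (made up for illustration) */
	unsigned long util_avg[LLC_CPUS] = { 900, 1010, 870, 960 };
	unsigned long sum = 0, cap = (unsigned long)LLC_CPUS * SCHED_CAPACITY;
	int i;

	for (i = 0; i < LLC_CPUS; i++)
		sum += util_avg[i];

	/* nearly fully utilized: an idle-CPU search rarely succeeds */
	if (sum * 100 >= cap * 85)
		printf("LLC util %lu/%lu: shrink or skip the idle search\n",
		       sum, cap);
	else
		printf("LLC util %lu/%lu: do the full proportional search\n",
		       sum, cap);
	return 0;
}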