Date:   Fri, 5 May 2023 15:10:24 +0800
From:   Chen Yu <yu.c.chen@...el.com>
To:     Dietmar Eggemann <dietmar.eggemann@....com>
CC:     Ingo Molnar <mingo@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Qais Yousef <qyousef@...alina.io>,
        Kajetan Puchalski <kajetan.puchalski@....com>,
        "Morten Rasmussen" <morten.rasmussen@....com>,
        Vincent Donnefort <vdonnefort@...gle.com>,
        Quentin Perret <qperret@...gle.com>,
        "Abhijeet Dharmapurikar" <adharmap@...cinc.com>,
        <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/1] sched: Consider CPU contention in frequency &
 load-balance busiest CPU selection

On 2023-04-06 at 17:50:30 +0200, Dietmar Eggemann wrote:
> Use new cpu_boosted_util_cfs() instead of cpu_util_cfs().
> 
> The former returns max(util_avg, runnable_avg) capped by max CPU
> capacity. CPU contention is thereby considered through runnable_avg.
> 
> The change in load-balance only affects migration type `migrate_util`.
> 
> Suggested-by: Vincent Guittot <vincent.guittot@...aro.org>
> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@....com>
>
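For readers following along, the boosted utilization described above
reduces to something like the following minimal C sketch (the helper
name and the standalone parameters are illustrative only, not the
actual kernel interface):

static unsigned long boosted_util_sketch(unsigned long util_avg,
					 unsigned long runnable_avg,
					 unsigned long capacity)
{
	/* Contention shows up as runnable_avg exceeding util_avg. */
	unsigned long util = util_avg > runnable_avg ? util_avg : runnable_avg;

	/* Cap the result at the maximum CPU capacity. */
	return util < capacity ? util : capacity;
}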
Tested on Intel Sapphire Rapids, which has 2 x 56C/112T = 224 CPUs.
The test tries to check whether there is any impact on find_busiest_queue(),
so it was run with the cpufreq governor set to performance.
The baseline is the 6.3 sched/core branch on top of
commit 67fff302fc445a ("sched/fair: Introduce SIS_CURRENT to wake up"),
and it is compared against the same tree with the current patch applied.

In summary, no obvious difference was observed so far, apart from some
small improvements in tbench. In the tables below, the baseline mean is
normalized to 1.00, compare% is the percent change relative to that
baseline, and std% is the relative standard deviation.
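As a point of reference, numbers in this format can be produced with
something like the sketch below (standalone C, compiled with -lm; the
report() helper and the sample data are hypothetical, not the actual
test scripts used for these runs):

#include <math.h>
#include <stdio.h>

static void report(const double *base, const double *cmp, int runs)
{
	double bsum = 0.0, csum = 0.0, bvar = 0.0, cvar = 0.0;
	int i;

	for (i = 0; i < runs; i++) {
		bsum += base[i];
		csum += cmp[i];
	}

	double bmean = bsum / runs, cmean = csum / runs;

	for (i = 0; i < runs; i++) {
		bvar += (base[i] - bmean) * (base[i] - bmean);
		cvar += (cmp[i] - cmean) * (cmp[i] - cmean);
	}

	/* baseline mean normalized to 1.00; compare% is the delta vs baseline */
	printf("1.00 (%6.2f)  %+6.2f (%6.2f)\n",
	       100.0 * sqrt(bvar / runs) / bmean,
	       100.0 * (cmean - bmean) / bmean,
	       100.0 * sqrt(cvar / runs) / cmean);
}

int main(void)
{
	double base[] = { 100.0, 102.0, 98.0 };
	double cmp[] = { 103.0, 104.0, 101.0 };

	report(base, cmp, 3);
	return 0;
}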

schbench(latency)
========
case                    load            baseline(std%)  compare%( std%)
normal                  1-mthreads       1.00 (  0.00)   +1.75 (  1.26)
normal                  2-mthreads       1.00 (  5.84)   -5.41 (  2.09)
normal                  4-mthreads       1.00 (  2.59)   -3.67 (  1.25)
normal                  8-mthreads       1.00 (  2.46)   +3.48 (  0.00)

hackbench(throughput)
=========
case                    load            baseline(std%)  compare%( std%)
process-pipe            1-groups         1.00 (  0.26)   +0.73 (  2.18)
process-pipe            2-groups         1.00 (  3.91)   +1.96 (  6.17)
process-pipe            4-groups         1.00 (  3.59)   -2.56 (  5.18)
process-sockets         1-groups         1.00 (  0.97)   +1.83 (  0.80)
process-sockets         2-groups         1.00 (  6.09)   +3.83 (  8.19)
process-sockets         4-groups         1.00 (  0.87)   -5.94 (  1.86)
threads-pipe            1-groups         1.00 (  0.44)   +0.23 (  0.17)
threads-pipe            2-groups         1.00 (  1.18)   +1.41 (  1.16)
threads-pipe            4-groups         1.00 (  2.40)   +1.34 (  1.86)
threads-sockets         1-groups         1.00 (  1.97)   -2.27 (  1.44)
threads-sockets         2-groups         1.00 (  3.85)   -2.44 (  2.42)
threads-sockets         4-groups         1.00 (  1.18)   -2.93 (  1.09)

netperf(throughput)
=======
case                    load            baseline(std%)  compare%( std%)
TCP_RR                  56-threads       1.00 (  4.35)   +2.50 (  4.73)
TCP_RR                  112-threads      1.00 (  4.05)   +2.12 (  4.05)
TCP_RR                  168-threads      1.00 (  5.10)   +0.10 (  3.70)
TCP_RR                  224-threads      1.00 (  3.37)   +0.52 (  2.79)
TCP_RR                  280-threads      1.00 ( 10.04)   -0.36 ( 10.14)
TCP_RR                  336-threads      1.00 ( 17.45)   +0.07 ( 19.04)
TCP_RR                  392-threads      1.00 ( 27.89)   -0.00 ( 30.48)
TCP_RR                  448-threads      1.00 ( 38.99)   +0.29 ( 33.93)
UDP_RR                  56-threads       1.00 (  7.98)   -6.91 ( 13.97)
UDP_RR                  112-threads      1.00 ( 18.06)   +5.83 ( 27.46)
UDP_RR                  168-threads      1.00 ( 17.45)   -3.00 ( 29.40)
UDP_RR                  224-threads      1.00 ( 21.15)   -3.99 ( 28.64)
UDP_RR                  280-threads      1.00 ( 19.74)   -3.20 ( 29.57)
UDP_RR                  336-threads      1.00 ( 22.26)   -4.24 ( 32.35)
UDP_RR                  392-threads      1.00 ( 35.88)   -5.53 ( 35.76)
UDP_RR                  448-threads      1.00 ( 40.38)   -2.65 ( 48.57)

tbench(throughput)
======
case                    load            baseline(std%)  compare%( std%)
loopback                56-threads       1.00 (  0.74)   +2.54 (  0.84)
loopback                112-threads      1.00 (  0.37)   -2.26 (  1.01)
loopback                168-threads      1.00 (  0.49)   +1.44 (  3.05)
loopback                224-threads      1.00 (  0.20)   +6.05 (  0.54)
loopback                280-threads      1.00 (  0.44)   +5.35 (  0.05)
loopback                336-threads      1.00 (  0.02)   +5.03 (  0.06)
loopback                392-threads      1.00 (  0.07)   +5.03 (  0.04)
loopback                448-threads      1.00 (  0.06)   +4.86 (  0.22)

thanks,
Chenyu
