Date:   Thu, 13 Apr 2023 23:42:34 +0800
From:   Chen Yu <yu.c.chen@...el.com>
To:     Peter Zijlstra <peterz@...radead.org>
CC:     <mingo@...nel.org>, <vincent.guittot@...aro.org>,
        <linux-kernel@...r.kernel.org>, <juri.lelli@...hat.com>,
        <dietmar.eggemann@....com>, <rostedt@...dmis.org>,
        <bsegall@...gle.com>, <mgorman@...e.de>, <bristot@...hat.com>,
        <corbet@....net>, <qyousef@...alina.io>, <chris.hyser@...cle.com>,
        <patrick.bellasi@...bug.net>, <pjt@...gle.com>, <pavel@....cz>,
        <qperret@...gle.com>, <tim.c.chen@...ux.intel.com>,
        <joshdon@...gle.com>, <timj@....org>, <kprateek.nayak@....com>,
        <youssefesmat@...omium.org>, <joel@...lfernandes.org>,
        <efault@....de>
Subject: Re: [PATCH 06/17] sched/fair: Add lag based placement

On 2023-04-05 at 11:47:20 +0200, Peter Zijlstra wrote:
> On Mon, Apr 03, 2023 at 05:18:06PM +0800, Chen Yu wrote:
> > On 2023-03-28 at 11:26:28 +0200, Peter Zijlstra wrote:
So I launched the tests on another platform with more CPUs:

baseline: 6.3-rc6

compare:  sched/eevdf branch on top of commit 8c59a975d5ee ("sched/eevdf: Debug / validation crud")
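
The PLACE_BONUS results below were collected by toggling the
corresponding scheduler feature flag. A minimal sketch of how that is
typically done, assuming a SCHED_DEBUG kernel exposing the standard
debugfs interface:

  # eevdf+PLACE_BONUS
  echo PLACE_BONUS > /sys/kernel/debug/sched/features
  # eevdf+NO_PLACE_BONUS
  echo NO_PLACE_BONUS > /sys/kernel/debug/sched/features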


--------------------------------------------------------------------------------------
schbench:mthreads = 2
                   baseline                    eevdf+NO_PLACE_BONUS
worker_threads
25%                80.00           +19.2%      95.40            schbench.latency_90%_us
                   (0.00%)                     (0.51%)          stddev
50%                183.70          +2.2%       187.80           schbench.latency_90%_us
                   (0.35%)                     (0.46%)          stddev
75%                4065            -21.4%      3193             schbench.latency_90%_us
                   (69.65%)                    (3.42%)          stddev
100%               13696           -92.4%      1040             schbench.latency_90%_us
                   (5.25%)                     (69.03%)         stddev
125%               16457           -78.6%      3514             schbench.latency_90%_us
                   (10.50%)                    (6.25%)          stddev
150%               31177           -77.5%      7008             schbench.latency_90%_us
                   (6.84%)                     (5.19%)          stddev
175%               40729           -75.1%      10160            schbench.latency_90%_us
                   (6.11%)                     (2.53%)          stddev
200%               52224           -74.4%      13385            schbench.latency_90%_us
                   (10.42%)                    (1.72%)          stddev


                  eevdf+NO_PLACE_BONUS       eevdf+PLACE_BONUS
worker_threads
25%               96.30             +0.2%      96.50            schbench.latency_90%_us
                  (0.66%)                      (0.52%)          stddev
50%               187.20            -3.0%      181.60           schbench.latency_90%_us
                  (0.21%)                      (0.71%)          stddev
75%                3034             -84.1%     482.50           schbench.latency_90%_us
                  (5.56%)                      (27.40%)         stddev
100%              648.20            +114.7%    1391             schbench.latency_90%_us
                  (64.70%)                     (10.05%)         stddev
125%              3506              -3.0%      3400             schbench.latency_90%_us
                  (2.79%)                      (9.89%)          stddev
150%              6793              +29.6%     8803             schbench.latency_90%_us
                  (1.39%)                      (7.30%)          stddev
175%               9961             +9.2%      10876            schbench.latency_90%_us
                  (1.51%)                      (6.54%)          stddev
200%              13660             +3.3%      14118            schbench.latency_90%_us
                  (1.38%)                      (6.02%)          stddev



Summary for schbench: in most cases eevdf+NO_PLACE_BONUS gives the best
performance. This is in line with the previous test on another platform
with a smaller number of CPUs: eevdf benefits schbench overall.
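
For reference, a single load point could be reproduced with something
like the following; the worker-thread count and runtime are
illustrative, since the exact invocation is not part of this report:

  # 2 message threads (mthreads = 2), worker threads scaled to a
  # percentage of the online CPUs, e.g. 50% on a 128-CPU machine:
  schbench -m 2 -t 64 -r 30

schbench prints a latency histogram per run; the 90th-percentile
numbers above are taken from it.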

---------------------------------------------------------------------------------------



hackbench: ipc=pipe mode=process default fd:20

                   baseline                     eevdf+NO_PLACE_BONUS
worker_threads
1                  103103            -0.3%     102794        hackbench.throughput_avg
25%                115562          +825.7%    1069725        hackbench.throughput_avg
50%                296514          +352.1%    1340414        hackbench.throughput_avg
75%                498059          +190.8%    1448156        hackbench.throughput_avg
100%               804560           +74.8%    1406413        hackbench.throughput_avg


                   eevdf+NO_PLACE_BONUS        eevdf+PLACE_BONUS
worker_threads
1                  102172            +1.5%     103661         hackbench.throughput_avg
25%                1076503           -52.8%     508612        hackbench.throughput_avg
50%                1394311           -68.2%     443251        hackbench.throughput_avg
75%                1476502           -70.2%     440391        hackbench.throughput_avg
100%               1512706           -76.2%     359741        hackbench.throughput_avg


Summary for the hackbench pipe/process test: in most cases
eevdf+NO_PLACE_BONUS gives the best performance.
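
This configuration maps to an invocation along these lines (the group
count and loop count are illustrative):

  # pipe IPC, process mode, 20 fds per group (the default), with the
  # group count scaled with the CPU count
  hackbench -g 32 --pipe --process -f 20 -l 100000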

-------------------------------------------------------------------------------------
unixbench: test=pipe

                   baseline                     eevdf+NO_PLACE_BONUS
nr_task
1                  1405              -0.5%       1398        unixbench.score
25%                77942             +0.9%      78680        unixbench.score
50%                155384            +1.1%     157100        unixbench.score
75%                179756            +0.3%     180295        unixbench.score
100%               204030            -0.2%     203540        unixbench.score
125%               204972            -0.4%     204062        unixbench.score
150%               205891            -0.5%     204792        unixbench.score
175%               207051            -0.5%     206047        unixbench.score
200%               209387            -0.9%     207559        unixbench.score


                   eevdf+NO_PLACE_BONUS        eevdf+PLACE_BONUS
nr_task
1                  1405              -0.3%       1401        unixbench.score
25%                78640             +0.0%      78647        unixbench.score
50%                157153            -0.0%     157093        unixbench.score
75%                180152            +0.0%     180205        unixbench.score
100%               203479            -0.0%     203464        unixbench.score
125%               203866            +0.1%     204013        unixbench.score
150%               204872            -0.0%     204838        unixbench.score
175%               205799            +0.0%     205824        unixbench.score
200%               207152            +0.2%     207546        unixbench.score

eevdf seems to have no impact on unixbench in pipe mode, with or
without PLACE_BONUS; all deltas are within roughly 1%.
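
Assuming the byte-unixbench Run script, the pipe test would be launched
roughly as follows (the copy count is illustrative):

  # run the pipe throughput test with N parallel copies, N scaled to a
  # percentage of the online CPUs
  ./Run -c 64 pipe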
--------------------------------------------------------------------------------

netperf: TCP_RR, ipv4, loopback

                   baseline                    eevdf+NO_PLACE_BONUS
nr_threads
25%                56232            -1.7%      55265        netperf.Throughput_tps
50%                49876            -3.1%      48338        netperf.Throughput_tps
75%                24281            +1.9%      24741        netperf.Throughput_tps
100%               73598            +3.8%      76375        netperf.Throughput_tps
125%               59119            +1.4%      59968        netperf.Throughput_tps
150%               49124            +1.2%      49727        netperf.Throughput_tps
175%               41929            +0.2%      42004        netperf.Throughput_tps
200%               36543            +0.4%      36677        netperf.Throughput_tps

                   eevdf+NO_PLACE_BONUS        eevdf+PLACE_BONUS
nr_threads
25%                55296            +4.7%      57877        netperf.Throughput_tps
50%                48659            +1.9%      49585        netperf.Throughput_tps
75%                24741            +0.3%      24807        netperf.Throughput_tps
100%               76455            +6.7%      81548        netperf.Throughput_tps
125%               60082            +7.6%      64622        netperf.Throughput_tps
150%               49618            +7.7%      53429        netperf.Throughput_tps
175%               41974            +7.6%      45160        netperf.Throughput_tps
200%               36677            +6.5%      39067        netperf.Throughput_tps

eevdf alone seems to have no impact on netperf versus the baseline;
with PLACE_BONUS there is a small improvement (up to ~8%), mostly at
higher loads.
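
The loopback TCP_RR runs correspond to something like the following
(duration and client fan-out are illustrative; netserver must already
be running):

  netserver
  # one request/response stream per client thread, 60 seconds each
  netperf -t TCP_RR -H 127.0.0.1 -l 60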
-----------------------------------------------------------------------------------

stress-ng: futex

                   baseline                     eevdf+NO_PLACE_BONUS
nr_threads
25%                207926           -21.0%      164356       stress-ng.futex.ops_per_sec
50%                 46611           -16.1%       39130       stress-ng.futex.ops_per_sec
75%                 71381           -11.3%       63283       stress-ng.futex.ops_per_sec
100%                58766            -0.8%       58269       stress-ng.futex.ops_per_sec
125%                59859           +11.3%       66645       stress-ng.futex.ops_per_sec
150%                52869            +7.6%       56863       stress-ng.futex.ops_per_sec
175%                49607           +22.9%       60969       stress-ng.futex.ops_per_sec
200%                56011           +11.8%       62631       stress-ng.futex.ops_per_sec
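
The futex stressor corresponds to an invocation like the following
(worker count and duration are illustrative):

  # N futex workers, N scaled to a percentage of the online CPUs;
  # --metrics-brief reports the ops/s figures used above
  stress-ng --futex 64 --timeout 60 --metrics-brief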


When the system is not busy there is a regression; when the system gets
busier there is some improvement. Even with PLACE_BONUS enabled, the
regression remains. Per the perf profile of the 50% case, the ratio of
wakeups is nearly the same with vs. without the eevdf patches applied:

50.82            -0.7       50.15        perf-profile.children.cycles-pp.futex_wake

but there are more preemptions after eevdf is enabled:

135095           +15.4%     155943        stress-ng.time.involuntary_context_switches

which is close to the -16.1% performance loss. That is to say, eevdf
helps the futex wakee grab the CPU more easily (benefiting latency),
while it might have some impact on throughput?
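
The profile comparison above can be reproduced with the generic perf
workflow; this is a sketch, not necessarily the exact tooling that
produced the metric names quoted above:

  # system-wide profile while the 50% stress-ng run is active
  perf record -a -g -- sleep 30
  # cumulative ("children") cycles per call chain; this is where the
  # futex_wake percentage comes from
  perf report --children
  # involuntary context switches for a single run:
  /usr/bin/time -v stress-ng --futex 64 --timeout 60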

thanks,
Chenyu
