Message-ID: <2fe2fb1c-345f-adca-d201-ed3ed6f418cc@linux.vnet.ibm.com>
Date: Wed, 8 Mar 2023 20:43:36 +0530
From: Shrikanth Hegde <sshegde@...ux.vnet.ibm.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: linux-kernel@...r.kernel.org, juri.lelli@...hat.com,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, bristot@...hat.com, corbet@....net,
qyousef@...alina.io, chris.hyser@...cle.com,
patrick.bellasi@...bug.net, pjt@...gle.com, pavel@....cz,
qperret@...gle.com, tim.c.chen@...ux.intel.com, joshdon@...gle.com,
timj@....org, kprateek.nayak@....com, yu.c.chen@...el.com,
youssefesmat@...omium.org, joel@...lfernandes.org,
mingo@...nel.org, vincent.guittot@...aro.org
Subject: Re: [PATCH 00/10] sched: EEVDF using latency-nice
> Hi!
>
> Ever since looking at the latency-nice patches, I've wondered if EEVDF would
> not make more sense, and I did point Vincent at some older patches I had for
> that (which is where his augmented rbtree thing comes from).
>
> Also, since I really dislike the dual tree, I also figured we could dynamically
> switch between an augmented tree and not (and while I have code for that,
> that's not included in this posting because with the current results I don't
> think we actually need this).
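The pick rule being revived here is small enough to sketch in userspace.
In EEVDF (per the Stoica/Abdel-Wahab formulation) an entity is eligible
while its vruntime has not run ahead of the weighted average vruntime,
and among eligible entities the one with the earliest virtual deadline
(vruntime + request/weight) wins; the augmented rbtree mentioned above
exists to answer that query in O(log n). A minimal illustration follows,
with made-up names and numbers and a naive array scan standing in for
the tree:

    #include <stdio.h>

    struct entity {
            const char *name;
            double vruntime;        /* virtual service received so far */
            double request;         /* size of the next request (slice) */
            double weight;          /* load weight derived from nice */
    };

    /* Weighted average vruntime: the virtual time V of the runqueue. */
    static double avg_vruntime(const struct entity *e, int n)
    {
            double vsum = 0.0, wsum = 0.0;

            for (int i = 0; i < n; i++) {
                    vsum += e[i].vruntime * e[i].weight;
                    wsum += e[i].weight;
            }
            return vsum / wsum;
    }

    static double vdeadline(const struct entity *e)
    {
            return e->vruntime + e->request / e->weight;
    }

    static const struct entity *pick_eevdf(const struct entity *e, int n)
    {
            double V = avg_vruntime(e, n);
            const struct entity *best = NULL;

            for (int i = 0; i < n; i++) {
                    if (e[i].vruntime > V)  /* ineligible: ran ahead of V */
                            continue;
                    if (!best || vdeadline(&e[i]) < vdeadline(best))
                            best = &e[i];
            }
            return best;
    }

    int main(void)
    {
            struct entity tasks[] = {
                    { "latency-sensitive", 99.0, 1.0, 1.0 },
                    { "batch",             99.0, 8.0, 1.0 },
            };

            printf("picked: %s\n", pick_eevdf(tasks, 2)->name);
            return 0;
    }

With equal weights and equal vruntime, the task asking for the smaller
slice gets the earlier virtual deadline, which is the hook a latency
knob can use.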
>
> Anyway, since I'm somewhat under the weather, I spent last week desperately
> trying to connect a small cluster of neurons in defiance of the snot overlord
> and bring back the EEVDF patches from the dark crypts where they'd been
> gathering cobwebs for the past 13 odd years.
>
> By Friday they worked well enough, and this morning (because obviously I forgot
> the weekend is ideal to run benchmarks) I ran a bunch of hackbench, netperf,
> tbench and sysbench -- there's a bunch of wins and losses, but nothing that
> indicates a total fail.
>
> ( in fact, some of the schbench results seem to indicate EEVDF schedules a lot
> more consistently than CFS and has a bunch of latency wins )
>
> ( hackbench also doesn't show the augmented tree and generally more expensive
> pick to be a loss, in fact it shows a slight win here )
>
>
> hackbench load + cyclictest --policy other results:
>
>
>                                EEVDF      CFS
>
>              # Min Latencies:  00053
>  LNICE(19)   # Avg Latencies:  04350
>              # Max Latencies:  76019
>
>              # Min Latencies:  00052      00053
>  LNICE(0)    # Avg Latencies:  00690      00687
>              # Max Latencies:  14145      13913
>
>              # Min Latencies:  00019
>  LNICE(-19)  # Avg Latencies:  00261
>              # Max Latencies:  05642
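One way to read the LNICE columns above: if latency nice scales the
request size an entity asks for, a negative value shrinks the request
and pulls the virtual deadline earlier. Purely as a hypothetical
mapping (assuming the familiar ~1.25x-per-step nice ladder and a 3ms
base slice; the actual constants and mapping in the series may differ):

    #include <math.h>
    #include <stdio.h>

    #define BASE_SLICE_NS 3000000.0 /* assumed 3ms base request */

    /* Hypothetical: each latency-nice step scales the request ~1.25x. */
    static double request_ns(int latency_nice)
    {
            return BASE_SLICE_NS * pow(1.25, latency_nice);
    }

    int main(void)
    {
            int ln[] = { -19, 0, 19 };

            for (int i = 0; i < 3; i++)
                    printf("LNICE(%3d): request ~%.0f ns\n",
                           ln[i], request_ns(ln[i]));
            return 0;       /* build with: cc sketch.c -lm */
    }

Whatever the real mapping is, the ordering it produces matches the
table: LNICE(-19) gets the smallest min/avg/max latencies and
LNICE(19) the largest.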
>
>
> The nice -19 numbers aren't as pretty as Vincent's, but at the end I was going
> cross-eyed from staring at tree prints and I just couldn't figure out where it
> was going sideways.
>
> There's definitely more benchmarking/tweaking to be done (0-day already
> reported a stress-ng loss), but if we can pull this off we can delete a whole
> bunch of icky heuristics code. EEVDF is a much better defined policy than what
> we currently have.
>
Tested the patch series on powerpc systems, in the same way the testing
was done for Vincent's V12 series: two cgroups are created, with
stress-ng -l 50 --cpu=<total_cpu> running in cgroup2 and the
micro-benchmarks running in cgroup1; different latency-nice values are
assigned to cgroup1.
Tested on two different systems, one with 480 CPUs and one with 96 CPUs.
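A rough reconstruction of that setup, for reference (the cgroup v2
mount point, the cpu.latency.nice interface file and the PIDs below
are assumptions/placeholders, not taken from the actual test scripts):

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/stat.h>
    #include <unistd.h>

    static void cg_write(const char *path, const char *val)
    {
            int fd = open(path, O_WRONLY);

            if (fd < 0 || write(fd, val, strlen(val)) < 0)
                    perror(path);
            if (fd >= 0)
                    close(fd);
    }

    int main(void)
    {
            mkdir("/sys/fs/cgroup/cgroup1", 0755);  /* micro-benchmarks */
            mkdir("/sys/fs/cgroup/cgroup2", 0755);  /* stress-ng load */

            /* latency nice under test, applied to the benchmark group */
            cg_write("/sys/fs/cgroup/cgroup1/cpu.latency.nice", "-20");

            /* placeholder PIDs for the benchmark and the stress-ng load */
            cg_write("/sys/fs/cgroup/cgroup1/cgroup.procs", "1234");
            cg_write("/sys/fs/cgroup/cgroup2/cgroup.procs", "5678");
            return 0;
    }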
++++++++
Summary:
++++++++
For hackbench, the 480 CPU system shows a good improvement, while the
96 CPU system shows the same numbers as v6.2. The smaller system was
showing regressions with Vincent's V12 series, as discussed there; with
this patch set there is no regression.
schbench shows a good improvement compared to v6.2 at LN=0 and LN=-20
(LN = latency nice), whereas at LN=19 it shows a regression.
Please suggest any variation of these benchmarks, or a different
benchmark, that should be run.
++++++++++++++++++
480 CPU system
++++++++++++++++++
==========
schbench (latency percentiles, usec)
==========
v6.2 | v6.2+LN=0 | v6.2+LN=-20 | v6.2+LN=19
1 Thread
50.0th: 14.00 | 12.00 | 14.50 | 15.00
75.0th: 16.50 | 14.50 | 17.00 | 18.00
90.0th: 18.50 | 17.00 | 19.50 | 20.00
95.0th: 20.50 | 18.50 | 22.00 | 23.50
99.0th: 27.50 | 24.50 | 31.50 | 155.00
99.5th: 36.00 | 30.00 | 44.50 | 2991.00
99.9th: 81.50 | 171.50 | 153.00 | 4621.00
2 Threads
50.0th: 14.00 | 15.50 | 17.00 | 16.00
75.0th: 17.00 | 18.00 | 19.00 | 19.00
90.0th: 20.00 | 21.00 | 22.00 | 22.50
95.0th: 23.00 | 23.00 | 25.00 | 25.50
99.0th: 71.00 | 30.50 | 35.50 | 990.50
99.5th: 1170.00 | 53.00 | 71.00 | 3719.00
99.9th: 5088.00 | 245.50 | 138.00 | 6644.00
4 Threads
50.0th: 20.50 | 20.00 | 20.00 | 19.50
75.0th: 24.50 | 23.00 | 23.00 | 23.50
90.0th: 31.00 | 27.00 | 26.50 | 27.50
95.0th: 260.50 | 29.50 | 29.00 | 35.00
99.0th: 3644.00 | 106.00 | 37.50 | 2884.00
99.5th: 5152.00 | 227.00 | 92.00 | 5496.00
99.9th: 8076.00 | 3662.50 | 517.00 | 8640.00
8 Threads
50.0th: 26.00 | 23.50 | 22.50 | 25.00
75.0th: 32.50 | 29.50 | 27.50 | 31.00
90.0th: 41.50 | 34.50 | 31.50 | 39.00
95.0th: 794.00 | 37.00 | 34.50 | 579.50
99.0th: 5992.00 | 48.50 | 52.00 | 5872.00
99.5th: 7208.00 | 100.50 | 97.50 | 7280.00
99.9th: 9392.00 | 4098.00 | 1226.00 | 9328.00
16 Threads
50.0th: 37.50 | 33.00 | 34.00 | 37.00
75.0th: 49.50 | 43.50 | 44.00 | 49.00
90.0th: 70.00 | 52.00 | 53.00 | 66.00
95.0th: 1284.00 | 57.50 | 59.00 | 1162.50
99.0th: 5600.00 | 79.50 | 111.50 | 5912.00
99.5th: 7216.00 | 282.00 | 194.50 | 7392.00
99.9th: 9328.00 | 4026.00 | 2009.00 | 9440.00
32 Threads
50.0th: 59.00 | 56.00 | 57.00 | 59.00
75.0th: 83.00 | 77.50 | 79.00 | 83.00
90.0th: 118.50 | 94.00 | 95.00 | 120.50
95.0th: 1921.00 | 104.50 | 104.00 | 1800.00
99.0th: 6672.00 | 425.00 | 255.00 | 6384.00
99.5th: 8252.00 | 2800.00 | 1252.00 | 7696.00
99.9th: 10448.00 | 7264.00 | 5888.00 | 9504.00
=========
hackbench (time in seconds, lower is better)
=========
Type groups v6.2 | v6.2+LN=0 | v6.2+LN=-20 | v6.2+LN=19
Process 10 0.19 | 0.18 | 0.17 | 0.18
Process 20 0.34 | 0.32 | 0.33 | 0.31
Process 30 0.45 | 0.42 | 0.43 | 0.43
Process 40 0.58 | 0.53 | 0.53 | 0.53
Process 50 0.70 | 0.64 | 0.64 | 0.65
Process 60 0.82 | 0.74 | 0.75 | 0.76
thread 10 0.20 | 0.19 | 0.19 | 0.19
thread 20 0.36 | 0.34 | 0.34 | 0.34
Process(Pipe) 10 0.24 | 0.15 | 0.15 | 0.15
Process(Pipe) 20 0.46 | 0.22 | 0.22 | 0.21
Process(Pipe) 30 0.65 | 0.30 | 0.29 | 0.29
Process(Pipe) 40 0.90 | 0.35 | 0.36 | 0.34
Process(Pipe) 50 1.04 | 0.38 | 0.39 | 0.38
Process(Pipe) 60 1.16 | 0.42 | 0.42 | 0.43
thread(Pipe) 10 0.19 | 0.13 | 0.13 | 0.13
thread(Pipe) 20 0.46 | 0.21 | 0.21 | 0.21
++++++++++++++++++
96 CPU system
++++++++++++++++++
==========
schbench (latency percentiles, usec)
==========
v6.2 | v6.2+LN=0 | v6.2+LN=-20 | v6.2+LN=19
1 Thread
50.0th: 10.50 | 10.00 | 10.00 | 11.00
75.0th: 12.50 | 11.50 | 11.50 | 12.50
90.0th: 15.00 | 13.00 | 13.50 | 16.50
95.0th: 47.50 | 15.00 | 15.00 | 274.50
99.0th: 4744.00 | 17.50 | 18.00 | 5032.00
99.5th: 7640.00 | 18.50 | 525.00 | 6636.00
99.9th: 8916.00 | 538.00 | 6704.00 | 9264.00
2 Threads
50.0th: 11.00 | 10.00 | 11.00 | 11.00
75.0th: 13.50 | 12.00 | 12.50 | 13.50
90.0th: 17.00 | 14.00 | 14.00 | 17.00
95.0th: 451.50 | 16.00 | 15.50 | 839.00
99.0th: 5488.00 | 20.50 | 18.00 | 6312.00
99.5th: 6712.00 | 986.00 | 19.00 | 7664.00
99.9th: 9856.00 | 4913.00 | 1154.00 | 8736.00
4 Threads
50.0th: 13.00 | 12.00 | 12.00 | 13.00
75.0th: 15.00 | 14.00 | 14.00 | 15.00
90.0th: 23.50 | 16.00 | 16.00 | 20.00
95.0th: 2508.00 | 17.50 | 17.50 | 1818.00
99.0th: 7232.00 | 777.00 | 38.50 | 5952.00
99.5th: 8720.00 | 3548.00 | 1926.00 | 7788.00
99.9th: 10352.00 | 6320.00 | 7160.00 | 10000.00
8 Threads
50.0th: 16.00 | 15.00 | 15.00 | 16.00
75.0th: 20.00 | 18.00 | 18.00 | 19.50
90.0th: 371.50 | 20.00 | 21.00 | 245.50
95.0th: 2992.00 | 22.00 | 23.00 | 2608.00
99.0th: 7784.00 | 1084.50 | 563.50 | 7136.00
99.5th: 9488.00 | 2612.00 | 2696.00 | 8720.00
99.9th: 15568.00 | 6656.00 | 7496.00 | 10000.00
16 Threads
50.0th: 23.00 | 21.00 | 20.00 | 22.50
75.0th: 31.00 | 27.50 | 26.00 | 29.50
90.0th: 1981.00 | 32.50 | 30.50 | 1500.50
95.0th: 4856.00 | 304.50 | 34.00 | 4046.00
99.0th: 10112.00 | 5720.00 | 4590.00 | 8220.00
99.5th: 13104.00 | 7828.00 | 7008.00 | 9312.00
99.9th: 18624.00 | 9856.00 | 9504.00 | 11984.00
32 Threads
50.0th: 36.50 | 34.50 | 33.50 | 35.50
75.0th: 56.50 | 48.00 | 46.00 | 52.50
90.0th: 4728.00 | 1470.50 | 376.00 | 3624.00
95.0th: 7808.00 | 4130.00 | 3850.00 | 6488.00
99.0th: 15776.00 | 8972.00 | 9060.00 | 9872.00
99.5th: 19072.00 | 11328.00 | 12224.00 | 11520.00
99.9th: 28864.00 | 18016.00 | 18368.00 | 18848.00
=========
hackbench (time in seconds, lower is better)
=========
Type groups v6.2 | v6.2+LN=0 | v6.2+LN=-20 | v6.2+LN=19
Process 10 0.33 | 0.33 | 0.33 | 0.33
Process 20 0.61 | 0.56 | 0.58 | 0.57
Process 30 0.87 | 0.82 | 0.81 | 0.81
Process 40 1.10 | 1.05 | 1.06 | 1.05
Process 50 1.34 | 1.28 | 1.29 | 1.29
Process 60 1.58 | 1.53 | 1.52 | 1.51
thread 10 0.36 | 0.35 | 0.35 | 0.35
thread 20 0.64 | 0.63 | 0.62 | 0.62
Process(Pipe) 10 0.18 | 0.18 | 0.18 | 0.17
Process(Pipe) 20 0.32 | 0.31 | 0.31 | 0.31
Process(Pipe) 30 0.42 | 0.41 | 0.41 | 0.42
Process(Pipe) 40 0.56 | 0.53 | 0.55 | 0.53
Process(Pipe) 50 0.68 | 0.66 | 0.66 | 0.66
Process(Pipe) 60 0.80 | 0.78 | 0.78 | 0.78
thread(Pipe) 10 0.20 | 0.18 | 0.19 | 0.18
thread(Pipe) 20 0.34 | 0.34 | 0.33 | 0.33
Tested-by: Shrikanth Hegde <sshegde@...ux.vnet.ibm.com>