Message-ID: <1541877001.17878.5.camel@suse.cz>
Date: Sat, 10 Nov 2018 20:10:01 +0100
From: Giovanni Gherdovich <ggherdovich@...e.cz>
To: "Rafael J. Wysocki" <rjw@...ysocki.net>,
Linux PM <linux-pm@...r.kernel.org>,
Doug Smythies <dsmythies@...us.net>
Cc: Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
Peter Zijlstra <peterz@...radead.org>,
LKML <linux-kernel@...r.kernel.org>,
Frederic Weisbecker <frederic@...nel.org>,
Mel Gorman <mgorman@...e.de>,
Daniel Lezcano <daniel.lezcano@...aro.org>
Subject: Re: [RFC/RFT][PATCH v5] cpuidle: New timer events oriented governor
for tickless systems
On Thu, 2018-11-08 at 18:25 +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> Subject: [PATCH] cpuidle: New timer events oriented governor for tickless systems
>
> The venerable menu governor does some things that are quite
> questionable in my view.
>
> First, it includes timer wakeups in the pattern detection data and
> mixes them up with wakeups from other sources which in some cases
> causes it to expect what essentially would be a timer wakeup in a
> time frame in which no timer wakeups are possible (because it knows
> the time until the next timer event and that is later than the
> expected wakeup time).
>
> Second, it uses the extra exit latency limit based on the predicted
> idle duration and depending on the number of tasks waiting on I/O,
> even though those tasks may run on a different CPU when they are
> woken up. Moreover, the time ranges used by it for the sleep length
> correction factors depend on whether or not there are tasks waiting
> on I/O, which again doesn't imply anything in particular, and they
> are not correlated to the list of available idle states in any way
> whatever.
>
> Also, the pattern detection code in menu may end up considering
> values that are too large to matter at all, in which cases running
> it is a waste of time.
>
> A major rework of the menu governor would be required to address
> these issues and the performance of at least some workloads (tuned
> specifically to the current behavior of the menu governor) is likely
> to suffer from that. It is thus better to introduce an entirely new
> governor without them and let everybody use the governor that works
> better with their actual workloads.
>
> The new governor introduced here, the timer events oriented (TEO)
> governor, uses the same basic strategy as menu: it always tries to
> find the deepest idle state that can be used in the given conditions.
> However, it applies a different approach to that problem.
>
> First, it doesn't use "correction factors" for the time till the
> closest timer, but instead it tries to correlate the measured idle
> duration values with the available idle states and use that
> information to pick the idle state that is most likely to "match"
> the upcoming CPU idle interval.
>
> Second, it doesn't take the number of "I/O waiters" into account at
> all and the pattern detection code in it avoids taking timer wakeups
> into account. It also only uses idle duration values less than the
> current time till the closest timer (with the tick excluded) for that
> purpose.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> ---
>
> v4 -> v5:
> * Avoid using shallow idle states when the tick has been stopped already.
>
> v3 -> v4:
> * Make the pattern detection avoid returning too early if the minimum
> sample is too far from the average.
> * Reformat the changelog (as requested by Peter).
>
> v2 -> v3:
> * Simplify the pattern detection code and make it return a value
> lower than the time to the closest timer if the majority of recent
> idle intervals are below it regardless of their variance (that should
> cause it to be slightly more aggressive).
> * Do not count wakeups from state 0 due to the time limit in poll_idle()
> as non-timer.
>
> [...]
[NOTE: the tables in this message are quite wide, ~130 columns. If this
doesn't get to you properly formatted you can read a copy of this message at
the URL https://beta.suse.com/private/ggherdovich/teo-eval/teo-eval.html ]
Hello Rafael,
I have results for v3 and v5. Regarding v4, I made a mistake and didn't get
valid data; as I saw v5 coming shortly after, I didn't rerun v4.
I'm replying to the v5 thread because that's where these results belong, but
I'm quoting your text from the v2 email at
https://lore.kernel.org/lkml/4168371.zz0pVZtGOY@aspire.rjw.lan so that it's
easier to follow along.
The quick summary is:
---> sockperf on loopback over UDP, mode "throughput":
this had a 12% regression in v2 on 48x-HASWELL-NUMA, which is completely
recovered in v3 and v5. Good stuff.
---> dbench on xfs:
this was down 16% in v2 on 48x-HASWELL-NUMA. On v5 we're at a 10%
regression; a slight improvement. What's really hurting here is the
single-client scenario.
---> netperf-udp on loopback:
had a 6% regression in v2 on 8x-SKYLAKE-UMA, which is the same as what
happens in v5.
---> tbench on loopback:
was down 10% in v2 on 8x-SKYLAKE-UMA, and is now slightly worse in v5 with a
12% regression. As with dbench, it's at low client counts that the results
are worst. Note that this machine is different from the one that has the
dbench regression.
A more detailed report follows below.
I maintain my original opinion from v2 that this governor is largely
performance-neutral and I'm not overly worried about the numbers above:
* results change from machine to machine: dbench is down 10% on
48x-HASWELL-NUMA, but it also gives you the largest win on the board with a
4% improvement on 8x-SKYLAKE-UMA. All the regressions I mention manifest on
only one of the three machines.
* similar benchmarks give contradictory results: dbench seems highly sensitive
to this patch, but pgbench, sqlite, and fio are not. netperf-udp is slightly
down on 48x-HASWELL-NUMA but sockperf-udp-throughput has benefited from v5
on that same machine.
To raise an alert from the performance angle I have to see red on my board
from an entire category of benchmarks (i.e. I/O, networking,
scheduler-intensive, etc.) and on a variety of hardware configurations. That's
not happening here.
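As an aside, here is how I read the selection strategy described in the
changelog above: essentially the governor picks the deepest state whose
target residency still fits the expected idle duration, and that expectation
is pulled below the time to the next timer when the majority of recent
non-timer idle durations were shorter than it. A rough Python sketch of the
idea (purely illustrative, not the kernel code; the state table, helper names
and numbers are all invented):

    # (target_residency_us, exit_latency_us), shallowest state first;
    # values invented for the example
    IDLE_STATES = [(0, 0), (2, 2), (100, 40), (800, 200)]

    def predict_idle_us(sleep_length_us, recent_idle_us):
        # Bias the prediction below the time to the next timer if the
        # majority of recent (non-timer) idle durations were shorter.
        shorter = [d for d in recent_idle_us if d < sleep_length_us]
        if len(shorter) > len(recent_idle_us) // 2:
            return sum(shorter) / len(shorter)
        return sleep_length_us

    def select_state(sleep_length_us, recent_idle_us, latency_limit_us):
        duration = predict_idle_us(sleep_length_us, recent_idle_us)
        chosen = 0
        for i, (residency_us, exit_lat_us) in enumerate(IDLE_STATES):
            if residency_us <= duration and exit_lat_us <= latency_limit_us:
                chosen = i   # deepest state that still "fits" wins
        return chosen

    # next timer 1000us away, but most recent idle periods were much shorter
    print(select_state(1000, [50, 60, 70, 900], latency_limit_us=100))  # -> 2
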
On Sun, 2018-11-04 at 11:06 +0100, Rafael J. Wysocki wrote:
> On Wednesday, October 31, 2018 7:36:21 PM CET Giovanni Gherdovich wrote:
> >
> > [...]
> > I've tested your patches applying them on v4.18 (plus the backport
> > necessary for v2 as Doug helpfully noted), just because it was the latest
> > release when I started preparing this.
I did the same for v3 and v5: baseline is v4.18, using that backport from
linux-next.
> >
> > I've tested it on three machines, with different generations of Intel CPUs:
> >
> > * single socket E3-1240 v5 (Skylake 8 cores, which I'll call 8x-SKYLAKE-UMA)
> > * two sockets E5-2698 v4 (Broadwell 80 cores, 80x-BROADWELL-NUMA from here onwards)
> > * two sockets E5-2670 v3 (Haswell 48 cores, 48x-HASWELL-NUMA from here onwards)
> >
Same machines.
> >
> > BENCHMARKS WITH NEUTRAL RESULTS
> > ===============================
> >
> > These are the workloads where no noticeable difference is measured (on both
> > v1 and v2, all machines), together with the corresponding MMTests[1]
> > configuration file name:
> >
> > * pgbench read-only on xfs, pgbench read/write on xfs
> > * global-dhp__db-pgbench-timed-ro-small-xfs
> > * global-dhp__db-pgbench-timed-rw-small-xfs
> > * siege
> > * global-dhp__http-siege
> > * hackbench, pipetest
> > * global-dhp__scheduler-unbound
> > * Linux kernel compilation
> > * global-dhp__workload_kerndevel-xfs
> > * NAS Parallel Benchmarks, C-Class (linear algebra; run both with OpenMP
> > and OpenMPI, over xfs)
> > * global-dhp__nas-c-class-mpi-full-xfs
> > * global-dhp__nas-c-class-omp-full
> > * FIO (Flexible IO) in several configurations
> > * global-dhp__io-fio-randread-async-randwrite-xfs
> > * global-dhp__io-fio-randread-async-seqwrite-xfs
> > * global-dhp__io-fio-seqread-doublemem-32k-4t-xfs
> > * global-dhp__io-fio-seqread-doublemem-4k-4t-xfs
> > * netperf on loopback over TCP
> > * global-dhp__network-netperf-unbound
>
> The above is great to know.
All of the above are confirmed, plus we can add to the group of neutral
benchmarks:
* xfsrepair
* global-dhp__io-xfsrepair-xfs
* sqlite (insert operations on xfs)
* global-dhp__db-sqlite-insert-medium-xfs
* schbench
* global-dhp__workload_schbench
* gitsource on xfs (git unit tests, shell intensive)
* global-dhp__workload_shellscripts-xfs
>
> > BENCHMARKS WITH NON-NEUTRAL RESULTS: OVERVIEW
> > =============================================
> >
> > These are benchmarks which exhibit a variation in their performance;
> > you'll see the magnitude of the changes is moderate and it's highly variable
> > from machine to machine. All percentages refer to the v4.18 baseline. In
> > more than one case the Haswell machine seems to prefer v1 to v2.
> >
> > [...]
> >
> > * netperf on loopback over UDP
> > * global-dhp__network-netperf-unbound
> >
> >                       teo-v1      teo-v2
> > -------------------------------------------------
> > 8x-SKYLAKE-UMA        no change   6% worse
> > 80x-BROADWELL-NUMA    1% worse    4% worse
> > 48x-HASWELL-NUMA      3% better   5% worse
> >
New data for netperf-udp, as 8x-SKYLAKE-UMA looked slightly off:
* netperf on loopback over UDP
* global-dhp__network-netperf-unbound
                      teo-v1      teo-v2      teo-v3      teo-v5
------------------------------------------------------------------------------
8x-SKYLAKE-UMA        no change   6% worse    4% worse    6% worse
80x-BROADWELL-NUMA    1% worse    4% worse    no change   no change
48x-HASWELL-NUMA      3% better   5% worse    7% worse    5% worse
> > [...]
> >
> > * sockperf on loopback over UDP, mode "throughput"
> > * global-dhp__network-sockperf-unbound
>
> Generally speaking, I'm not worried about single-digit percent differences,
> because overall they tend to fall into the noise range in the grand picture.
>
> >                       teo-v1      teo-v2
> > -------------------------------------------------
> > 8x-SKYLAKE-UMA        1% worse    1% worse
> > 80x-BROADWELL-NUMA    3% better   2% better
> > 48x-HASWELL-NUMA      4% better   12% worse
>
> But the 12% difference here is slightly worrisome.
Following up on the above:
* sockperf on loopback over UDP, mode "throughput"
* global-dhp__network-sockperf-unbound
                      teo-v1      teo-v2      teo-v3      teo-v5
-------------------------------------------------------------------------------
8x-SKYLAKE-UMA        1% worse    1% worse    1% worse    1% worse
80x-BROADWELL-NUMA    3% better   2% better   5% better   3% worse
48x-HASWELL-NUMA      4% better   12% worse   no change   no change
> >
> > [...]
> >
> > * dbench on xfs
> > * global-dhp__io-dbench4-async-xfs
> >
> >                       teo-v1      teo-v2
> > -------------------------------------------------
> > 8x-SKYLAKE-UMA        3% better   4% better
> > 80x-BROADWELL-NUMA    no change   no change
> > 48x-HASWELL-NUMA      6% worse    16% worse
>
> And same here.
With new data:
* dbench on xfs
* global-dhp__io-dbench4-async-xfs
                      teo-v1      teo-v2      teo-v3      teo-v5
-------------------------------------------------------------------------------
8x-SKYLAKE-UMA        3% better   4% better   6% better   4% better
80x-BROADWELL-NUMA    no change   no change   1% worse    3% worse
48x-HASWELL-NUMA      6% worse    16% worse   8% worse    10% worse
>
> > * tbench on loopback
> > * global-dhp__network-tbench
> >
> >                       teo-v1      teo-v2
> > -------------------------------------------------
> > 8x-SKYLAKE-UMA        1% worse    10% worse
> > 80x-BROADWELL-NUMA    1% worse    1% worse
> > 48x-HASWELL-NUMA      1% worse    2% worse
> >
Update on tbench:
* tbench on loopback
* global-dhp__network-tbench
                      teo-v1      teo-v2      teo-v3      teo-v5
------------------------------------------------------------------------------
8x-SKYLAKE-UMA        1% worse    10% worse   11% worse   12% worse
80x-BROADWELL-NUMA    1% worse    1% worse    no change   1% worse
48x-HASWELL-NUMA      1% worse    2% worse    1% worse    1% worse
> > [...]
> >
> > BENCHMARKS WITH NON-NEUTRAL RESULTS: DETAIL
> > ===========================================
> >
> > Now some more detail. Each benchmark is run in a variety of configurations
> > (eg. number of threads, number of concurrent connections and so forth) each
> > of them giving a result. What you see above is the geometric mean of
> > "sub-results"; below is the detailed view where there was a regression
> > larger than 5% (either in v1 or v2, on any of the machines). That means
> > I'll exclude xfsrepair, sqlite, schbench and the git unit tests "gitsource"
> > that have negligible swings from the baseline.
> >
> > In all tables asterisks indicate a statement about statistical
> > significance: the difference with baseline has a p-value smaller than 0.1
> > (small p-values indicate that the difference is real and not just random
> > noise).
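As a rough illustration of the reduction described above (a sketch of the
idea, not the actual MMTests code): the repeated runs of one configuration
are collapsed with a harmonic or arithmetic mean, the overview tables take a
geometric mean over the sub-results, and a delta gets asterisks when its
p-value is below 0.1. Something along these lines:

    from math import prod
    from statistics import harmonic_mean
    from scipy.stats import ttest_ind   # assumed available for the p-value

    def geo_mean(values):
        return prod(values) ** (1.0 / len(values))

    def hmean_row(baseline_runs, patched_runs):
        base, new = harmonic_mean(baseline_runs), harmonic_mean(patched_runs)
        delta_pct = (new - base) / base * 100.0
        _, p = ttest_ind(baseline_runs, patched_runs)
        star = "*" if p < 0.1 else " "   # asterisks mark significance
        return "%10.2f %s%6.2f%%%s" % (new, star, delta_pct, star)

    # five repeated runs for one message size, baseline vs. patched kernel
    print(hmean_row([362.1, 362.4, 362.3, 362.2, 362.4],
                    [318.9, 318.7, 318.8, 319.0, 318.9]))
    # overview tables: geometric mean over all the sub-result ratios
    print(geo_mean([0.88, 0.91, 0.94, 0.94, 0.96]))
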
> >
> > NETPERF-UDP
> > ===========
> > NOTES: Test run in mode "stream" over UDP. The varying parameter is the
> > message size in bytes. Each measurement is taken 5 times and the
> > harmonic mean is reported.
> > MEASURES: Throughput in MBits/second, both on the sender and on the receiver end.
> > HIGHER is better
> >
> > machine: 8x-SKYLAKE-UMA
> > 4.18.0 4.18.0 4.18.0
> > vanilla teo-v1 teo-v2+backport
> > -----------------------------------------------------------------------------------------
> > Hmean send-64 362.27 ( 0.00%) 362.87 ( 0.16%) 318.85 * -11.99%*
> > Hmean send-128 723.17 ( 0.00%) 723.66 ( 0.07%) 660.96 * -8.60%*
> > Hmean send-256 1435.24 ( 0.00%) 1427.08 ( -0.57%) 1346.22 * -6.20%*
> > Hmean send-1024 5563.78 ( 0.00%) 5529.90 * -0.61%* 5228.28 * -6.03%*
> > Hmean send-2048 10935.42 ( 0.00%) 10809.66 * -1.15%* 10521.14 * -3.79%*
> > Hmean send-3312 16898.66 ( 0.00%) 16539.89 * -2.12%* 16240.87 * -3.89%*
> > Hmean send-4096 19354.33 ( 0.00%) 19185.43 ( -0.87%) 18600.52 * -3.89%*
> > Hmean send-8192 32238.80 ( 0.00%) 32275.57 ( 0.11%) 29850.62 * -7.41%*
> > Hmean send-16384 48146.75 ( 0.00%) 49297.23 * 2.39%* 48295.51 ( 0.31%)
> > Hmean recv-64 362.16 ( 0.00%) 362.87 ( 0.19%) 318.82 * -11.97%*
> > Hmean recv-128 723.01 ( 0.00%) 723.66 ( 0.09%) 660.89 * -8.59%*
> > Hmean recv-256 1435.06 ( 0.00%) 1426.94 ( -0.57%) 1346.07 * -6.20%*
> > Hmean recv-1024 5562.68 ( 0.00%) 5529.90 * -0.59%* 5228.28 * -6.01%*
> > Hmean recv-2048 10934.36 ( 0.00%) 10809.66 * -1.14%* 10519.89 * -3.79%*
> > Hmean recv-3312 16898.65 ( 0.00%) 16538.21 * -2.13%* 16240.86 * -3.89%*
> > Hmean recv-4096 19351.99 ( 0.00%) 19183.17 ( -0.87%) 18598.33 * -3.89%*
> > Hmean recv-8192 32238.74 ( 0.00%) 32275.13 ( 0.11%) 29850.39 * -7.41%*
> > Hmean recv-16384 48146.59 ( 0.00%) 49296.23 * 2.39%* 48295.03 ( 0.31%)
>
> That is a bit worse than I would like it to be TBH.
Update on netperf-udp:
machine: 8x-SKYLAKE-UMA
4.18.0 4.18.0 4.18.0 4.18.0 4.18.0
vanilla teo teo-v2+backport teo-v3+backport teo-v5+backport
---------------------------------------------------------------------------------------------------------------------------------------
Hmean send-64 362.27 ( 0.00%) 362.87 ( 0.16%) 318.85 * -11.99%* 347.08 * -4.19%* 333.48 * -7.95%*
Hmean send-128 723.17 ( 0.00%) 723.66 ( 0.07%) 660.96 * -8.60%* 676.46 * -6.46%* 650.71 * -10.02%*
Hmean send-256 1435.24 ( 0.00%) 1427.08 ( -0.57%) 1346.22 * -6.20%* 1359.59 * -5.27%* 1323.83 * -7.76%*
Hmean send-1024 5563.78 ( 0.00%) 5529.90 * -0.61%* 5228.28 * -6.03%* 5382.04 * -3.27%* 5271.99 * -5.24%*
Hmean send-2048 10935.42 ( 0.00%) 10809.66 * -1.15%* 10521.14 * -3.79%* 10610.29 * -2.97%* 10544.58 * -3.57%*
Hmean send-3312 16898.66 ( 0.00%) 16539.89 * -2.12%* 16240.87 * -3.89%* 16271.23 * -3.71%* 15968.89 * -5.50%*
Hmean send-4096 19354.33 ( 0.00%) 19185.43 ( -0.87%) 18600.52 * -3.89%* 18692.16 * -3.42%* 18408.69 * -4.89%*
Hmean send-8192 32238.80 ( 0.00%) 32275.57 ( 0.11%) 29850.62 * -7.41%* 30066.83 * -6.74%* 29824.62 * -7.49%*
Hmean send-16384 48146.75 ( 0.00%) 49297.23 * 2.39%* 48295.51 ( 0.31%) 48800.37 * 1.36%* 48247.73 ( 0.21%)
Hmean recv-64 362.16 ( 0.00%) 362.87 ( 0.19%) 318.82 * -11.97%* 347.07 * -4.17%* 333.48 * -7.92%*
Hmean recv-128 723.01 ( 0.00%) 723.66 ( 0.09%) 660.89 * -8.59%* 676.39 * -6.45%* 650.63 * -10.01%*
Hmean recv-256 1435.06 ( 0.00%) 1426.94 ( -0.57%) 1346.07 * -6.20%* 1359.45 * -5.27%* 1323.81 * -7.75%*
Hmean recv-1024 5562.68 ( 0.00%) 5529.90 * -0.59%* 5228.28 * -6.01%* 5381.37 * -3.26%* 5271.45 * -5.24%*
Hmean recv-2048 10934.36 ( 0.00%) 10809.66 * -1.14%* 10519.89 * -3.79%* 10610.28 * -2.96%* 10544.58 * -3.56%*
Hmean recv-3312 16898.65 ( 0.00%) 16538.21 * -2.13%* 16240.86 * -3.89%* 16269.34 * -3.72%* 15967.13 * -5.51%*
Hmean recv-4096 19351.99 ( 0.00%) 19183.17 ( -0.87%) 18598.33 * -3.89%* 18690.13 * -3.42%* 18407.45 * -4.88%*
Hmean recv-8192 32238.74 ( 0.00%) 32275.13 ( 0.11%) 29850.39 * -7.41%* 30062.78 * -6.75%* 29824.30 * -7.49%*
Hmean recv-16384 48146.59 ( 0.00%) 49296.23 * 2.39%* 48295.03 ( 0.31%) 48786.88 * 1.33%* 48246.71 ( 0.21%)
Here is a plot of the raw benchmark data, where you can better see the distribution
and variability of the results:
https://beta.suse.com/private/ggherdovich/teo-eval/teo-eval.html#netperf-udp
> > [...]
> >
> > SOCKPERF-UDP-THROUGHPUT
> > =======================
> > NOTES: Test run in mode "throughput" over UDP. The varying parameter is the
> > message size.
> > MEASURES: Throughput, in MBits/second
> > HIGHER is better
> >
> > machine: 48x-HASWELL-NUMA
> > 4.18.0 4.18.0 4.18.0
> > vanilla teo-v1 teo-v2+backport
> > ----------------------------------------------------------------------------------
> > Hmean 14 48.16 ( 0.00%) 50.94 * 5.77%* 42.50 * -11.77%*
> > Hmean 100 346.77 ( 0.00%) 358.74 * 3.45%* 303.31 * -12.53%*
> > Hmean 300 1018.06 ( 0.00%) 1053.75 * 3.51%* 895.55 * -12.03%*
> > Hmean 500 1693.07 ( 0.00%) 1754.62 * 3.64%* 1489.61 * -12.02%*
> > Hmean 850 2853.04 ( 0.00%) 2948.73 * 3.35%* 2473.50 * -13.30%*
>
> Well, in this case the consistent improvement in v1 turned into a consistent decline
> in the v2, and over 10% for that matter. Needs improvement IMO.
Update: this one got resolved in v5:
machine: 48x-HASWELL-NUMA
4.18.0 4.18.0 4.18.0 4.18.0 4.18.0
vanilla teo teo-v2+backport teo-v3+backport teo-v5+backport
--------------------------------------------------------------------------------------------------------------------------------
Hmean 14 48.16 ( 0.00%) 50.94 * 5.77%* 42.50 * -11.77%* 48.91 * 1.55%* 49.06 * 1.87%*
Hmean 100 346.77 ( 0.00%) 358.74 * 3.45%* 303.31 * -12.53%* 350.75 ( 1.15%) 347.52 ( 0.22%)
Hmean 300 1018.06 ( 0.00%) 1053.75 * 3.51%* 895.55 * -12.03%* 1014.00 ( -0.40%) 1023.99 ( 0.58%)
Hmean 500 1693.07 ( 0.00%) 1754.62 * 3.64%* 1489.61 * -12.02%* 1688.50 ( -0.27%) 1698.43 ( 0.32%)
Hmean 850 2853.04 ( 0.00%) 2948.73 * 3.35%* 2473.50 * -13.30%* 2836.13 ( -0.59%) 2767.66 * -2.99%*
plots of raw data at
https://beta.suse.com/private/ggherdovich/teo-eval/teo-eval.html#sockperf-udp-throughput
>
> > DBENCH4
> > =======
> > NOTES: asynchronous I/O; varies the number of clients up to NUMCPUS*8.
> > MEASURES: latency (millisecs)
> > LOWER is better
> >
> > machine: 48x-HASWELL-NUMA
> > 4.18.0 4.18.0 4.18.0
> > vanilla teo-v1 teo-v2+backport
> > ----------------------------------------------------------------------------------
> > Amean 1 37.15 ( 0.00%) 50.10 ( -34.86%) 39.02 ( -5.03%)
> > Amean 2 43.75 ( 0.00%) 45.50 ( -4.01%) 44.36 ( -1.39%)
> > Amean 4 54.42 ( 0.00%) 58.85 ( -8.15%) 58.17 ( -6.89%)
> > Amean 8 75.72 ( 0.00%) 74.25 ( 1.94%) 82.76 ( -9.30%)
> > Amean 16 116.56 ( 0.00%) 119.88 ( -2.85%) 164.14 ( -40.82%)
> > Amean 32 570.02 ( 0.00%) 561.92 ( 1.42%) 681.94 ( -19.63%)
> > Amean 64 3185.20 ( 0.00%) 3291.80 ( -3.35%) 4337.43 ( -36.17%)
>
> This one too.
Update:
machine: 48x-HASWELL-NUMA
4.18.0 4.18.0 4.18.0 4.18.0 4.18.0
vanilla teo teo-v2+backport teo-v3+backport teo-v5+backport
--------------------------------------------------------------------------------------------------------------------------------
Amean 1 37.15 ( 0.00%) 50.10 ( -34.86%) 39.02 ( -5.03%) 52.24 ( -40.63%) 51.62 ( -38.96%)
Amean 2 43.75 ( 0.00%) 45.50 ( -4.01%) 44.36 ( -1.39%) 47.25 ( -8.00%) 44.20 ( -1.03%)
Amean 4 54.42 ( 0.00%) 58.85 ( -8.15%) 58.17 ( -6.89%) 55.12 ( -1.29%) 58.07 ( -6.70%)
Amean 8 75.72 ( 0.00%) 74.25 ( 1.94%) 82.76 ( -9.30%) 78.63 ( -3.84%) 85.33 ( -12.68%)
Amean 16 116.56 ( 0.00%) 119.88 ( -2.85%) 164.14 ( -40.82%) 124.87 ( -7.13%) 124.54 ( -6.85%)
Amean 32 570.02 ( 0.00%) 561.92 ( 1.42%) 681.94 ( -19.63%) 568.93 ( 0.19%) 571.23 ( -0.21%)
Amean 64 3185.20 ( 0.00%) 3291.80 ( -3.35%) 4337.43 ( -36.17%) 3181.13 ( 0.13%) 3382.48 ( -6.19%)
It doesn't do well in the single-client scenario; v2 was a lot better at
that, but on the other hand it suffered at saturation (64 clients on a
48-core box). Plot at
https://beta.suse.com/private/ggherdovich/teo-eval/teo-eval.html#dbench4
>
> > TBENCH4
> > =======
> > NOTES: networking counterpart of dbench. Varies the number of clients up to NUMCPUS*4
> > MEASURES: Throughput, MB/sec
> > HIGHER is better
> >
> > machine: 8x-SKYLAKE-UMA
> > 4.18.0 4.18.0 4.18.0
> > vanilla teo teo-v2+backport
> > ----------------------------------------------------------------------------------------
> > Hmean mb/sec-1 620.52 ( 0.00%) 613.98 * -1.05%* 502.47 * -19.03%*
> > Hmean mb/sec-2 1179.05 ( 0.00%) 1112.84 * -5.62%* 820.57 * -30.40%*
> > Hmean mb/sec-4 2072.29 ( 0.00%) 2040.55 * -1.53%* 2036.11 * -1.75%*
> > Hmean mb/sec-8 4238.96 ( 0.00%) 4205.01 * -0.80%* 4124.59 * -2.70%*
> > Hmean mb/sec-16 3515.96 ( 0.00%) 3536.23 * 0.58%* 3500.02 * -0.45%*
> > Hmean mb/sec-32 3452.92 ( 0.00%) 3448.94 * -0.12%* 3428.08 * -0.72%*
> >
>
> And same here.
New data:
machine: 8x-SKYLAKE-UMA
4.18.0 4.18.0 4.18.0 4.18.0 4.18.0
vanilla teo teo-v2+backport teo-v3+backport teo-v5+backport
--------------------------------------------------------------------------------------------------------------------------------------
Hmean mb/sec-1 620.52 ( 0.00%) 613.98 * -1.05%* 502.47 * -19.03%* 492.77 * -20.59%* 464.52 * -25.14%*
Hmean mb/sec-2 1179.05 ( 0.00%) 1112.84 * -5.62%* 820.57 * -30.40%* 831.23 * -29.50%* 780.97 * -33.76%*
Hmean mb/sec-4 2072.29 ( 0.00%) 2040.55 * -1.53%* 2036.11 * -1.75%* 2016.97 * -2.67%* 2019.79 * -2.53%*
Hmean mb/sec-8 4238.96 ( 0.00%) 4205.01 * -0.80%* 4124.59 * -2.70%* 4098.06 * -3.32%* 4171.64 * -1.59%*
Hmean mb/sec-16 3515.96 ( 0.00%) 3536.23 * 0.58%* 3500.02 * -0.45%* 3438.60 * -2.20%* 3456.89 * -1.68%*
Hmean mb/sec-32 3452.92 ( 0.00%) 3448.94 * -0.12%* 3428.08 * -0.72%* 3369.30 * -2.42%* 3430.09 * -0.66%*
Again, the pain point is at low client counts; v1 OTOH was neutral. Plot at
https://beta.suse.com/private/ggherdovich/teo-eval/teo-eval.html#tbench4
Cheers,
Giovanni