Message-ID: <cec31f9f-0eda-706e-235d-5bd2bfad6c2c@linux.intel.com>
Date: Mon, 1 Feb 2021 09:13:16 +0800
From: "Li, Aubrey" <aubrey.li@...ux.intel.com>
To: Mel Gorman <mgorman@...hsingularity.net>,
Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...hat.com>
Cc: Vincent Guittot <vincent.guittot@...aro.org>,
Qais Yousef <qais.yousef@....com>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v5 0/4] Scan for an idle sibling in a single pass
On 2021/1/27 21:51, Mel Gorman wrote:
> Changelog since v4
> o Avoid use of intermediate variable during select_idle_cpu
>
> Changelog since v3
> o Drop scanning based on cores, SMT4 results showed problems
>
> Changelog since v2
> o Remove unnecessary parameters
> o Update nr during scan only when scanning for cpus
>
> Changelog since v1
> o Move extern declaration to header for coding style
> o Remove unnecessary parameter from __select_idle_cpu
>
> This series of 4 patches reposts three patches from Peter entitled
> "select_idle_sibling() wreckage". It only scans the runqueues in a single
> pass when searching for an idle sibling.
>
> Three patches from Peter were dropped. The first patch altered how scan
> depth was calculated. Scan depth selection is a random number generator
> with two major limitations. The avg_idle time is based on the time
> between a CPU going idle and being woken up clamped approximately by
> 2*sysctl_sched_migration_cost. This is difficult to compare in a sensible
> fashion to avg_scan_cost. The second issue is that only the avg_scan_cost
> of scan failures is recorded and it does not decay. This requires deeper
> surgery that would justify a patch on its own although Peter notes that
> https://lkml.kernel.org/r/20180530143105.977759909@infradead.org is
> potentially useful for an alternative avg_idle metric.
>
> The second dropped patch scanned based on cores instead of CPUs as it
> rationalised the difference between core scanning and CPU scanning.
> Unfortunately, Vincent reported problems with SMT4 so it's dropped
> for now until depth searching can be fixed.
>
> The third dropped patch converted the idle core scan throttling mechanism
> to SIS_PROP. While this would unify the throttling of core and CPU
> scanning, it was not free of regressions and has_idle_cores is a fairly
> effective throttling mechanism with the caveat that it can have a lot of
> false positives for workloads like hackbench.
>
> Peter's series tried to solve three problems at once; this subset addresses
> one problem.
>
> kernel/sched/fair.c | 151 +++++++++++++++++++---------------------
> kernel/sched/features.h | 1 -
> 2 files changed, 70 insertions(+), 82 deletions(-)
>
Four benchmarks were measured on an x86 4-socket system with 24 cores
per socket and 2 hyperthreads per core, 192 CPUs in total.
The load levels are 25%, 50%, 75% and 100%.
- hackbench is an almost universal win.
- netperf shows notable changes at high load, as does tbench at 50% load.
Details below:

hackbench: 10 iterations, 10000 loops, 40 fds per group
======================================================

- pipe process
group   base    %std    v5      %std
3       1       19.18   1.0266  9.06
6       1       9.17    0.987   13.03
9       1       7.11    1.0195  4.61
12      1       1.07    0.9927  1.43

- pipe thread
group   base    %std    v5      %std
3       1       11.14   0.9742  7.27
6       1       9.15    0.9572  7.48
9       1       2.95    0.986   4.05
12      1       1.75    0.9992  1.68

- socket process
group   base    %std    v5      %std
3       1       2.9     0.9586  2.39
6       1       0.68    0.9641  1.3
9       1       0.64    0.9388  0.76
12      1       0.56    0.9375  0.55

- socket thread
group   base    %std    v5      %std
3       1       3.82    0.9686  2.97
6       1       2.06    0.9667  1.91
9       1       0.44    0.9354  1.25
12      1       0.54    0.9362  0.6

netperf: 10 iterations x 100 seconds, transactions rate / sec
=============================================================

- tcp request/response performance
thread  base    %std    v4      %std
25%     1       5.34    1.0039  5.13
50%     1       4.97    1.0115  6.3
75%     1       5.09    0.9257  6.75
100%    1       4.53    0.908   4.83

- udp request/response performance
thread  base    %std    v4      %std
25%     1       6.18    0.9896  6.09
50%     1       5.88    1.0198  8.92
75%     1       24.38   0.9236  29.14
100%    1       26.16   0.9063  22.16

tbench: 10 iterations x 100 seconds, throughput / sec
=====================================================

thread  base    %std    v4      %std
25%     1       0.45    1.003   1.48
50%     1       1.71    0.9286  0.82
75%     1       0.84    0.9928  0.94
100%    1       0.76    0.9762  0.59

schbench: 10 iterations x 100 seconds, 99th percentile latency
==============================================================

mthread base    %std    v4      %std
25%     1       2.89    0.9884  7.34
50%     1       40.38   1.0055  38.37
75%     1       4.76    1.0095  4.62
100%    1       10.09   1.0083  8.03
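
For anyone reading the tables above: the patched column looks like a mean
normalised to the base kernel (base = 1), and %std looks like the standard
deviation as a percentage of the mean across the 10 iterations. The
following is only a sketch of that reading, not the script used to produce
these numbers. The names summarize() and exceeds_noise() are hypothetical,
and comparing the change against the larger %std is a crude rule of thumb,
not a significance test.

```python
from statistics import mean, stdev


def summarize(base_samples, patched_samples):
    """Return (ratio, base %std, patched %std) for two lists of raw results.

    Assumption: ratio is patched mean / base mean, and %std is the sample
    standard deviation expressed as a percentage of the mean.
    Each list needs at least two samples for stdev() to work.
    """
    base_mean = mean(base_samples)
    patched_mean = mean(patched_samples)
    ratio = patched_mean / base_mean
    base_pct_std = 100.0 * stdev(base_samples) / base_mean
    patched_pct_std = 100.0 * stdev(patched_samples) / patched_mean
    return ratio, base_pct_std, patched_pct_std


def exceeds_noise(ratio, base_pct_std, patched_pct_std):
    """Crude check: is the relative change larger than the bigger %std?"""
    change_pct = abs(ratio - 1.0) * 100.0
    return change_pct > max(base_pct_std, patched_pct_std)


# Rows taken from the tables above (ratio, base %std, patched %std).
print(exceeds_noise(0.9286, 1.71, 0.82))    # tbench, 50% load
print(exceeds_noise(0.9236, 24.38, 29.14))  # netperf udp, 75% load
```

By this rule of thumb the tbench 50% change is well outside the run-to-run
spread, while the netperf udp 75% change is inside it. Whether a ratio below
1 is good or bad depends on the metric: lower is better for hackbench time
and schbench latency, higher is better for netperf and tbench throughput.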
Thanks,
-Aubrey