linux-kernel - Re: [PATCH v5 0/4] Scan for an idle sibling in a single pass

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <cec31f9f-0eda-706e-235d-5bd2bfad6c2c@linux.intel.com>
Date:   Mon, 1 Feb 2021 09:13:16 +0800
From:   "Li, Aubrey" <aubrey.li@...ux.intel.com>
To:     Mel Gorman <mgorman@...hsingularity.net>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>
Cc:     Vincent Guittot <vincent.guittot@...aro.org>,
        Qais Yousef <qais.yousef@....com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v5 0/4] Scan for an idle sibling in a single pass

On 2021/1/27 21:51, Mel Gorman wrote:
> Changelog since v4
> o Avoid use of intermediate variable during select_idle_cpu
> 
> Changelog since v3
> o Drop scanning based on cores, SMT4 results showed problems
> 
> Changelog since v2
> o Remove unnecessary parameters
> o Update nr during scan only when scanning for cpus
> 
> Changlog since v1
> o Move extern declaration to header for coding style
> o Remove unnecessary parameter from __select_idle_cpu
> 
> This series of 4 patches reposts three patches from Peter entitled
> "select_idle_sibling() wreckage". It only scans the runqueues in a single
> pass when searching for an idle sibling.
> 
> Three patches from Peter were dropped. The first patch altered how scan
> depth was calculated. Scan depth deletion is a random number generator
> with two major limitations. The avg_idle time is based on the time
> between a CPU going idle and being woken up clamped approximately by
> 2*sysctl_sched_migration_cost.  This is difficult to compare in a sensible
> fashion to avg_scan_cost. The second issue is that only the avg_scan_cost
> of scan failures is recorded and it does not decay.  This requires deeper
> surgery that would justify a patch on its own although Peter notes that
> https://lkml.kernel.org/r/20180530143105.977759909@infradead.org is
> potentially useful for an alternative avg_idle metric.
> 
> The second patch dropped scanned based on cores instead of CPUs as it
> rationalised the difference between core scanning and CPU scanning.
> Unfortunately, Vincent reported problems with SMT4 so it's dropped
> for now until depth searching can be fixed.
> 
> The third patch dropped converted the idle core scan throttling mechanism
> to SIS_PROP. While this would unify the throttling of core and CPU
> scanning, it was not free of regressions and has_idle_cores is a fairly
> effective throttling mechanism with the caveat that it can have a lot of
> false positives for workloads like hackbench.
> 
> Peter's series tried to solve three problems at once, this subset addresses
> one problem.
> 
>  kernel/sched/fair.c     | 151 +++++++++++++++++++---------------------
>  kernel/sched/features.h |   1 -
>  2 files changed, 70 insertions(+), 82 deletions(-)
> 

4 benchmarks measured on a x86 4s system with 24 cores per socket and
2 HTs per core, total 192 CPUs. 

The load level is [25%, 50%, 75%, 100%].

- hackbench almost has a universal win.
- netperf high load has notable changes, as well as tbench 50% load.

Details below:

hackbench: 10 iterations, 10000 loops, 40 fds per group
======================================================

- pipe process

	group	base	%std	v5	%std
	3	1	19.18	1.0266	9.06
	6	1	9.17	0.987	13.03
	9	1	7.11	1.0195	4.61
	12	1	1.07	0.9927	1.43

- pipe thread

	group	base	%std	v5	%std
	3	1	11.14	0.9742	7.27
	6	1	9.15	0.9572	7.48
	9	1	2.95	0.986	4.05
	12	1	1.75	0.9992	1.68

- socket process

	group	base	%std	v5	%std
	3	1	2.9	0.9586	2.39
	6	1	0.68	0.9641	1.3
	9	1	0.64	0.9388	0.76
	12	1	0.56	0.9375	0.55

- socket thread

	group	base	%std	v5	%std
	3	1	3.82	0.9686	2.97
	6	1	2.06	0.9667	1.91
	9	1	0.44	0.9354	1.25
	12	1	0.54	0.9362	0.6

netperf: 10 iterations x 100 seconds, transactions rate / sec
=============================================================

- tcp request/response performance

	thread	base	%std	v4	%std
	25%	1	5.34	1.0039	5.13
	50%	1	4.97	1.0115	6.3
	75%	1	5.09	0.9257	6.75
	100%	1	4.53	0.908	4.83



- udp request/response performance

	thread	base	%std	v4	%std
	25%	1	6.18	0.9896	6.09
	50%	1	5.88	1.0198	8.92
	75%	1	24.38	0.9236	29.14
	100%	1	26.16	0.9063	22.16

tbench: 10 iterations x 100 seconds, throughput / sec
=====================================================

	thread	base	%std	v4	%std
	25%	1	0.45	1.003	1.48
	50%	1	1.71	0.9286	0.82
	75%	1	0.84	0.9928	0.94
	100%	1	0.76	0.9762	0.59

schbench: 10 iterations x 100 seconds, 99th percentile latency
==============================================================

	mthread	base	%std	v4	%std
	25%	1	2.89	0.9884	7.34
	50%	1	40.38	1.0055	38.37
	75%	1	4.76	1.0095	4.62
	100%	1	10.09	1.0083	8.03

Thanks,
-Aubrey