Message-ID: <20170517105350.hk5m4h4jb6dfr65a@hirez.programming.kicks-ass.net>
Date:   Wed, 17 May 2017 12:53:50 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     mingo@...nel.org, tglx@...utronix.de, riel@...hat.com,
        hpa@...or.com, efault@....de, linux-kernel@...r.kernel.org,
        torvalds@...ux-foundation.org, lvenanci@...hat.com
Cc:     xiaolong.ye@...el.com, kitsunyan@...ox.ru, clm@...com,
        matt@...eblueprint.co.uk
Subject: hackbench vs select_idle_sibling; was: [tip:sched/core] sched/fair,
 cpumask: Export for_each_cpu_wrap()

On Mon, May 15, 2017 at 02:03:11AM -0700, tip-bot for Peter Zijlstra wrote:
> sched/fair, cpumask: Export for_each_cpu_wrap()

> -static int cpumask_next_wrap(int n, const struct cpumask *mask, int start, int *wrapped)
> -{

> -	next = find_next_bit(cpumask_bits(mask), nr_cpumask_bits, n+1);

> -}

OK, so this patch fixed an actual bug in the for_each_cpu_wrap()
implementation. The above 'n+1' should be 'n', and the effect is that
it'll skip over CPUs, potentially resulting in an iteration that only
sees every other CPU (for a fully contiguous mask).
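
Purely as a toy model (not the kernel helper itself), the net effect
described above -- the walk advancing two positions per step instead of
one -- looks like this; next_set_bit() is a made-up stand-in for
find_next_bit() over a mask with every bit set:

#include <stdio.h>

/* stand-in for find_next_bit() on a mask with bits 0..nbits-1 all set */
static int next_set_bit(int from, int nbits)
{
	return from < nbits ? from : nbits;
}

int main(void)
{
	int nbits = 8, cpu;

	/* buggy: an extra +1 on top of the loop's own advance, stride of two */
	for (cpu = -1; (cpu = next_set_bit(cpu + 2, nbits)) < nbits; )
		printf("buggy: %d\n", cpu);	/* visits 1 3 5 7 only */

	/* intended: advance by exactly one position between visits */
	for (cpu = -1; (cpu = next_set_bit(cpu + 1, nbits)) < nbits; )
		printf("fixed: %d\n", cpu);	/* visits 0 1 2 ... 7 */

	return 0;
}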

This in turn causes hackbench to further suffer from the regression
introduced by commit:

  4c77b18cf8b7 ("sched/fair: Make select_idle_cpu() more aggressive")

So it's well past time to fix this.

Where the old scheme was a cliff-edge throttle on idle scanning, this
introduces a more gradual approach. Instead of stopping the scan
entirely, we limit how many CPUs we scan.
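
To make the proportional limit in the patch below concrete, here is a
rough sketch with made-up numbers; the span_weight, avg_idle and
avg_scan_cost values are purely illustrative:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	/* all numbers below are invented for illustration */
	uint64_t span_weight = 64;		/* CPUs in the LLC domain */
	uint64_t avg_idle = 20000 / 512;	/* this_rq()->avg_idle / 512, ns */
	uint64_t avg_cost = 400 + 1;		/* this_sd->avg_scan_cost + 1, ns */
	uint64_t span_avg = span_weight * avg_idle;
	uint64_t nr = span_avg > 4 * avg_cost ? span_avg / avg_cost : 4;

	/* the less idle time relative to scan cost, the fewer CPUs we touch */
	printf("scan at most %llu of %llu CPUs\n",
	       (unsigned long long)nr, (unsigned long long)span_weight);
	return 0;
}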

Initial benchmarks show that it mostly recovers hackbench while not
hurting anything else, except Mason's schbench, which it does not hurt
as badly as the old throttle did.

It also appears to recover the tbench high end, which suffered in the
same way hackbench did.

I'm also hoping it will fix/preserve kitsunyan's interactivity issue.

Please test..

Cc: xiaolong.ye@...el.com
Cc: kitsunyan <kitsunyan@...ox.ru>
Cc: Chris Mason <clm@...com>
Cc: Mike Galbraith <efault@....de>
Cc: Matt Fleming <matt@...eblueprint.co.uk>
---
 kernel/sched/fair.c     | 21 ++++++++++++++++-----
 kernel/sched/features.h |  1 +
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 219fe58..8c1afa0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5793,27 +5793,38 @@ static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd
 static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int target)
 {
 	struct sched_domain *this_sd;
-	u64 avg_cost, avg_idle = this_rq()->avg_idle;
+	u64 avg_cost, avg_idle;
 	u64 time, cost;
 	s64 delta;
-	int cpu;
+	int cpu, nr = INT_MAX;
 
 	this_sd = rcu_dereference(*this_cpu_ptr(&sd_llc));
 	if (!this_sd)
 		return -1;
 
-	avg_cost = this_sd->avg_scan_cost;
-
 	/*
 	 * Due to large variance we need a large fuzz factor; hackbench in
 	 * particularly is sensitive here.
 	 */
-	if (sched_feat(SIS_AVG_CPU) && (avg_idle / 512) < avg_cost)
+	avg_idle = this_rq()->avg_idle / 512;
+	avg_cost = this_sd->avg_scan_cost + 1;
+
+	if (sched_feat(SIS_AVG_CPU) && avg_idle < avg_cost)
 		return -1;
 
+	if (sched_feat(SIS_PROP)) {
+		u64 span_avg = sd->span_weight * avg_idle;
+		if (span_avg > 4*avg_cost)
+			nr = div_u64(span_avg, avg_cost);
+		else
+			nr = 4;
+	}
+
 	time = local_clock();
 
 	for_each_cpu_wrap(cpu, sched_domain_span(sd), target) {
+		if (!--nr)
+			return -1;
 		if (!cpumask_test_cpu(cpu, &p->cpus_allowed))
 			continue;
 		if (idle_cpu(cpu))
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index dc4d148..d3fb155 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -55,6 +55,7 @@ SCHED_FEAT(TTWU_QUEUE, true)
  * When doing wakeups, attempt to limit superfluous scans of the LLC domain.
  */
 SCHED_FEAT(SIS_AVG_CPU, false)
+SCHED_FEAT(SIS_PROP, true)
 
 /*
  * Issue a WARN when we do multiple update_rq_clock() calls
