Message-ID: <20210617094040.GM30378@techsingularity.net>
Date:   Thu, 17 Jun 2021 10:40:40 +0100
From:   Mel Gorman <mgorman@...hsingularity.net>
To:     Vincent Guittot <vincent.guittot@...aro.org>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Valentin Schneider <valentin.schneider@....com>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] sched/fair: Age the average idle time

On Thu, Jun 17, 2021 at 10:30:09AM +0200, Vincent Guittot wrote:
> > > > I'm tempted to give it a go.. anyone object?
> > >
> > > Just finished running some tests on my large arm64 system.
> > > Tbench results are a mix of small gains and losses
> > >
> >
> > Same for tbench on three x86 machines I reran tests for
> >
> > <SNIP>
> >
> > For your arm machine, how many logical CPUs are online, what is the level
> > of SMT if any and is the machine NUMA?
> 
> It's a SMT4 x 28 cores x 2 NUMA nodes = 224 CPUs
> 

Ok, SMT4 is what I was interested in. I suspected this was the case but
was not sure, and wondered whether SMT4 should be accounted for in the
scan depth.

> >
> > Fundamentally though, as the changelog notes "due to the nature of the
> > patch, this is a regression magnet". There are going to be examples
> > where a deep search is better even if a machine is fully busy or
> > overloaded and examples where cutting off the search is better. I think
> > it's better to have an idle estimate that gets updated if CPUs are fully
> > busy even if it's not a universal win.
> 
> Although I agree that using a stale average idle time value from the
> local CPU is not good, I'm not sure this proposal is better. The main
> problem is that we use the avg_idle of the local CPU to estimate how
> many times we should loop and try to find another idle CPU. But there
> is no direct relation between the two.

This is true. The idle time of the local CPU is used to estimate the
idle time of the domain, which is inevitably going to be inaccurate, but
tracking idle time for the whole domain would be cache write intensive and
potentially very expensive. I think this was discussed before but maybe
it is my imagination.

> Typically, a short average idle time on
> the local CPU doesn't mean that there are fewer idle CPUs, and that's
> why we see a mix of gains and losses
> 

Can you evaluate if scanning proportional to cores helps if applied on
top? The patch below is a bit of pick&mix and has only seen a basic build
test with a distro config. While I will queue this, I don't expect it to
have an impact on SMT2.

---8<---
sched/fair: Make select_idle_cpu() proportional to cores

From: Peter Zijlstra (Intel) <peterz@...radead.org>

Instead of calculating how many (logical) CPUs to scan, compute how
many cores to scan.

This changes behaviour for anything !SMT2.

Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
Signed-off-by: Mel Gorman <mgorman@...hsingularity.net>
---
 kernel/sched/core.c  | 17 ++++++++++++-----
 kernel/sched/fair.c  | 11 +++++++++--
 kernel/sched/sched.h |  2 ++
 3 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 6a3fdb9f4380..1773e0707a5d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7846,11 +7846,18 @@ int sched_cpu_activate(unsigned int cpu)
 	balance_push_set(cpu, false);
 
 #ifdef CONFIG_SCHED_SMT
-	/*
-	 * When going up, increment the number of cores with SMT present.
-	 */
-	if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
-		static_branch_inc_cpuslocked(&sched_smt_present);
+	do {
+		int weight = cpumask_weight(cpu_smt_mask(cpu));
+
+		if (weight > sched_smt_weight)
+			sched_smt_weight = weight;
+
+		/*
+		 * When going up, increment the number of cores with SMT present.
+		 */
+		if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
+			static_branch_inc_cpuslocked(&sched_smt_present);
+	} while (0);
 #endif
 	set_cpu_active(cpu, true);
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index cc7d1144a356..4fc4e1f2eaae 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6037,6 +6037,8 @@ static inline int __select_idle_cpu(int cpu)
 DEFINE_STATIC_KEY_FALSE(sched_smt_present);
 EXPORT_SYMBOL_GPL(sched_smt_present);
 
+int __read_mostly sched_smt_weight = 1;
+
 static inline void set_idle_cores(int cpu, int val)
 {
 	struct sched_domain_shared *sds;
@@ -6151,6 +6153,8 @@ static inline bool test_idle_cores(int cpu, bool def)
 	return def;
 }
 
+#define sched_smt_weight 1
+
 static inline int select_idle_core(struct task_struct *p, int core, struct cpumask *cpus, int *idle_cpu)
 {
 	return __select_idle_cpu(core);
@@ -6163,6 +6167,8 @@ static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd
 
 #endif /* CONFIG_SCHED_SMT */
 
+#define sis_min_cores	2
+
 /*
  * Scan the LLC domain for idle CPUs; this is dynamically regulated by
  * comparing the average scan cost (tracked in sd->avg_scan_cost) against the
@@ -6203,11 +6209,12 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
 		avg_cost = this_sd->avg_scan_cost + 1;
 
 		span_avg = sd->span_weight * avg_idle;
-		if (span_avg > 4*avg_cost)
+		if (span_avg > sis_min_cores * avg_cost)
 			nr = div_u64(span_avg, avg_cost);
 		else
-			nr = 4;
+			nr = sis_min_cores;
 
+		nr *= sched_smt_weight;
 		time = cpu_clock(this);
 	}
 
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 7bc20e5541cf..440a2bbc19d5 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1119,6 +1119,8 @@ static inline bool is_migration_disabled(struct task_struct *p)
 #ifdef CONFIG_SCHED_SMT
 extern void __update_idle_core(struct rq *rq);
 
+extern int sched_smt_weight;
+
 static inline void update_idle_core(struct rq *rq)
 {
 	if (static_branch_unlikely(&sched_smt_present))
