[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <lsq.1507553064.544983171@decadent.org.uk>
Date: Mon, 09 Oct 2017 13:44:24 +0100
From: Ben Hutchings <ben@...adent.org.uk>
To: linux-kernel@...r.kernel.org, stable@...r.kernel.org
CC: akpm@...ux-foundation.org,
"Linus Torvalds" <torvalds@...ux-foundation.org>,
"Ingo Molnar" <mingo@...nel.org>,
"Thomas Gleixner" <tglx@...utronix.de>,
"Mike Galbraith" <efault@....de>,
"Peter Zijlstra" <peterz@...radead.org>
Subject: [PATCH 3.16 005/192] sched/topology: Fix overlapping sched_group_mask
3.16.49-rc1 review patch. If anyone has any objections, please let me know.
------------------
From: Peter Zijlstra <peterz@...radead.org>
commit 73bb059f9b8a00c5e1bf2f7ca83138c05d05e600 upstream.
The point of sched_group_mask is to select those CPUs from
sched_group_cpus that can actually arrive at this balance domain.
The current code gets it wrong, as can be readily demonstrated with a
topology like:
node 0 1 2 3
0: 10 20 30 20
1: 20 10 20 30
2: 30 20 10 20
3: 20 30 20 10
Where (for example) domain 1 on CPU1 ends up with a mask that includes
CPU0:
[] CPU1 attaching sched-domain:
[] domain 0: span 0-2 level NUMA
[] groups: 1 (mask: 1), 2, 0
[] domain 1: span 0-3 level NUMA
[] groups: 0-2 (mask: 0-2) (cpu_capacity: 3072), 0,2-3 (cpu_capacity: 3072)
This causes sched_balance_cpu() to compute the wrong CPU and
consequently should_we_balance() will terminate early resulting in
missed load-balance opportunities.
The fixed topology looks like:
[] CPU1 attaching sched-domain:
[] domain 0: span 0-2 level NUMA
[] groups: 1 (mask: 1), 2, 0
[] domain 1: span 0-3 level NUMA
[] groups: 0-2 (mask: 1) (cpu_capacity: 3072), 0,2-3 (cpu_capacity: 3072)
(note: this relies on OVERLAP domains to always have children, this is
true because the regular topology domains are still here -- this is
before degenerate trimming)
Debugged-by: Lauro Ramos Venancio <lvenanci@...hat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Mike Galbraith <efault@....de>
Cc: Peter Zijlstra <peterz@...radead.org>
Cc: Thomas Gleixner <tglx@...utronix.de>
Cc: linux-kernel@...r.kernel.org
Fixes: e3589f6c81e4 ("sched: Allow for overlapping sched_domain spans")
Signed-off-by: Ingo Molnar <mingo@...nel.org>
[bwh: Backported to 3.16:
- Use span, not sg_span
- Adjust filename context]
Signed-off-by: Ben Hutchings <ben@...adent.org.uk>
---
kernel/sched/core.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5775,6 +5775,8 @@ enum s_alloc {
* and our sibling sd spans will be empty. Domains should always include the
* cpu they're built on, so check that.
*
+ * Only CPUs that can arrive at this group should be considered to continue
+ * balancing.
*/
static void build_group_mask(struct sched_domain *sd, struct sched_group *sg)
{
@@ -5785,11 +5787,24 @@ static void build_group_mask(struct sche
for_each_cpu(i, span) {
sibling = *per_cpu_ptr(sdd->sd, i);
- if (!cpumask_test_cpu(i, sched_domain_span(sibling)))
+
+ /*
+ * Can happen in the asymmetric case, where these siblings are
+ * unused. The mask will not be empty because those CPUs that
+ * do have the top domain _should_ span the domain.
+ */
+ if (!sibling->child)
+ continue;
+
+ /* If we would not end up here, we can't continue from here */
+ if (!cpumask_equal(span, sched_domain_span(sibling->child)))
continue;
cpumask_set_cpu(i, sched_group_mask(sg));
}
+
+ /* We must not have empty masks here */
+ WARN_ON_ONCE(cpumask_empty(sched_group_mask(sg)));
}
/*
Powered by blists - more mailing lists