Message-ID: <803c439c1d1f435bb22a6ef6c0c2d99e@hisilicon.com>
Date: Mon, 25 Jan 2021 21:55:40 +0000
From: "Song Bao Hua (Barry Song)" <song.bao.hua@...ilicon.com>
To: Valentin Schneider <valentin.schneider@....com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Mel Gorman <mgorman@...e.de>
CC: Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Morten Rasmussen <morten.rasmussen@....com>,
linux-kernel <linux-kernel@...r.kernel.org>,
"linuxarm@...neuler.org" <linuxarm@...neuler.org>
Subject: RE: [RFC PATCH] sched/fair: first try to fix the scheduling impact of
NUMA diameter > 2
> -----Original Message-----
> From: Valentin Schneider [mailto:valentin.schneider@....com]
> Sent: Tuesday, January 26, 2021 1:11 AM
> To: Song Bao Hua (Barry Song) <song.bao.hua@...ilicon.com>; Vincent Guittot
> <vincent.guittot@...aro.org>; Mel Gorman <mgorman@...e.de>
> Cc: Ingo Molnar <mingo@...nel.org>; Peter Zijlstra <peterz@...radead.org>;
> Dietmar Eggemann <dietmar.eggemann@....com>; Morten Rasmussen
> <morten.rasmussen@....com>; linux-kernel <linux-kernel@...r.kernel.org>;
> linuxarm@...neuler.org
> Subject: RE: [RFC PATCH] sched/fair: first try to fix the scheduling impact
> of NUMA diameter > 2
>
> On 25/01/21 03:13, Song Bao Hua (Barry Song) wrote:
> > As long as NUMA diameter > 2, building sched_domain by sibling's child domain
> > will definitely create a sched_domain with sched_group which will span
> > out of the sched_domain
> > +------+ +------+ +-------+ +------+
> > | node | 12 |node | 20 | node | 12 |node |
> > | 0 +---------+1 +--------+ 2 +-------+3 |
> > +------+ +------+ +-------+ +------+
> >
> > domain0 node0 node1 node2 node3
> >
> > domain1 node0+1 node0+1 node2+3 node2+3
> > +
> > domain2 node0+1+2 |
> > group: node0+1 |
> > group:node2+3 <-------------------+
> >
> > when node2 is added into the domain2 of node0, kernel is using the child
> > domain of node2's domain2, which is domain1(node2+3). Node 3 is outside
> > the span of node0+1+2.
> >
> > Will we move to use the *child* domain of the *child* domain of node2's
> > domain2 to build the sched_group?
> >
> > I mean:
> > +------+ +------+ +-------+ +------+
> > | node | 12 |node | 20 | node | 12 |node |
> > | 0 +---------+1 +--------+ 2 +-------+3 |
> > +------+ +------+ +-------+ +------+
> >
> > domain0 node0 node1 +- node2 node3
> > |
> > domain1 node0+1 node0+1 | node2+3 node2+3
> > |
> > domain2 node0+1+2 |
> > group: node0+1 |
> > group:node2 <-------------------+
> >
> > In this way, it seems we don't have to create a new group as we are just
> > reusing the existing group?
> >
>
> One thing I've been musing over is pretty much this; that is to say we
> would make all non-local NUMA sched_groups span a single node. This would
> let us reuse an existing span+sched_group_capacity: the local group of that
> node at its first NUMA topology level.
>
> Essentially this means getting rid of the overlapping groups, and the
> balance mask is handled the same way as for !NUMA, i.e. it's the local
> group span. I've not gone far enough through the thought experiment to see
> where does it miserably fall apart... It is at the very least violating the
> expectation that a group span is a child domain's span - here it can be a
> grand^x children domain's span.
>
>
> If we take your topology, we currently have:
>
> | tl\node | 0 | 1 | 2 | 3 |
> |---------+--------------+---------------+---------------+--------------|
> | NUMA0 | (0)->(1) | (1)->(2)->(0) | (2)->(3)->(1) | (3)->(2) |
> | NUMA1 | (0-1)->(1-3) | (0-2)->(2-3) | (1-3)->(0-1) | (2-3)->(0-2) |
> | NUMA2 | (0-2)->(1-3) | N/A | N/A | (1-3)->(0-2) |
>
> With the current overlapping group scheme, we would need to make it look
> like so:
>
> | tl\node | 0 | 1 | 2 | 3 |
> |---------+---------------+---------------+---------------+---------------|
> | NUMA0 | (0)->(1) | (1)->(2)->(0) | (2)->(3)->(1) | (3)->(2) |
> | NUMA1 | (0-1)->(1-2)* | (0-2)->(2-3) | (1-3)->(0-1) | (2-3)->(1-2)* |
> | NUMA2 | (0-2)->(1-3) | N/A | N/A | (1-3)->(0-2) |
>
> But as already discussed, that's tricky to make work. With the node-span
> groups thing, we would turn this into:
>
> | tl\node | 0 | 1 | 2 | 3 |
> |---------+------------+---------------+---------------+------------|
> | NUMA0 | (0)->(1) | (1)->(2)->(0) | (2)->(3)->(1) | (3)->(2) |
> | NUMA1 | (0-1)->(2) | (0-2)->(3) | (1-3)->(0) | (2-3)->(1) |
> | NUMA2 | (0-2)->(3) | N/A | N/A | (1-3)->(0) |

Actually I didn't mean going that far. What I was thinking is that
we only fix those sched_domains whose sched_group isn't a subset of
the sched_domain. For the sched_domains which don't have the group
span issue, we just don't touch them. For NUMA1, we change it like
your diagram, but NUMA2 won't be changed. The concept is like:
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -1040,6 +1040,19 @@ build_overlap_sched_groups(struct sched_domain *sd, int cpu)
 		}
 		sg_span = sched_group_span(sg);
+#if 1
+		if (sibling->child && !cpumask_subset(sg_span, span)) {
+			sg = build_group_from_child_sched_domain(sibling->child, cpu);
+			...
+			sg_span = sched_group_span(sg);
+		}
+#endif
 		cpumask_or(covered, covered, sg_span);
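
To make the #if 1 sketch above a bit more concrete, below is a rough,
untested rewrite of the same idea as the body of the for_each_cpu_wrap()
loop in build_overlap_sched_groups(). The only change relative to the
current loop is checking the sibling's child span before allocating the
group and stepping down one level when that span would leak outside the
domain span; build_group_from_child_sched_domain(), init_overlap_sched_group()
and the surrounding locals (span, covered, sdd, first, last) are the ones
already in kernel/sched/topology.c. This is only meant to illustrate the
concept, not a tested patch:

	for_each_cpu_wrap(i, span, cpu) {
		struct cpumask *sg_span;

		if (cpumask_test_cpu(i, covered))
			continue;

		sibling = *per_cpu_ptr(sdd->sd, i);

		/* skip siblings whose domain tree terminated early */
		if (!cpumask_test_cpu(i, sched_domain_span(sibling)))
			continue;

		/*
		 * Normally the group is built from sibling's child domain.
		 * When that child span is not a subset of this domain's
		 * span (NUMA diameter > 2), drop one more level so the
		 * group is built from the grandchild domain instead.
		 */
		if (sibling->child &&
		    !cpumask_subset(sched_domain_span(sibling->child), span))
			sibling = sibling->child;

		sg = build_group_from_child_sched_domain(sibling, cpu);
		if (!sg)
			goto fail;

		sg_span = sched_group_span(sg);
		cpumask_or(covered, covered, sg_span);

		init_overlap_sched_group(sd, sg);

		/* link sg into the group list as before */
		if (!first)
			first = sg;
		if (last)
			last->next = sg;
		last = sg;
		last->next = first;
	}
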
Thanks
Barry