Message-ID: <9d6c6d3ba6ac4272bf844034da4653fe@hisilicon.com>
Date: Fri, 22 Jan 2021 11:09:50 +0000
From: "Song Bao Hua (Barry Song)" <song.bao.hua@...ilicon.com>
To: Dietmar Eggemann <dietmar.eggemann@....com>,
Valentin Schneider <valentin.schneider@....com>,
Meelis Roos <mroos@...ux.ee>,
LKML <linux-kernel@...r.kernel.org>
CC: Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Mel Gorman <mgorman@...e.de>
Subject: RE: 5.11-rc4+git: Shortest NUMA path spans too many nodes
> -----Original Message-----
> From: Dietmar Eggemann [mailto:dietmar.eggemann@....com]
> Sent: Friday, January 22, 2021 11:05 PM
> To: Song Bao Hua (Barry Song) <song.bao.hua@...ilicon.com>; Valentin Schneider
> <valentin.schneider@....com>; Meelis Roos <mroos@...ux.ee>; LKML
> <linux-kernel@...r.kernel.org>
> Cc: Peter Zijlstra <peterz@...radead.org>; Vincent Guittot
> <vincent.guittot@...aro.org>; Mel Gorman <mgorman@...e.de>
> Subject: Re: 5.11-rc4+git: Shortest NUMA path spans too many nodes
>
> On 21/01/2021 22:17, Song Bao Hua (Barry Song) wrote:
> >
> >
> >> -----Original Message-----
> >> From: Dietmar Eggemann [mailto:dietmar.eggemann@....com]
> >> Sent: Friday, January 22, 2021 7:54 AM
> >> To: Valentin Schneider <valentin.schneider@....com>; Meelis Roos
> >> <mroos@...ux.ee>; LKML <linux-kernel@...r.kernel.org>
> >> Cc: Peter Zijlstra <peterz@...radead.org>; Vincent Guittot
> >> <vincent.guittot@...aro.org>; Song Bao Hua (Barry Song)
> >> <song.bao.hua@...ilicon.com>; Mel Gorman <mgorman@...e.de>
> >> Subject: Re: 5.11-rc4+git: Shortest NUMA path spans too many nodes
> >>
> >> On 21/01/2021 19:21, Valentin Schneider wrote:
> >>> On 21/01/21 19:39, Meelis Roos wrote:
>
> [...]
>
> >> # cat /sys/devices/system/node/node*/distance
> >> 10 12 12 14 14 14 14 16
> >> 12 10 14 12 14 14 12 14
> >> 12 14 10 14 12 12 14 14
> >> 14 12 14 10 12 12 14 14
> >> 14 14 12 12 10 14 12 14
> >> 14 14 12 12 14 10 14 12
> >> 14 12 14 14 12 14 10 12
> >> 16 14 14 14 14 12 12 10
> >>
> >> The '16' seems to be the culprit. How does such a topo look like?
>
> Maybe like this:
>
> _________
> | |
> .-6 0 4-.
> | \ / \ / |
> | 1 2 |
> | \ \ |
> --7 3----5 |
> | |____|_|
> |_______|
>
> >
> > Once we get a topology like this:
> >
> >
> > +------+         +------+        +-------+       +------+
> > | node |         | node |        | node  |       | node |
> > |      +---------+      +--------+       +-------+      |
> > +------+         +------+        +-------+       +------+
> >
> > We can reproduce this issue.
> > For example, every cpu with the below numa_distance can have
> > "groups don't span domain->span":
> > node   0   1   2   3
> >   0:  10  12  20  22
> >   1:  12  10  22  24
> >   2:  20  22  10  12
> >   3:  22  24  12  10
>                              2     20     2
> So this should look like: 1 --- 0 ---- 2 --- 3

Yes. So here we are facing another problem:
kernel/sched/topology.c assumes that node_distance(0,j)
includes all distances in node_distance(i,j):

void sched_init_numa(void)
{
	...
	/*
	 * ...
	 *
	 * Assumes node_distance(0,j) includes all distances in
	 * node_distance(i,j) in order to avoid cubic time.
	 */
	next_distance = curr_distance;
	for (i = 0; i < nr_node_ids; i++) {
		for (j = 0; j < nr_node_ids; j++) {
			for (k = 0; k < nr_node_ids; k++) {
				...
			}
		}
	}
	...
}

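But that is obviously not the case here: in the 4-node table above,
node_distance(1,3) is 24, which does not appear in row 0 (10 12 20 22).
Right now we are seeing a performance drop because of this; I will
probably start another thread for it.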
Thanks
Barry