Date:   Fri, 29 Jan 2021 02:02:58 +0000
From:   "Song Bao Hua (Barry Song)" <song.bao.hua@...ilicon.com>
To:     Valentin Schneider <valentin.schneider@....com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
CC:     "mingo@...nel.org" <mingo@...nel.org>,
        "peterz@...radead.org" <peterz@...radead.org>,
        "vincent.guittot@...aro.org" <vincent.guittot@...aro.org>,
        "dietmar.eggemann@....com" <dietmar.eggemann@....com>,
        "morten.rasmussen@....com" <morten.rasmussen@....com>,
        "mgorman@...e.de" <mgorman@...e.de>
Subject: RE: [PATCH 1/1] sched/topology: Make sched_init_numa() use a set for
 the deduplicating sort



> -----Original Message-----
> From: Valentin Schneider [mailto:valentin.schneider@....com]
> Sent: Friday, January 29, 2021 3:47 AM
> To: Song Bao Hua (Barry Song) <song.bao.hua@...ilicon.com>;
> linux-kernel@...r.kernel.org
> Cc: mingo@...nel.org; peterz@...radead.org; vincent.guittot@...aro.org;
> dietmar.eggemann@....com; morten.rasmussen@....com; mgorman@...e.de
> Subject: RE: [PATCH 1/1] sched/topology: Make sched_init_numa() use a set
> for the deduplicating sort
> 
> On 25/01/21 21:35, Song Bao Hua (Barry Song) wrote:
> > I was using 5.11-rc1. One thing I'd like to mention is that:
> >
> > For the below topology:
> > +-------+          +-----+
> > | node1 |  20      |node2|
> > |       +----------+     |
> > +---+---+          +-----+
> >     |                  |12
> > 12  |                  |
> > +---+---+          +---+-+
> > |       |          |node3|
> > | node0 |          |     |
> > +-------+          +-----+
> >
> > with node0-node2 as 22, node0-node3 as 24, node1-node3 as 22.
> >
> > I will get the below sched_domains_numa_distance[]:
> > 10, 12, 22, 24
> > As you can see, there is *no* 20. So node1 and node2 will only
> > get a two-level NUMA sched_domain:
> >
> 
> 
> So that's
> 
>     -numa node,cpus=0-1,nodeid=0 -numa node,cpus=2-3,nodeid=1, \
>     -numa node,cpus=4-5,nodeid=2, -numa node,cpus=6-7,nodeid=3, \
>     -numa dist,src=0,dst=1,val=12, \
>     -numa dist,src=0,dst=2,val=22, \
>     -numa dist,src=0,dst=3,val=24, \
>     -numa dist,src=1,dst=2,val=20, \
>     -numa dist,src=1,dst=3,val=22, \
>     -numa dist,src=2,dst=3,val=12
> 
> but running this still doesn't get me a splat. Debugging
> sched_domains_numa_distance[] still gives me
> {10, 12, 20, 22, 24}
> 
> >
> > But for the below topology:
> > +-------+          +-----+
> > | node0 |  20      |node2|
> > |       +----------+     |
> > +---+---+          +-----+
> >     |                  |12
> > 12  |                  |
> > +---+---+          +---+-+
> > |       |          |node3|
> > | node1 |          |     |
> > +-------+          +-----+
> >
> > with node1-node2 as 22, node1-node3 as 24, node0-node3 as 22.
> >
> > I will get the below sched_domains_numa_distance[]:
> > 10, 12, 20, 22, 24
> >
> > What I have seen is that performance is better if we drop the 20,
> > as we get a sched_domain hierarchy with fewer levels, and the two
> > intermediate nodes won't have the group span issue.
> >
> 
> That is another thing that's worth considering. Morten was arguing that if
> the distance between two nodes is so tiny, it might not be worth
> representing it at all in the scheduler topology.

Yes, I agree it is a different thing. Anyway, I saw your patch has been
merged into the sched tree. One side effect of your patch is that one
more sched_domain level is introduced for this topology:

                            24
                      X X XXX X X  X X X X XXX
             XX XX X                          XXXXX
         XXX                                        X
       XX                                             XXX
     XX                                 22              XXX
     X                           XXXXXXX                   XX
    X                        XXXXX      XXXXXXXXX           XXXX
   XX                      XXX                    XX X XX X    XX
+--------+           +---------+          +---------+      XX+---------+
| 0      |   12      | 1       | 20       | 2       |   12   |3        |
|        +-----------+         +----------+         +--------+         |
+---X----+           +---------+          +--X------+        +---------+
    X                                        X
    XX                                      X
     X                                     XX
      XX                                  XX
       XX                                X
        X XXX                         XXX
             X XXXXXX XX XX X X X XXXX
                       22
Without your patch, Linux uses {10, 12, 22, 24} to build the
sched_domain levels; with your patch, it uses {10, 12, 20, 22, 24}.
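
(For context, the gist of the set-based dedup is easy to sketch. The
below is only illustrative, not your patch itself; the real code is in
sched_init_numa() in kernel/sched/topology.c. As I understand it, the
old sort assumed node0's distance row already contained every unique
value in the table, which is exactly why my first topology loses the
20. The set-based version records every node_distance(i, j) in a
bitmap, so walking the set bits yields a deduplicated ascending list:

#define NR_DISTANCE_VALUES 256  /* SLIT distances are u8 */

unsigned long *distance_map = bitmap_alloc(NR_DISTANCE_VALUES, GFP_KERNEL);
int i, j, nr_levels = 0;

bitmap_zero(distance_map, NR_DISTANCE_VALUES);

/* Record every distance in the whole table, not just node0's row. */
for (i = 0; i < nr_node_ids; i++)
        for (j = 0; j < nr_node_ids; j++)
                bitmap_set(distance_map, node_distance(i, j), 1);

/* Walking the set bits gives a deduplicated, ascending sort. */
for_each_set_bit(i, distance_map, NR_DISTANCE_VALUES)
        sched_domains_numa_distance[nr_levels++] = i;
)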

So one more layer is added. What I have seen is that:

For node0, the sched_domain covering distance <= 12 and the one covering
<= 20 span the same range (node0, node1), so one of them is redundant.
Then, in cpu_attach_domain(), the redundant one is dropped by the
"remove the sched domains which do not contribute to scheduling" pass.
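
(That dropping is the degenerate-domain walk in cpu_attach_domain(),
which looks roughly like the following; this is a simplified sketch,
the real code also destroys the dropped domain and fixes up flags:

/* Remove the sched domains which do not contribute to scheduling. */
for (tmp = sd; tmp; ) {
        struct sched_domain *parent = tmp->parent;

        if (!parent)
                break;

        if (sd_parent_degenerate(tmp, parent)) {
                /* e.g. node0's <=20 level spans the same CPUs as <=12 */
                tmp->parent = parent->parent;
                if (parent->parent)
                        parent->parent->child = tmp;
        } else {
                tmp = parent;
        }
}
)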

For node1 and node2, the original code had no "20" and thus built one
fewer sched_domain level.

What is really interesting is that dropping the 20 actually gives
better SPEC CPU benchmark results :-)


> 
> > Thanks
> > Barry

Thanks
Barry
