Date:   Wed, 3 Feb 2021 21:31:15 +0000
From:   "Song Bao Hua (Barry Song)" <song.bao.hua@...ilicon.com>
To:     Meelis Roos <mroos@...ux.ee>,
        "valentin.schneider@....com" <valentin.schneider@....com>,
        "vincent.guittot@...aro.org" <vincent.guittot@...aro.org>,
        "mgorman@...e.de" <mgorman@...e.de>,
        "mingo@...nel.org" <mingo@...nel.org>,
        "peterz@...radead.org" <peterz@...radead.org>,
        "dietmar.eggemann@....com" <dietmar.eggemann@....com>,
        "morten.rasmussen@....com" <morten.rasmussen@....com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
CC:     "linuxarm@...neuler.org" <linuxarm@...neuler.org>,
        "xuwei (O)" <xuwei5@...wei.com>,
        "Liguozhu (Kenneth)" <liguozhu@...ilicon.com>,
        "tiantao (H)" <tiantao6@...ilicon.com>,
        wanghuiqiang <wanghuiqiang@...wei.com>,
        "Zengtao (B)" <prime.zeng@...ilicon.com>,
        Jonathan Cameron <jonathan.cameron@...wei.com>,
        "guodong.xu@...aro.org" <guodong.xu@...aro.org>
Subject: RE: [PATCH v2] sched/topology: fix the issue groups don't span
 domain->span for NUMA diameter > 2



> -----Original Message-----
> From: Meelis Roos [mailto:mroos@...ux.ee]
> Sent: Thursday, February 4, 2021 12:58 AM
> To: Song Bao Hua (Barry Song) <song.bao.hua@...ilicon.com>;
> valentin.schneider@....com; vincent.guittot@...aro.org; mgorman@...e.de;
> mingo@...nel.org; peterz@...radead.org; dietmar.eggemann@....com;
> morten.rasmussen@....com; linux-kernel@...r.kernel.org
> Cc: linuxarm@...neuler.org; xuwei (O) <xuwei5@...wei.com>; Liguozhu (Kenneth)
> <liguozhu@...ilicon.com>; tiantao (H) <tiantao6@...ilicon.com>; wanghuiqiang
> <wanghuiqiang@...wei.com>; Zengtao (B) <prime.zeng@...ilicon.com>; Jonathan
> Cameron <jonathan.cameron@...wei.com>; guodong.xu@...aro.org
> Subject: Re: [PATCH v2] sched/topology: fix the issue groups don't span
> domain->span for NUMA diameter > 2
> 
> 03.02.21 13:12 Barry Song wrote:
> > kernel/sched/topology.c | 85 +++++++++++++++++++++++++----------------
> >   1 file changed, 53 insertions(+), 32 deletions(-)
> >
> > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> > index 5d3675c7a76b..964ed89001fe 100644
> > --- a/kernel/sched/topology.c
> > +++ b/kernel/sched/topology.c
> 
> This one still works on the Sun X4600-M2, on top of v5.11-rc6-55-g3aaf0a27ffc2.
> 
> 
> Performance-wise - is there some simple benchmark to run to measure the impact?
> Compared to what - 5.10.0 or the kernel with the warning?

Hi Meelis,
Thanks for retesting.

Comparing to the kernel with the warning is enough. As I mentioned here:
https://lore.kernel.org/lkml/20210115203632.34396-1-song.bao.hua@hisilicon.com/

I have seen two major issues caused by the broken sched_group:

* in load_balance() and find_busiest_group()
The kernel calculates avg_load and group_type from:

sum(load of cpus within sched_domain)
------------------------------------
capacity of the whole sched_group

Since the sched_group isn't a subset of the sched_domain, the load of
the problematic group is severely underestimated.

sched_domain

  +----------------------------------+
  |                                  |
  |          +-------------------------------------------+
  |          | +-------+  +------+   |                   |
  |          | | cpu0  |  | cpu1 |   |                   |
  |          | +-------+  +------+   |                   |
  +----------------------------------+                   |
             |                                           |
             |      +-------+      +-------+             |
             |      |cpu2   |      |cpu3   |             |
             |      +-------+      +-------+             |
             |                                           |
             +-------------------------------------------+
                            problematic  sched_group


In the above example, the kernel will divide "the sum of the load
of cpu0 and cpu1" by "the capacity of the whole group including
cpu0, 1, 2 and 3".
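
To make the underestimate concrete, here is a minimal sketch of that
arithmetic. It is not the kernel code: the per-cpu load values are made up,
every cpu is assumed to have the same capacity of 1024, and the helper name
avg_load() is only borrowed from the description above.

```python
# Hypothetical numbers for illustration only; not measured on a real machine.
SCHED_CAPACITY_SCALE = 1024   # assumed capacity of every cpu

cpu_load = {0: 900, 1: 800, 2: 100, 3: 50}   # made-up per-cpu load
domain_span = {0, 1}          # cpus covered by the sched_domain
group_span = {0, 1, 2, 3}     # cpus covered by the problematic sched_group


def avg_load(load_cpus, capacity_cpus):
    """Sum the load of load_cpus and scale it by the capacity of capacity_cpus."""
    load = sum(cpu_load[c] for c in load_cpus)
    capacity = len(capacity_cpus) * SCHED_CAPACITY_SCALE
    return load * SCHED_CAPACITY_SCALE // capacity


# Broken group: load comes only from the cpus inside the domain,
# but the capacity is that of the whole group.
broken = avg_load(group_span & domain_span, group_span)

# Group restricted to the domain: both terms cover the same cpus.
fixed = avg_load(group_span & domain_span, group_span & domain_span)

print(broken, fixed)
```

With these made-up values the broken group reports half the load it would
report if it only spanned cpu0 and cpu1.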

* in select_task_rq_fair() and find_idlest_group()
The kernel could push a forked/exec-ed task outside of the
sched_domain, but still inside the sched_group. In the above
diagram, when the kernel wants to find the idlest cpu in the
sched_domain, it can end up picking cpu2 or cpu3.
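
The second issue can be sketched the same way. This is only a toy model of
"pick the idlest cpu", with made-up load values and a hypothetical helper
idlest_cpu(); the real find_idlest_group()/find_idlest_cpu() path is more
involved.

```python
# Hypothetical numbers for illustration only; not measured on a real machine.
cpu_load = {0: 900, 1: 800, 2: 100, 3: 50}   # made-up per-cpu load
domain_span = {0, 1}          # cpus covered by the sched_domain
group_span = {0, 1, 2, 3}     # cpus covered by the problematic sched_group


def idlest_cpu(candidates):
    """Return the least loaded cpu among candidates (lowest id wins ties)."""
    return min(sorted(candidates), key=lambda cpu: cpu_load[cpu])


# Searching the whole group can return a cpu outside the domain.
picked = idlest_cpu(group_span)
escaped = picked not in domain_span

# Searching only the part of the group inside the domain cannot.
confined = idlest_cpu(group_span & domain_span)

print(picked, escaped, confined)
```

Here the search over the whole group lands on cpu3, which is outside the
sched_domain, while the search confined to the domain stays on cpu1.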

I guess these two issues can potentially affect many benchmarks.
Our team has seen a 5% unixbench score increase with the fix on
some machines, though the real impact will vary from case to case.

> 
> drop caches and time the build time of linux kernel with make -j64?
> 
> --
> Meelis Roos

Thanks
Barry
