Message-ID: <c7f231bf830b4c94adb6a34cc8a4b930@hisilicon.com>
Date: Wed, 3 Feb 2021 21:31:15 +0000
From: "Song Bao Hua (Barry Song)" <song.bao.hua@...ilicon.com>
To: Meelis Roos <mroos@...ux.ee>,
"valentin.schneider@....com" <valentin.schneider@....com>,
"vincent.guittot@...aro.org" <vincent.guittot@...aro.org>,
"mgorman@...e.de" <mgorman@...e.de>,
"mingo@...nel.org" <mingo@...nel.org>,
"peterz@...radead.org" <peterz@...radead.org>,
"dietmar.eggemann@....com" <dietmar.eggemann@....com>,
"morten.rasmussen@....com" <morten.rasmussen@....com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
CC: "linuxarm@...neuler.org" <linuxarm@...neuler.org>,
"xuwei (O)" <xuwei5@...wei.com>,
"Liguozhu (Kenneth)" <liguozhu@...ilicon.com>,
"tiantao (H)" <tiantao6@...ilicon.com>,
wanghuiqiang <wanghuiqiang@...wei.com>,
"Zengtao (B)" <prime.zeng@...ilicon.com>,
Jonathan Cameron <jonathan.cameron@...wei.com>,
"guodong.xu@...aro.org" <guodong.xu@...aro.org>
Subject: RE: [PATCH v2] sched/topology: fix the issue groups don't span
domain->span for NUMA diameter > 2
> -----Original Message-----
> From: Meelis Roos [mailto:mroos@...ux.ee]
> Sent: Thursday, February 4, 2021 12:58 AM
> To: Song Bao Hua (Barry Song) <song.bao.hua@...ilicon.com>;
> valentin.schneider@....com; vincent.guittot@...aro.org; mgorman@...e.de;
> mingo@...nel.org; peterz@...radead.org; dietmar.eggemann@....com;
> morten.rasmussen@....com; linux-kernel@...r.kernel.org
> Cc: linuxarm@...neuler.org; xuwei (O) <xuwei5@...wei.com>; Liguozhu (Kenneth)
> <liguozhu@...ilicon.com>; tiantao (H) <tiantao6@...ilicon.com>; wanghuiqiang
> <wanghuiqiang@...wei.com>; Zengtao (B) <prime.zeng@...ilicon.com>; Jonathan
> Cameron <jonathan.cameron@...wei.com>; guodong.xu@...aro.org
> Subject: Re: [PATCH v2] sched/topology: fix the issue groups don't span
> domain->span for NUMA diameter > 2
>
> 03.02.21 13:12 Barry Song wrote:
> > kernel/sched/topology.c | 85 +++++++++++++++++++++++++----------------
> > 1 file changed, 53 insertions(+), 32 deletions(-)
> >
> > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> > index 5d3675c7a76b..964ed89001fe 100644
> > --- a/kernel/sched/topology.c
> > +++ b/kernel/sched/topology.c
>
> This one still works on the Sun X4600-M2, on top of v5.11-rc6-55-g3aaf0a27ffc2.
>
>
> Performance-wise - is there some simple benchmark to run to measure the impact?
> Compared to what - 5.10.0 or the kernel with the warning?
Hi Meelis,
Thanks for retesting.
Comparing to the kernel with the warning is enough. As I mentioned here:
https://lore.kernel.org/lkml/20210115203632.34396-1-song.bao.hua@hisilicon.com/
I have seen two major issues caused by the broken sched_group:
* in load_balance() and find_busiest_group()
The kernel calculates avg_load and group_type from:
    sum(load of cpus within sched_domain)
    -------------------------------------
      capacity of the whole sched_group
Since the sched_group isn't a subset of the sched_domain, the load of
the problematic group is severely underestimated.
            sched_domain
+----------------------------------+
|                                  |
|  +-------------------------------------------+
|  | +-------+  +------+           |           |
|  | | cpu0  |  | cpu1 |           |           |
|  | +-------+  +------+           |           |
+----------------------------------+           |
   |                                           |
   |              +-------+  +-------+         |
   |              |cpu2   |  |cpu3   |         |
   |              +-------+  +-------+         |
   |                                           |
   +-------------------------------------------+
              problematic sched_group
In the above example, the kernel divides "the sum of the loads of
cpu0 and cpu1" by "the capacity of the whole group including
cpu0, 1, 2 and 3".
* in select_task_rq_fair() and find_idlest_group()
The kernel can push a forked/exec-ed task outside the
sched_domain while staying inside the sched_group. In the above
diagram, when the kernel wants to find the idlest cpu in the
sched_domain, it can end up picking cpu2 or cpu3.
I guess these two issues can potentially affect many benchmarks.
Our team has seen a 5% unixbench score increase with the fix on
some machines, though the real impact might vary case by case.
>
> drop caches and time the build time of linux kernel with make -j64?
>
> --
> Meelis Roos
Thanks
Barry