Message-ID: <b6bfb636b1404d3c827e2ba2034e6822@hisilicon.com>
Date: Mon, 7 Dec 2020 09:59:21 +0000
From: "Song Bao Hua (Barry Song)" <song.bao.hua@...ilicon.com>
To: Vincent Guittot <vincent.guittot@...aro.org>
CC: Valentin Schneider <valentin.schneider@....com>,
Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will@...nel.org>,
"Rafael J. Wysocki" <rjw@...ysocki.net>,
"Cc: Len Brown" <lenb@...nel.org>,
"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
Jonathan Cameron <jonathan.cameron@...wei.com>,
Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Mark Rutland <mark.rutland@....com>,
LAK <linux-arm-kernel@...ts.infradead.org>,
linux-kernel <linux-kernel@...r.kernel.org>,
ACPI Devel Maling List <linux-acpi@...r.kernel.org>,
Linuxarm <linuxarm@...wei.com>, "xuwei (O)" <xuwei5@...wei.com>,
"Zengtao (B)" <prime.zeng@...ilicon.com>
Subject: RE: [RFC PATCH v2 2/2] scheduler: add scheduler level for clusters
> -----Original Message-----
> From: Vincent Guittot [mailto:vincent.guittot@...aro.org]
> Sent: Thursday, December 3, 2020 10:39 PM
> To: Song Bao Hua (Barry Song) <song.bao.hua@...ilicon.com>
> Cc: Valentin Schneider <valentin.schneider@....com>; Catalin Marinas
> <catalin.marinas@....com>; Will Deacon <will@...nel.org>; Rafael J. Wysocki
> <rjw@...ysocki.net>; Cc: Len Brown <lenb@...nel.org>;
> gregkh@...uxfoundation.org; Jonathan Cameron <jonathan.cameron@...wei.com>;
> Ingo Molnar <mingo@...hat.com>; Peter Zijlstra <peterz@...radead.org>; Juri
> Lelli <juri.lelli@...hat.com>; Dietmar Eggemann <dietmar.eggemann@....com>;
> Steven Rostedt <rostedt@...dmis.org>; Ben Segall <bsegall@...gle.com>; Mel
> Gorman <mgorman@...e.de>; Mark Rutland <mark.rutland@....com>; LAK
> <linux-arm-kernel@...ts.infradead.org>; linux-kernel
> <linux-kernel@...r.kernel.org>; ACPI Devel Maling List
> <linux-acpi@...r.kernel.org>; Linuxarm <linuxarm@...wei.com>; xuwei (O)
> <xuwei5@...wei.com>; Zengtao (B) <prime.zeng@...ilicon.com>
> Subject: Re: [RFC PATCH v2 2/2] scheduler: add scheduler level for clusters
>
> On Thu, 3 Dec 2020 at 10:11, Song Bao Hua (Barry Song)
> <song.bao.hua@...ilicon.com> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Vincent Guittot [mailto:vincent.guittot@...aro.org]
> > > Sent: Thursday, December 3, 2020 10:04 PM
> > > To: Song Bao Hua (Barry Song) <song.bao.hua@...ilicon.com>
> > > Cc: Valentin Schneider <valentin.schneider@....com>; Catalin Marinas
> > > <catalin.marinas@....com>; Will Deacon <will@...nel.org>; Rafael J. Wysocki
> > > <rjw@...ysocki.net>; Cc: Len Brown <lenb@...nel.org>;
> > > gregkh@...uxfoundation.org; Jonathan Cameron
> <jonathan.cameron@...wei.com>;
> > > Ingo Molnar <mingo@...hat.com>; Peter Zijlstra <peterz@...radead.org>; Juri
> > > Lelli <juri.lelli@...hat.com>; Dietmar Eggemann
> <dietmar.eggemann@....com>;
> > > Steven Rostedt <rostedt@...dmis.org>; Ben Segall <bsegall@...gle.com>; Mel
> > > Gorman <mgorman@...e.de>; Mark Rutland <mark.rutland@....com>; LAK
> > > <linux-arm-kernel@...ts.infradead.org>; linux-kernel
> > > <linux-kernel@...r.kernel.org>; ACPI Devel Maling List
> > > <linux-acpi@...r.kernel.org>; Linuxarm <linuxarm@...wei.com>; xuwei (O)
> > > <xuwei5@...wei.com>; Zengtao (B) <prime.zeng@...ilicon.com>
> > > Subject: Re: [RFC PATCH v2 2/2] scheduler: add scheduler level for clusters
> > >
> > > On Wed, 2 Dec 2020 at 21:58, Song Bao Hua (Barry Song)
> > > <song.bao.hua@...ilicon.com> wrote:
> > > >
> > > > >
> > > > > Sorry. Please ignore this. I added some printk here while testing
> > > > > one numa. Will update you the data in another email.
> > > >
> > > > Re-tested in one NUMA node(cpu0-cpu23):
> > > >
> > > > g=1
> > > > Running in threaded mode with 1 groups using 40 file descriptors
> > > > Each sender will pass 100000 messages of 100 bytes
> > > > w/o: 7.689 7.485 7.485 7.458 7.524 7.539 7.738 7.693 7.568 7.674=7.5853
> > > > w/ : 7.516 7.941 7.374 7.963 7.881 7.910 7.420 7.556 7.695 7.441=7.6697
> > > > w/ but dropped select_idle_cluster:
> > > > 7.752 7.739 7.739 7.571 7.545 7.685 7.407 7.580 7.605 7.487=7.611
> > > >
> > > > g=2
> > > > Running in threaded mode with 2 groups using 40 file descriptors
> > > > Each sender will pass 100000 messages of 100 bytes
> > > > w/o: 10.127 10.119 10.070 10.196 10.057 10.111 10.045 10.164 10.162 9.955=10.1006
> > > > w/ : 9.694 9.654 9.612 9.649 9.686 9.734 9.607 9.842 9.690 9.710=9.6878
> > > > w/ but dropped select_idle_cluster:
> > > > 9.877 10.069 9.951 9.918 9.947 9.790 9.906 9.820 9.863 9.906=9.9047
> > > >
> > > > g=3
> > > > Running in threaded mode with 3 groups using 40 file descriptors
> > > > Each sender will pass 100000 messages of 100 bytes
> > > > w/o: 15.885 15.254 15.932 15.647 16.120 15.878 15.857 15.759 15.674 15.721=15.7727
> > > > w/ : 14.974 14.657 13.969 14.985 14.728 15.665 15.191 14.995 14.946 14.895=14.9005
> > > > w/ but dropped select_idle_cluster:
> > > > 15.405 15.177 15.373 15.187 15.450 15.540 15.278 15.628 15.228 15.325=15.3591
> > > >
> > > > g=4
> > > > Running in threaded mode with 4 groups using 40 file descriptors
> > > > Each sender will pass 100000 messages of 100 bytes
> > > > w/o: 20.014 21.025 21.119 21.235 19.767 20.971 20.962 20.914 21.090 21.090=20.8187
> > > > w/ : 20.331 20.608 20.338 20.445 20.456 20.146 20.693 20.797 21.381 20.452=20.5647
> > > > w/ but dropped select_idle_cluster:
> > > > 19.814 20.126 20.229 20.350 20.750 20.404 19.957 19.888 20.226 20.562=20.2306
> > > >
> > >
> > > I assume that you have run this on v5.9 as previous tests.
> >
> > Yep
> >
> > > The results don't show any real benefit of select_idle_cluster()
> > > inside a node whereas this is where we could expect most of the
> > > benefit. We have to understand why we have such an impact on numa
> > > tests only.
> >
> > There is a 4-5.5% improvement for g=2 and g=3.
>
> My point was about with vs. without select_idle_cluster(), while still
> having a cluster domain level.
> In this case, the diff is -0.8% for g=1, +2.2% for g=2, +3% for g=3, and
> -1.7% for g=4.
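
For reference, both sets of percentages can be recomputed from the mean times quoted above. This is only a sketch: the dictionary keys and function names are made up for illustration, and the sign convention (positive means the first variant is faster) is an assumption chosen so that the output matches the figures in this thread.

```python
# Mean hackbench times in seconds, copied from the results quoted above.
# Lower is better. The key names are illustrative only:
#   "without": kernel without the cluster patch
#   "with":    patch applied, select_idle_cluster() enabled
#   "dropped": patch applied, select_idle_cluster() dropped
MEANS = {
    1: {"without": 7.5853, "with": 7.6697, "dropped": 7.6110},
    2: {"without": 10.1006, "with": 9.6878, "dropped": 9.9047},
    3: {"without": 15.7727, "with": 14.9005, "dropped": 15.3591},
    4: {"without": 20.8187, "with": 20.5647, "dropped": 20.2306},
}


def gain_percent(baseline, candidate):
    """Relative time saved by candidate versus baseline, in percent.

    Assumed convention: positive means candidate finished faster.
    """
    return (baseline - candidate) / baseline * 100.0


def summarize(means=MEANS):
    """Return (g, gain vs. unpatched, gain vs. dropped) for each group count."""
    rows = []
    for g in sorted(means):
        m = means[g]
        rows.append((
            g,
            gain_percent(m["without"], m["with"]),
            gain_percent(m["dropped"], m["with"]),
        ))
    return rows


for g, vs_without, vs_dropped in summarize():
    print(f"g={g}: patch vs. no patch {vs_without:+.1f}%, "
          f"select_idle_cluster() vs. dropped {vs_dropped:+.1f}%")
```

The first column compares the patched kernel against the unpatched one; the second isolates select_idle_cluster() with the cluster domain level still in place.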
>
> >
> > Regarding the huge increase in the NUMA case, at first I suspected
> > we had the wrong llc domain. For example, if cpu0's llc domain spanned
> > cpu0-cpu47, then select_idle_cpu() would be running over the wrong range,
> > while it should run over cpu0-cpu23.
> >
> > But after printing the llc domain's span, I find it is completely right.
> > Cpu0's llc span: cpu0-cpu23
> > Cpu24's llc span: cpu24-cpu47
>
> Have you checked that the cluster mask was also correct ?
>
> >
> > Maybe I need more trace data to figure out if select_idle_cpu() is running
> > correctly. For example, maybe I can figure out if it is always returning -1,
> > or it returns -1 very often?
>
> yes, it could be interesting to check how often select_idle_cpu() returns -1
>
> >
> > Or do you have any idea?
>
> tracking migration across nodes could help to understand too
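
One possible way to count cross-node migrations is to record the sched:sched_migrate_task tracepoint and post-process the text output. The sketch below assumes the layout reported in this thread (cpu0-cpu23 on node 0, cpu24-cpu47 on node 1) and relies only on the orig_cpu= and dest_cpu= fields of that event; the function names are made up for illustration.

```python
import re

# Assumption taken from this thread: 24 CPUs per NUMA node,
# cpu0-cpu23 on node 0 and cpu24-cpu47 on node 1.
CPUS_PER_NODE = 24

# Matches the orig_cpu/dest_cpu fields of a sched_migrate_task trace line.
MIGRATE_RE = re.compile(
    r"sched_migrate_task:.*?\borig_cpu=(\d+)\s+dest_cpu=(\d+)"
)


def node_of(cpu):
    """Map a CPU number to its NUMA node under the assumed layout."""
    return cpu // CPUS_PER_NODE


def count_migrations(lines):
    """Return (same_node, cross_node) migration counts for the given lines."""
    same = cross = 0
    for line in lines:
        match = MIGRATE_RE.search(line)
        if match is None:
            continue  # not a sched_migrate_task event
        orig_cpu, dest_cpu = int(match.group(1)), int(match.group(2))
        if node_of(orig_cpu) == node_of(dest_cpu):
            same += 1
        else:
            cross += 1
    return same, cross
```

Passing an open trace file (or any iterable of lines) to count_migrations() gives the two counters, which can then be compared with and without the patch.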

Before working on the cluster scheduler issue, I had set the boot argument
mem=4G to do a swapping test, and I forgot to remove it.
The huge increase in the cross-NUMA case can only be reproduced when
I use this mem=4G command line, which means NUMA node 1 has no memory.
After removing the limitation, I can no longer reproduce the huge increase
for two NUMA nodes.
My guess is that select_idle_cluster() somehow works around a scheduler
issue for NUMA nodes without memory.
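
To avoid being bitten by this again, a quick check for memoryless nodes before benchmarking might look like the sketch below. It assumes the node state files under /sys/devices/system/node (online and has_memory) are present on the test kernel; the helper names are made up for illustration.

```python
import os

# Directory holding the NUMA node state files (assumed to exist on the
# system under test).
NODE_SYSFS = "/sys/devices/system/node"


def parse_node_list(text):
    """Parse a sysfs list such as '0-1,3' into a set of node numbers."""
    nodes = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            nodes.update(range(int(first), int(last) + 1))
        else:
            nodes.add(int(part))
    return nodes


def memoryless_nodes(online_text, has_memory_text):
    """Return the sorted list of online nodes that have no memory."""
    online = parse_node_list(online_text)
    with_memory = parse_node_list(has_memory_text)
    return sorted(online - with_memory)


def read_node_state(name, base=NODE_SYSFS):
    """Read one node state file, for example 'online' or 'has_memory'."""
    with open(os.path.join(base, name)) as f:
        return f.read()
```

On the machine itself, memoryless_nodes(read_node_state("online"), read_node_state("has_memory")) should return an empty list before the numbers are trusted; with mem=4G I would expect it to report node 1 here.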
>
> Vincent
> >
> >
Thanks
Barry