Message-ID: <bdfb60a29867412b97d652d6e04760ef@hisilicon.com>
Date: Tue, 25 May 2021 08:14:45 +0000
From: "Song Bao Hua (Barry Song)" <song.bao.hua@...ilicon.com>
To: Dietmar Eggemann <dietmar.eggemann@....com>,
Vincent Guittot <vincent.guittot@...aro.org>
CC: "tim.c.chen@...ux.intel.com" <tim.c.chen@...ux.intel.com>,
"catalin.marinas@....com" <catalin.marinas@....com>,
"will@...nel.org" <will@...nel.org>,
"rjw@...ysocki.net" <rjw@...ysocki.net>,
"bp@...en8.de" <bp@...en8.de>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"mingo@...hat.com" <mingo@...hat.com>,
"lenb@...nel.org" <lenb@...nel.org>,
"peterz@...radead.org" <peterz@...radead.org>,
"rostedt@...dmis.org" <rostedt@...dmis.org>,
"bsegall@...gle.com" <bsegall@...gle.com>,
"mgorman@...e.de" <mgorman@...e.de>,
"msys.mizuma@...il.com" <msys.mizuma@...il.com>,
"valentin.schneider@....com" <valentin.schneider@....com>,
"gregkh@...uxfoundation.org" <gregkh@...uxfoundation.org>,
Jonathan Cameron <jonathan.cameron@...wei.com>,
"juri.lelli@...hat.com" <juri.lelli@...hat.com>,
"mark.rutland@....com" <mark.rutland@....com>,
"sudeep.holla@....com" <sudeep.holla@....com>,
"aubrey.li@...ux.intel.com" <aubrey.li@...ux.intel.com>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"linux-acpi@...r.kernel.org" <linux-acpi@...r.kernel.org>,
"x86@...nel.org" <x86@...nel.org>, "xuwei (O)" <xuwei5@...wei.com>,
"Zengtao (B)" <prime.zeng@...ilicon.com>,
"guodong.xu@...aro.org" <guodong.xu@...aro.org>,
yangyicong <yangyicong@...wei.com>,
"Liguozhu (Kenneth)" <liguozhu@...ilicon.com>,
"linuxarm@...neuler.org" <linuxarm@...neuler.org>,
"hpa@...or.com" <hpa@...or.com>
Subject: RE: [RFC PATCH v6 3/4] scheduler: scan idle cpu in cluster for tasks within one LLC
> -----Original Message-----
> From: Dietmar Eggemann [mailto:dietmar.eggemann@....com]
> Sent: Friday, May 14, 2021 12:32 AM
> To: Song Bao Hua (Barry Song) <song.bao.hua@...ilicon.com>; Vincent Guittot
> <vincent.guittot@...aro.org>
> Cc: tim.c.chen@...ux.intel.com; catalin.marinas@....com; will@...nel.org;
> rjw@...ysocki.net; bp@...en8.de; tglx@...utronix.de; mingo@...hat.com;
> lenb@...nel.org; peterz@...radead.org; rostedt@...dmis.org;
> bsegall@...gle.com; mgorman@...e.de; msys.mizuma@...il.com;
> valentin.schneider@....com; gregkh@...uxfoundation.org; Jonathan Cameron
> <jonathan.cameron@...wei.com>; juri.lelli@...hat.com; mark.rutland@....com;
> sudeep.holla@....com; aubrey.li@...ux.intel.com;
> linux-arm-kernel@...ts.infradead.org; linux-kernel@...r.kernel.org;
> linux-acpi@...r.kernel.org; x86@...nel.org; xuwei (O) <xuwei5@...wei.com>;
> Zengtao (B) <prime.zeng@...ilicon.com>; guodong.xu@...aro.org; yangyicong
> <yangyicong@...wei.com>; Liguozhu (Kenneth) <liguozhu@...ilicon.com>;
> linuxarm@...neuler.org; hpa@...or.com
> Subject: Re: [RFC PATCH v6 3/4] scheduler: scan idle cpu in cluster for tasks
> within one LLC
>
> On 07/05/2021 15:07, Song Bao Hua (Barry Song) wrote:
> >
> >
> >> -----Original Message-----
> >> From: Dietmar Eggemann [mailto:dietmar.eggemann@....com]
>
> [...]
>
> >> On 03/05/2021 13:35, Song Bao Hua (Barry Song) wrote:
> >>
> >> [...]
> >>
> >>>> From: Song Bao Hua (Barry Song)
> >>
> >> [...]
> >>
> >>>>> From: Dietmar Eggemann [mailto:dietmar.eggemann@....com]
> >>
> >> [...]
> >>
> >>>>> On 29/04/2021 00:41, Song Bao Hua (Barry Song) wrote:
> >>>>>>
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Dietmar Eggemann [mailto:dietmar.eggemann@....com]
> >>>>>
> >>>>> [...]
> >>>>>
> >>>>>>>>>> From: Dietmar Eggemann [mailto:dietmar.eggemann@....com]
> >>>>>>>
> >>>>>>> [...]
> >>>>>>>
> >>>>>>>>>> On 20/04/2021 02:18, Barry Song wrote:
> >>
> >> [...]
> >>
> >>>
> >>> On the other hand, according to "sched: Implement smarter wake-affine logic"
> >>>
> >>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=62470419
> >>>
> >>> A proper factor in wake_wide mainly benefits 1:n tasks like
> >>> postgresql/pgbench. So using the smaller cluster size as the factor
> >>> might help make wake_affine false and thus improve pgbench.
> >>>
> >>> From the commit log, the commit made the biggest improvement when
> >>> clients = 2*cpus. In my case, that should be clients=48 for a machine
> >>> whose LLC size is 24.
> >>>
> >>> In Linux, I created a 240MB database and ran "pgbench -c 48 -S -T 20 pgbench"
> >>> under two different scenarios:
> >>> 1. page cache always hit, so no real I/O for database read
> >>> 2. echo 3 > /proc/sys/vm/drop_caches
> >>>
> >>> For case 1, using cluster_size and using llc_size will result in similar
> >>> tps= ~108000, all of 24 cpus have 100% cpu utilization.
> >>>
> >>> For case 2, using llc_size still shows better performance.
> >>>
> >>> tps for each test round (cluster size as factor in wake_wide):
> >>> 1398.450887 1275.020401 1632.542437 1412.241627 1611.095692 1381.354294 1539.877146
> >>> avg tps = 1464
> >>>
> >>> tps for each test round (llc size as factor in wake_wide):
> >>> 1718.402983 1443.169823 1502.353823 1607.415861 1597.396924 1745.651814 1876.802168
> >>> avg tps = 1641 (+12%)
> >>>
> >>> so it seems using cluster_size as factor in
> >>> "slave >= factor && master >= slave * factor"
> >>> isn't a good choice for my machine at least.
> >>
> >> So SD size = 4 (instead of 24) seems to be too small for `-c 48`.
> >>
> >> Just curious, have you seen the benefit of using wake wide on SD size =
> >> 24 (LLC) compared to not using it at all?
> >
> > At least in the benchmark I ran today, I have not seen any benefit from
> > using llc_size. Always returning 0 in wake_wide() seems to be much better.
> >
> > postgres@...ntu:$pgbench -i pgbench
> > postgres@...ench:$ pgbench -T 120 -c 48 pgbench
> >
> > using llc_size, it got to 123tps
> > always returning 0 in wake_wide(), it got to 158tps
> >
> > actually, I really couldn't reproduce the performance improvement
> > the commit "sched: Implement smarter wake-affine logic" mentioned.
> > on the other hand, the commit log didn't present the pgbench command
> > parameter used. I guess the benchmark result will highly depend on
> > the command parameter and disk I/O speed.
>
> I see. And it was a way smaller machine (12 CPUs) back then.
>
> You could run pgbench via mmtests https://github.com/gormanm/mmtests.
>
> I.e., the `timed-ro-medium` test.
>
> mmtests# ./run-mmtests.sh --config ./configs/config-db-pgbench-timed-ro-medium test_tag
>
> /shellpacks/shellpack-bench-pgbench contains all the individual test
> steps. Something you could use as a template for your pgbench standalone
> tests as well.
>
> I ran this test on an Intel Xeon E5-2690 v2 with 40 CPUs and 64GB of
> memory on v5.12 vanilla and w/o wakewide.
> The test uses `scale_factor = 2570` on this machine. I guess this
> relates to ~41GB? At least this was the size of the:
Thanks, Dietmar. Sorry for the slow response; I was on sick leave for the
whole of last week.
I feel it makes much more sense to use mmtests, which sets scale_factor
according to total memory size and thus takes the impact of the page cache
into account. It also warms up the database for 30 minutes.
I will get more data and compare three cases:
1. use cluster size as the wake_wide factor
2. use llc size as the wake_wide factor
3. always return 0 in wake_wide()
and post the results afterwards.
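To make the three cases concrete, here is a rough user-space sketch of
the check being varied. It is only an illustration built around the
condition quoted earlier in this thread ("slave >= factor && master >=
slave * factor"). The function name wake_wide_sketch(), the enum and the
plain-integer parameters are made up for the example, and the ordering of
the two counts is my assumption about how the real check behaves. It is
not the kernel code and not the patch itself.

```c
#include <stdbool.h>

/* Hypothetical selector for the three cases being compared. */
enum wake_wide_mode {
	WW_CLUSTER_FACTOR,	/* case 1: factor = cluster size */
	WW_LLC_FACTOR,		/* case 2: factor = LLC size */
	WW_ALWAYS_ZERO,		/* case 3: never report "wide" */
};

/*
 * master and slave stand in for the per-task counts compared by the
 * check (assumption: plain integers are enough for illustration).
 * Returns true ("wide") only when
 *	slave >= factor && master >= slave * factor
 */
bool wake_wide_sketch(enum wake_wide_mode mode,
		      unsigned int master, unsigned int slave,
		      unsigned int cluster_size, unsigned int llc_size)
{
	unsigned int factor;

	if (mode == WW_ALWAYS_ZERO)
		return false;

	factor = (mode == WW_CLUSTER_FACTOR) ? cluster_size : llc_size;

	/* Order the two counts so that master is the larger one. */
	if (master < slave) {
		unsigned int tmp = master;

		master = slave;
		slave = tmp;
	}

	return slave >= factor && master >= slave * factor;
}
```

With cluster_size = 4 and llc_size = 24, as on my machine, a pair of
counts such as 100 and 5 is "wide" in case 1 but not in case 2, which is
the kind of difference the comparison is meant to expose.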
>
> #mmtests/work/testdisk/data/pgdata directory when the test started.
>
>
> mmtests/work/log# ../../compare-kernels.sh --baseline base --compare wo_wakewide | grep ^Hmean
>
>
> #clients         v5.12 vanilla          v5.12 w/o wakewide
>
> Hmean 1       10903.88 (  0.00%)     10792.59 * -1.02%*
> Hmean 6       28480.60 (  0.00%)     27954.97 * -1.85%*
> Hmean 12      49197.55 (  0.00%)     47758.16 * -2.93%*
> Hmean 22      72902.37 (  0.00%)     71314.01 * -2.18%*
> Hmean 30      75468.16 (  0.00%)     75929.17 *  0.61%*
> Hmean 48      60155.58 (  0.00%)     60471.91 *  0.53%*
> Hmean 80      62202.38 (  0.00%)     60814.76 * -2.23%*
>
>
> So there are some improvements w/ wakewide but nothing of the scale
> shown in the original wakewide patch.
>
> I'm not an expert on how to set up these pgbench tests though. So maybe
> other pgbench related mmtests configs or some more fine-grained tuning
> can produce bigger diffs?
Thanks
Barry