Message-ID: <20220519181931.GA23577@chenyu5-mobl1>
Date: Fri, 20 May 2022 02:19:31 +0800
From: Chen Yu <yu.c.chen@...el.com>
To: K Prateek Nayak <kprateek.nayak@....com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Mel Gorman <mgorman@...e.de>,
Yicong Yang <yangyicong@...ilicon.com>,
Tim Chen <tim.c.chen@...el.com>,
Chen Yu <yu.chen.surf@...il.com>,
Ingo Molnar <mingo@...hat.com>,
Juri Lelli <juri.lelli@...hat.com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Barry Song <21cnbao@...il.com>,
Srikar Dronamraju <srikar@...ux.vnet.ibm.com>,
Len Brown <len.brown@...el.com>,
Ben Segall <bsegall@...gle.com>,
Aubrey Li <aubrey.li@...el.com>,
Abel Wu <wuyun.abel@...edance.com>,
Zhang Rui <rui.zhang@...el.com>, linux-kernel@...r.kernel.org,
Daniel Bristot de Oliveira <bristot@...hat.com>
Subject: Re: [PATCH v3] sched/fair: Introduce SIS_UTIL to search idle CPU
based on sum of util_avg
On Mon, May 16, 2022 at 04:22:34PM +0530, K Prateek Nayak wrote:
[snip]
> I've run the benchmark in two sets of 3 runs, rebooting
> in between, on each kernel version:
>
> - tip
>
> Test: tip-r0 tip-r1 tip-r2
> 1-groups: 4.64 (0.00 pct) 4.90 (-5.60 pct) 4.99 (-7.54 pct)
> 2-groups: 5.54 (0.00 pct) 5.56 (-0.36 pct) 5.58 (-0.72 pct)
> 4-groups: 6.24 (0.00 pct) 6.18 (0.96 pct) 6.20 (0.64 pct)
> 8-groups: 7.54 (0.00 pct) 7.50 (0.53 pct) 7.54 (0.00 pct)
> 16-groups: 10.85 (0.00 pct) 11.17 (-2.94 pct) 10.91 (-0.55 pct)
>
> Test: tip-r3 tip-r4 tip-r5
> 1-groups: 4.68 (0.00 pct) 4.97 (-6.19 pct) 4.98 (-6.41 pct)
> 2-groups: 5.60 (0.00 pct) 5.62 (-0.35 pct) 5.66 (-1.07 pct)
> 4-groups: 6.24 (0.00 pct) 6.23 (0.16 pct) 6.24 (0.00 pct)
> 8-groups: 7.54 (0.00 pct) 7.50 (0.53 pct) 7.46 (1.06 pct)
> 16-groups: 10.81 (0.00 pct) 10.84 (-0.27 pct) 10.81 (0.00 pct)
>
> - SIS_UTIL
>
>
> Test: SIS_UTIL-r0 SIS_UTIL-r1 SIS_UTIL-r2
> 1-groups: 4.68 (0.00 pct) 5.03 (-7.47 pct) 4.96 (-5.98 pct)
> 2-groups: 5.45 (0.00 pct) 5.48 (-0.55 pct) 5.50 (-0.91 pct)
> 4-groups: 6.10 (0.00 pct) 6.07 (0.49 pct) 6.14 (-0.65 pct)
> 8-groups: 7.52 (0.00 pct) 7.51 (0.13 pct) 7.52 (0.00 pct)
> 16-groups: 11.63 (0.00 pct) 11.48 (1.28 pct) 11.51 (1.03 pct)
>
> Test: SIS_UTIL-r3 SIS_UTIL-r4 SIS_UTIL-r5
> 1-groups: 4.80 (0.00 pct) 5.00 (-4.16 pct) 5.06 (-5.41 pct)
> 2-groups: 5.51 (0.00 pct) 5.58 (-1.27 pct) 5.58 (-1.27 pct)
> 4-groups: 6.14 (0.00 pct) 6.11 (0.48 pct) 6.06 (1.30 pct)
> 8-groups: 7.35 (0.00 pct) 7.38 (-0.40 pct) 7.40 (-0.68 pct)
> 16-groups: 11.03 (0.00 pct) 11.29 (-2.35 pct) 11.14 (-0.99 pct)
>
> - Comparing the good and bad data points for 16-groups with each
> kernel version:
>
> Test: tip-good SIS_UTIL-good
> 1-groups: 4.68 (0.00 pct) 4.80 (-2.56 pct)
> 2-groups: 5.60 (0.00 pct) 5.51 (1.60 pct)
> 4-groups: 6.24 (0.00 pct) 6.14 (1.60 pct)
> 8-groups: 7.54 (0.00 pct) 7.35 (2.51 pct)
> 16-groups: 10.81 (0.00 pct) 11.03 (-2.03 pct)
>
> Test: tip-good SIS_UTIL-bad
> 1-groups: 4.68 (0.00 pct) 4.68 (0.00 pct)
> 2-groups: 5.60 (0.00 pct) 5.45 (2.67 pct)
> 4-groups: 6.24 (0.00 pct) 6.10 (2.24 pct)
> 8-groups: 7.54 (0.00 pct) 7.52 (0.26 pct)
> 16-groups: 10.81 (0.00 pct) 11.63 (-7.58 pct)
>
> Test: tip-bad SIS_UTIL-good
> 1-groups: 4.90 (0.00 pct) 4.80 (2.04 pct)
> 2-groups: 5.56 (0.00 pct) 5.51 (0.89 pct)
> 4-groups: 6.18 (0.00 pct) 6.14 (0.64 pct)
> 8-groups: 7.50 (0.00 pct) 7.35 (2.00 pct)
> 16-groups: 11.17 (0.00 pct) 11.03 (1.25 pct)
>
> Test: tip-bad SIS_UTIL-bad
> 1-groups: 4.90 (0.00 pct) 4.68 (4.48 pct)
> 2-groups: 5.56 (0.00 pct) 5.45 (1.97 pct)
> 4-groups: 6.18 (0.00 pct) 6.10 (1.29 pct)
> 8-groups: 7.50 (0.00 pct) 7.52 (-0.26 pct)
> 16-groups: 11.17 (0.00 pct) 11.63 (-4.11 pct)
>
> Hackbench consistently reports > 11 for the 16-group
> case with SIS_UTIL, however only once with SIS_PROP.
>
Mel has mentioned that, although hackbench is 'overloaded' in this case, its
nature is such that it benefits from scanning more CPUs: the frequent context
switches mean there is a higher chance of finding an idle CPU.
> > I'm thinking of taking nr_llc into consideration to adjust the search depth,
> > something like:
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index dd52fc5a034b..39b914599dce 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -9302,6 +9302,9 @@ static inline void update_idle_cpu_scan(struct lb_env *env,
> > llc_util_pct = (sum_util * 100) / (nr_llc * SCHED_CAPACITY_SCALE);
> > nr_scan = (100 - (llc_util_pct * llc_util_pct / 72)) * nr_llc / 100;
> > nr_scan = max(nr_scan, 0);
> > + if (nr_llc <= 16 && nr_scan)
> > + nr_scan = nr_llc;
> > +
> This will behave closer to the initial RFC on systems with smaller LLC.
> I can do some preliminary testing with this and get back to you.
> > WRITE_ONCE(sd_share->nr_idle_scan, nr_scan);
> > }
> >
> > I'll offline the CPUs to make it 16 CPUs per LLC, and check how hackbench behaves.
> Thank you for looking into this.
>
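For reference, here is a minimal userspace sketch of the heuristic in the
quoted diff, including the proposed nr_llc <= 16 clamp. This is only an
illustration, not the kernel code: it assumes SCHED_CAPACITY_SCALE = 1024
(its usual value) and feeds in synthetic sum_util values:

#include <stdio.h>

#define SCHED_CAPACITY_SCALE 1024 /* assumed; the common kernel value */

static int nr_scan(unsigned long sum_util, int nr_llc)
{
	int llc_util_pct, nr;

	/* same arithmetic as the quoted update_idle_cpu_scan() snippet */
	llc_util_pct = (sum_util * 100) / (nr_llc * SCHED_CAPACITY_SCALE);
	nr = (100 - (llc_util_pct * llc_util_pct / 72)) * nr_llc / 100;
	if (nr < 0)
		nr = 0;
	/* proposed clamp: a small LLC scans all CPUs unless it is saturated */
	if (nr_llc <= 16 && nr)
		nr = nr_llc;
	return nr;
}

int main(void)
{
	int pct;

	for (pct = 50; pct <= 90; pct += 10) {
		/* synthetic sum_util corresponding to pct% LLC utilization */
		printf("util=%2d%%  nr_llc=16 -> %2d  nr_llc=112 -> %3d\n", pct,
		       nr_scan(16UL * SCHED_CAPACITY_SCALE * pct / 100, 16),
		       nr_scan(112UL * SCHED_CAPACITY_SCALE * pct / 100, 112));
	}
	return 0;
}

On these inputs the 16-CPU LLC keeps scanning all 16 CPUs until the quadratic
term drives nr_scan to 0 near saturation, while the 112-CPU LLC tapers off
gradually.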
OK, I've done some tests and recorded the number of CPUs that SIS_PROP and
SIS_UTIL want to scan (the nr).
1. 16 CPUs online, 16 groups of hackbench (total 64 threads)
Most of the time SIS_PROP would scan 4 CPUs, while SIS_UTIL scans 2 CPUs.
2. 112 CPUs online, 16 groups of hackbench (total 448 threads)
Most of the time SIS_PROP would scan 4 CPUs, but for a small fraction of the
time SIS_PROP would scan the entire LLC, which could be costly.
The number of CPUs scanned by SIS_UTIL ranges within [2, 20] and looks like a
normal distribution.
(I'll send you the plot offline so you can see what the data look like.)
This means that, in the 16-CPU case, the extremely overloaded hackbench would
benefit from scanning more CPUs. But on the 112-CPU platform, SIS_UTIL seems
more reasonable because it does not 'jitter' into scanning the whole LLC.
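Plugging numbers into the formula above: at llc_util_pct = 80,
nr_scan = (100 - 6400/72) * nr_llc / 100 = 12 * nr_llc / 100, i.e. 1 for
nr_llc = 16 but 13 for nr_llc = 112, which is in line with the scan depths
measured above.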
Let me revise the patch according to Peter's and Mel's suggestions and send
a v4 (then cook a complementary patch to deal with systems having 16
CPUs per LLC domain).
thanks,
Chenyu
> --
> Thanks and Regards,
> Prateek
>