linux-kernel - Re: [RFC PATCH 4/7] sched/fair: Calculate the scan depth for idle balance based on system utilization

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ZO9gp3ZVjIOuOJB9@chenyu5-mobl2>
Date:   Wed, 30 Aug 2023 23:30:47 +0800
From:   Chen Yu <yu.c.chen@...el.com>
To:     Shrikanth Hegde <sshegde@...ux.vnet.ibm.com>
CC:     Ingo Molnar <mingo@...hat.com>, Juri Lelli <juri.lelli@...hat.com>,
        "Tim Chen" <tim.c.chen@...el.com>,
        Mel Gorman <mgorman@...hsingularity.net>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        K Prateek Nayak <kprateek.nayak@....com>,
        "Gautham R . Shenoy" <gautham.shenoy@....com>,
        Chen Yu <yu.chen.surf@...il.com>,
        Aaron Lu <aaron.lu@...el.com>, <linux-kernel@...r.kernel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Vincent Guittot <vincent.guittot@...aro.org>
Subject: Re: [RFC PATCH 4/7] sched/fair: Calculate the scan depth for idle
 balance based on system utilization

On 2023-08-25 at 11:32:01 +0530, Shrikanth Hegde wrote:
> 
> 
> On 7/27/23 8:05 PM, Chen Yu wrote:
> > When the CPU is about to enter idle, it invokes newidle_balance()
> > to pull some tasks from other runqueues. Although there is per
> > domain max_newidle_lb_cost to throttle the newidle_balance(), it
> > would be good to further limit the scan based on overall system
> > utilization. The reason is that there is no limitation for
> > newidle_balance() to launch this balance simultaneously on
> > multiple CPUs. Since each newidle_balance() has to traverse all
> > the groups to calculate the statistics one by one, this total
> > time cost on newidle_balance() could be O(n^2). n is the number
> > of groups. This issue is more severe if there are many groups
> > within 1 domain, for example, a system with a large number of
> > Cores in a LLC domain. This is not good for performance or
> > power saving.
> > 
> > sqlite has spent quite some time on newidle balance() on Intel
> > Sapphire Rapids, which has 2 x 56C/112T = 224 CPUs:
> > 6.69%    0.09%  sqlite3     [kernel.kallsyms]   [k] newidle_balance
> > 5.39%    4.71%  sqlite3     [kernel.kallsyms]   [k] update_sd_lb_stats
> > 
> > Based on this observation, limit the scan depth of newidle_balance()
> > by considering the utilization of the sched domain. Let the number of
> > scanned groups be a linear function of the utilization ratio:
> > 
> > nr_groups_to_scan = nr_groups * (1 - util_ratio)
> > 
> > Suggested-by: Tim Chen <tim.c.chen@...el.com>
> > Signed-off-by: Chen Yu <yu.c.chen@...el.com>
> > ---
> >  include/linux/sched/topology.h |  1 +
> >  kernel/sched/fair.c            | 30 ++++++++++++++++++++++++++++++
> >  kernel/sched/features.h        |  1 +
> >  3 files changed, 32 insertions(+)
> > 
> > diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
> > index d6a64a2c92aa..af2261308529 100644
> > --- a/include/linux/sched/topology.h
> > +++ b/include/linux/sched/topology.h
> > @@ -84,6 +84,7 @@ struct sched_domain_shared {
> >  	int		nr_idle_scan;
> >  	unsigned long	total_load;
> >  	unsigned long	total_capacity;
> > +	int		nr_sg_scan;
> >  };
> >  
> >  struct sched_domain {
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index edcfee9965cd..6925813db59b 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -10153,6 +10153,35 @@ static void ilb_save_stats(struct lb_env *env,
> >  		WRITE_ONCE(sd_share->total_capacity, sds->total_capacity);
> >  }
> >  
> > +static void update_ilb_group_scan(struct lb_env *env,
> > +				  unsigned long sum_util,
> > +				  struct sched_domain_shared *sd_share)
> > +{
> > +	u64 tmp, nr_scan;
> > +
> > +	if (!sched_feat(ILB_UTIL))
> > +		return;
> > +
> > +	if (!sd_share)
> > +		return;
> > +
> > +	if (env->idle == CPU_NEWLY_IDLE)
> > +		return;
> 
> 
> Suggestion for small improvement:
> 
> First if condition here could be check for newidle. As it often very often we could save a few cycles of checking
> sched feature.
>

Yes, this makes sense, I'll change it.

thanks,
Chenyu