Message-ID: <925bbda25461035fdec1bebf8487f84f9a3852a7.camel@linux.intel.com>
Date: Tue, 05 Sep 2023 12:30:51 -0700
From: Tim Chen <tim.c.chen@...ux.intel.com>
To: Shrikanth Hegde <sshegde@...ux.vnet.ibm.com>, mingo@...hat.com,
peterz@...radead.org, vincent.guittot@...aro.org
Cc: dietmar.eggemann@....com, vschneid@...hat.com,
linux-kernel@...r.kernel.org, srikar@...ux.vnet.ibm.com,
mgorman@...hsingularity.net, mingo@...nel.org, yu.c.chen@...el.com,
ricardo.neri-calderon@...ux.intel.com, iamjoonsoo.kim@....com,
juri.lelli@...hat.com, rocking@...ux.alibaba.com,
joshdon@...gle.com
Subject: Re: [PATCH] sched/fair: optimize should_we_balance for higher SMT
systems
On Sat, 2023-09-02 at 13:42 +0530, Shrikanth Hegde wrote:
>
>
> Fixes: b1bfeab9b002 ("sched/fair: Consider the idle state of the whole core for load balance")
> Signed-off-by: Shrikanth Hegde <sshegde@...ux.vnet.ibm.com>
> ---
> kernel/sched/fair.c | 15 ++++++++++++++-
> 1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 0b7445cd5af9..6e31923293bb 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6619,6 +6619,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> /* Working cpumask for: load_balance, load_balance_newidle. */
> static DEFINE_PER_CPU(cpumask_var_t, load_balance_mask);
> static DEFINE_PER_CPU(cpumask_var_t, select_rq_mask);
> +static DEFINE_PER_CPU(cpumask_var_t, should_we_balance_tmpmask);
>
> #ifdef CONFIG_NO_HZ_COMMON
>
> @@ -10913,6 +10914,7 @@ static int active_load_balance_cpu_stop(void *data);
>
> static int should_we_balance(struct lb_env *env)
> {
> + struct cpumask *swb_cpus = this_cpu_cpumask_var_ptr(should_we_balance_tmpmask);
> struct sched_group *sg = env->sd->groups;
> int cpu, idle_smt = -1;
>
> @@ -10936,8 +10938,9 @@ static int should_we_balance(struct lb_env *env)
> return 1;
> }
>
> + cpumask_copy(swb_cpus, group_balance_mask(sg));
> /* Try to find first idle CPU */
> - for_each_cpu_and(cpu, group_balance_mask(sg), env->cpus) {
> + for_each_cpu_and(cpu, swb_cpus, env->cpus) {
> if (!idle_cpu(cpu))
> continue;
>
> @@ -10949,6 +10952,14 @@ static int should_we_balance(struct lb_env *env)
> if (!(env->sd->flags & SD_SHARE_CPUCAPACITY) && !is_core_idle(cpu)) {
> if (idle_smt == -1)
> idle_smt = cpu;
> + /*
> + * If the core is not idle and an idle SMT sibling has already
> + * been found, there is no need to check the remaining SMT
> + * siblings for idleness.
> + */
> +#ifdef CONFIG_SCHED_SMT
> + cpumask_andnot(swb_cpus, swb_cpus, cpu_smt_mask(cpu));
> +#endif
> continue;
> }
>
> @@ -12914,6 +12925,8 @@ __init void init_sched_fair_class(void)
> for_each_possible_cpu(i) {
> zalloc_cpumask_var_node(&per_cpu(load_balance_mask, i), GFP_KERNEL, cpu_to_node(i));
> zalloc_cpumask_var_node(&per_cpu(select_rq_mask, i), GFP_KERNEL, cpu_to_node(i));
> + zalloc_cpumask_var_node(&per_cpu(should_we_balance_tmpmask, i),
> + GFP_KERNEL, cpu_to_node(i));
Shrikanth,

Wonder if we can avoid allocating should_we_balance_tmpmask for
the SMT2 case, to save memory on systems with a large number of
cores.  The new mask and logic are, I think, only needed when a
core has more than 2 threads.
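
Something like the below is what I have in mind (untested sketch;
whether cpu_smt_mask() already reflects the full topology for
not-yet-online CPUs at this point in boot is an assumption that
would need checking):

	for_each_possible_cpu(i) {
		zalloc_cpumask_var_node(&per_cpu(load_balance_mask, i), GFP_KERNEL, cpu_to_node(i));
		zalloc_cpumask_var_node(&per_cpu(select_rq_mask, i), GFP_KERNEL, cpu_to_node(i));
#ifdef CONFIG_SCHED_SMT
		/* Only cores with more than 2 SMT siblings need the temp mask. */
		if (cpumask_weight(cpu_smt_mask(i)) > 2)
			zalloc_cpumask_var_node(&per_cpu(should_we_balance_tmpmask, i),
						GFP_KERNEL, cpu_to_node(i));
#endif

should_we_balance() would then have to fall back to walking
group_balance_mask(sg) directly when the temp mask was not
allocated, so it is a trade-off between memory and a bit of
extra code.
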
Tim
>
> #ifdef CONFIG_CFS_BANDWIDTH
> INIT_CSD(&cpu_rq(i)->cfsb_csd, __cfsb_csd_unthrottle, cpu_rq(i));
> --
> 2.31.1
>