Date:   Mon, 11 Jul 2022 17:03:04 +0100
From:   Qais Yousef <qais.yousef@....com>
To:     Vincent Guittot <vincent.guittot@...aro.org>
Cc:     mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
        dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
        mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
        linux-kernel@...r.kernel.org, david.chen@...anix.com,
        zhangqiao22@...wei.com
Subject: Re: [PATCH v2] sched/fair: fix case with reduced capacity CPU

Hi Vincent

On 07/08/22 17:44, Vincent Guittot wrote:
> The capacity of the CPU available for CFS tasks can be reduced because of
> other activities running on the latter. In such case, it's worth trying to
> move CFS tasks on a CPU with more available capacity.
> 
> The rework of the load balance has filtered the case when the CPU is
> classified to be fully busy but its capacity is reduced.
> 
> Check if CPU's capacity is reduced while gathering load balance statistic
> and classify it group_misfit_task instead of group_fully_busy so we can
> try to move the load on another CPU.
> 
> Reported-by: David Chen <david.chen@...anix.com>
> Reported-by: Zhang Qiao <zhangqiao22@...wei.com>
> Signed-off-by: Vincent Guittot <vincent.guittot@...aro.org>
> Tested-by: David Chen <david.chen@...anix.com>
> Tested-by: Zhang Qiao <zhangqiao22@...wei.com>
> ---

[...]

> @@ -8820,8 +8833,9 @@ static inline void update_sg_lb_stats(struct lb_env *env,
>  
>  	for_each_cpu_and(i, sched_group_span(group), env->cpus) {
>  		struct rq *rq = cpu_rq(i);
> +		unsigned long load = cpu_load(rq);
>  
> -		sgs->group_load += cpu_load(rq);
> +		sgs->group_load += load;
>  		sgs->group_util += cpu_util_cfs(i);
>  		sgs->group_runnable += cpu_runnable(rq);
>  		sgs->sum_h_nr_running += rq->cfs.h_nr_running;
> @@ -8851,11 +8865,17 @@ static inline void update_sg_lb_stats(struct lb_env *env,
>  		if (local_group)
>  			continue;
>  
> -		/* Check for a misfit task on the cpu */
> -		if (env->sd->flags & SD_ASYM_CPUCAPACITY &&
> -		    sgs->group_misfit_task_load < rq->misfit_task_load) {
> -			sgs->group_misfit_task_load = rq->misfit_task_load;
> -			*sg_status |= SG_OVERLOAD;
> +		if (env->sd->flags & SD_ASYM_CPUCAPACITY) {
> +			/* Check for a misfit task on the cpu */
> +			if (sgs->group_misfit_task_load < rq->misfit_task_load) {
> +				sgs->group_misfit_task_load = rq->misfit_task_load;
> +				*sg_status |= SG_OVERLOAD;
> +			}
> +		} else if ((env->idle != CPU_NOT_IDLE) &&
> +			   sched_reduced_capacity(rq, env->sd)) {
> +			/* Check for a task running on a CPU with reduced capacity */
> +			if (sgs->group_misfit_task_load < load)
> +				sgs->group_misfit_task_load = load;
>  		}
>  	}

A few small questions, mostly for my own education.

The new condition only applies to SMP systems. Is the reason asym systems
don't need it that the misfit check already accounts for capacity pressure
when it checks that the task fits via fits_capacity()?

It **seems** to me that the migration margin in fits_capacity() plays the same
role as sd->imbalance_pct does when check_cpu_capacity() is called by
sched_reduced_capacity(). Did I get that right?
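
To make the comparison concrete, here is a minimal userspace sketch of the two
thresholds. It assumes fits_capacity() applies a fixed 1280/1024 (roughly 20%)
margin and that check_cpu_capacity() compares cpu_capacity * imbalance_pct
against cpu_capacity_orig * 100, which is my understanding of the code around
this patch and may not match other kernel versions. The *_sketch names and the
example value 117 for imbalance_pct are illustrative only.

```c
#include <stdbool.h>

/*
 * Assumed mirror of the fits_capacity() margin: util fits if it stays
 * below ~80% of the available capacity (1024/1280).
 */
static bool fits_capacity_sketch(unsigned long util, unsigned long cap)
{
	return util * 1280 < cap * 1024;
}

/*
 * Assumed mirror of check_cpu_capacity(): the CPU counts as having
 * reduced capacity once what is left for CFS, scaled by imbalance_pct,
 * drops below its original capacity.
 */
static bool capacity_reduced_sketch(unsigned long cap,
				    unsigned long cap_orig,
				    unsigned int imbalance_pct)
{
	return cap * imbalance_pct < cap_orig * 100;
}
```

With imbalance_pct = 117 and an original capacity of 1024, the second check
trips once the available capacity falls below about 875, i.e. roughly 15% of
pressure. The first check stops fitting once utilization exceeds about 819
out of 1024. Both are fixed percentage thresholds, just applied to different
quantities.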

If I got it right, could we potentially start seeing this kind of reported
issue on asym systems if the migration margin is ever tweaked? I guess not. It
just seems to me that, for asym systems, tweaking the migration margin is
similar to tweaking imbalance_pct for SMP ones. The picture is subtler, though,
since imbalance_pct is still used on asym systems.


Thanks

--
Qais Yousef
