linux-kernel - Re: [PATCH 3/7] sched/fair: Correct unit of load_above

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20160503105233.GN3430@twins.programming.kicks-ass.net>
Date:	Tue, 3 May 2016 12:52:33 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Dietmar Eggemann <dietmar.eggemann@....com>
Cc:	linux-kernel@...r.kernel.org,
	Morten Rasmussen <morten.rasmussen@....com>
Subject: Re: [PATCH 3/7] sched/fair: Correct unit of load_above_capacity

On Fri, Apr 29, 2016 at 08:32:40PM +0100, Dietmar Eggemann wrote:
> From: Morten Rasmussen <morten.rasmussen@....com>
> 
> In calculate_imbalance() load_above_capacity currently has the unit
> [load] while it is used as being [load/capacity]. Not only is it wrong it
> also makes it unlikely that load_above_capacity is ever used as the
> subsequent code picks the smaller of load_above_capacity and the avg_load
> difference between the local and the busiest sched_groups.
> 
> This patch ensures that load_above_capacity has the right unit
> [load/capacity].
> 
> Signed-off-by: Morten Rasmussen <morten.rasmussen@....com>
> ---
>  kernel/sched/fair.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index bc19fee94f39..c066574cff04 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7016,9 +7016,11 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
>  	    local->group_type   == group_overloaded) {
>  		load_above_capacity = busiest->sum_nr_running *
>  					SCHED_LOAD_SCALE;

Given smpnice and cgroups; this starting point seems somewhat random.
Then again, without cgroup and assuming all tasks are nice-0 it is still
true. But I think the cgroup muck is rather universal these days.

The above SCHED_LOAD_SCALE is the 1 -> load metric, right?

> -		if (load_above_capacity > busiest->group_capacity)
> +		if (load_above_capacity > busiest->group_capacity) {
>  			load_above_capacity -= busiest->group_capacity;
> -		else
> +			load_above_capacity *= SCHED_LOAD_SCALE;
> +			load_above_capacity /= busiest->group_capacity;

And this one does load->capacity

> +		} else
>  			load_above_capacity = ~0UL;
>  	}


Is there anything better we can do? I know there's a fair bit of cruft
in; but these assumptions that SCHED_LOAD_SCALE is one task, just bug
the hell out of me.

Would it make sense to make load_balance()->detach_tasks() ensure
busiest->sum_nr_running >= busiest->group_weight or so?