Message-ID: <f2619d92-7367-abd4-97e8-e0f4e7cc96eb@arm.com>
Date:   Tue, 18 Dec 2018 17:32:06 +0000
From:   Valentin Schneider <valentin.schneider@....com>
To:     Vincent Guittot <vincent.guittot@...aro.org>, peterz@...radead.org,
        mingo@...nel.org, linux-kernel@...r.kernel.org
Cc:     Morten.Rasmussen@....com
Subject: Re: [PATCH v2 1/3] sched/fair: fix rounding issue for asym packing

On 14/12/2018 16:01, Vincent Guittot wrote:
> When check_asym_packing() is triggered, the imbalance is set to :
> busiest_stat.avg_load * busiest_stat.group_capacity / SCHED_CAPACITY_SCALE
> busiest_stat.avg_load also comes from a division and the final rounding
> can make imbalance slightly lower than the weighted load of the cfs_rq.
> But this is enough to skip the rq in find_busiest_queue and prevents asym
> migration to happen.
> 
> Add 1 to the avg_load to make sure that the targeted cpu will not be
> skipped unexpectedly.
> 
> Signed-off-by: Vincent Guittot <vincent.guittot@...aro.org>
> ---
>  kernel/sched/fair.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index ca46964..c215f7a 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8215,6 +8215,12 @@ static inline void update_sg_lb_stats(struct lb_env *env,
>  	/* Adjust by relative CPU capacity of the group */
>  	sgs->group_capacity = group->sgc->capacity;
>  	sgs->avg_load = (sgs->group_load*SCHED_CAPACITY_SCALE) / sgs->group_capacity;
> +	/*
> +	 * Prevent division rounding to make the computation of imbalance
> +	 * slightly less than original value and to prevent the rq to be then
> +	 * selected as busiest queue
> +	 */
> +	sgs->avg_load += 1;

I've tried investigating this off-by-one issue by running lmbench, but it
seems to be a gnarly one.



The adventure starts in update_sg_lb_stats() where we compute
sgs->avg_load:

	sgs->avg_load = (sgs->group_load*SCHED_CAPACITY_SCALE) / sgs->group_capacity

Then we drop by find_busiest_group() and call check_asym_packing() which,
if we do need to pack, does:

	env->imbalance = DIV_ROUND_CLOSEST(
		sds->busiest_stat.avg_load * sds->busiest_stat.group_capacity,
		SCHED_CAPACITY_SCALE);

And finally the one check that seems to be failing, and that you're
addressing with a +1 is in find_busiest_queue():

	if (rq->nr_running == 1 && wl > env->imbalance &&
	    !check_cpu_capacity(rq, env->sd))
		continue;



Now, running lmbench on my HiKey960 hacked up to use asym packing, I
get an example where there's a task that should be migrated via asym
packing but isn't:

# This is a DIE level balance
update_sg_lb_stats(): 
	cpu=0 load=1
	cpu=1 load=1023
	cpu=2 load=0
	cpu=3 load=0

	avg_load=568

# Busiest group is [0-3]
find_busiest_group(): check_asym_packing():
	env->imbalance = 1022

find_busiest_queue():
	cpu=0 wl=1
	cpu=1 wl=1023
	cpu=2 wl=0
	cpu=3 wl=0

Only CPU1 has rq->nr_running > 0 (==1 actually), but wl > env->imbalance
so we skip it in find_busiest_queue().

Now, I thought it was due to IRQ/rt pressure: the group capacity here was
1840, which isn't the maximum group capacity for the LITTLE cluster, that
should be (463 * 4) == 1852.
I got curious and threw random group capacity values in that equation while
keeping the same avg_load [1], and it gets wild: you have a signal that
oscillates between 1024 and 1014!

[1]: https://gist.github.com/valschneider/f42252e83be943d276b901f57734b8ba

Adding +1 to avg_load doesn't solve those pesky rounding errors that get
worse as group_capacity grows, but it reverses the trend: the signal
oscillates above 1024.



So in the end a +1 *could* "fix" this, but
a) It'd make more sense to have it in the check_asym_packing() code
b) It's ugly and it makes me feel like this piece of code is flimsy AF.

>  
>  	if (sgs->sum_nr_running)
>  		sgs->load_per_task = sgs->sum_weighted_load / sgs->sum_nr_running;
> 
