Date:   Wed, 19 Dec 2018 11:58:33 +0000
From:   Valentin Schneider <valentin.schneider@....com>
To:     Vincent Guittot <vincent.guittot@...aro.org>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        linux-kernel <linux-kernel@...r.kernel.org>,
        Morten Rasmussen <Morten.Rasmussen@....com>
Subject: Re: [PATCH v2 1/3] sched/fair: fix rounding issue for asym packing

On 19/12/2018 08:32, Vincent Guittot wrote:
[...]
> This is another UC; asym packing is used at SMT level for now and we
> don't face this kind of problem there. It has also been tested on a
> DynamiQ configuration, which is similar to SMT: 1 CPU per sched_group.
> The legacy b.L one was not the main target, although it can be.
> 
> The rounding error is there and should be fixed, and your UC is another
> case for legacy b.L that can be treated in another patch.
> 

I used that setup out of convenience for myself, but AFAICT that use-case
just stresses that issue.

In the end we still have an imbalance value that can vary with only
slight group_capacity changes. And that's true even if that group
contains a single CPU, as the imbalance value computed by
check_asym_packing() can vary by +/-1 with very tiny capacity changes 
(rt/IRQ pressure). That means that sometimes you're off by one and don't
do the packing because some CPU was pressured, but some other time you
might do it because that CPU was *more* pressured. That doesn't make sense.
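
To make that concrete, here is a minimal user-space sketch of the
avg_load/imbalance round-trip (plain C with made-up group_load and
group_capacity values, and a simplified DIV_ROUND_CLOSEST() for unsigned
operands - not the actual kernel code):

	#include <stdio.h>

	#define SCHED_CAPACITY_SCALE	1024UL
	/* same rounding as the kernel's DIV_ROUND_CLOSEST() for unsigned operands */
	#define DIV_ROUND_CLOSEST(x, d)	(((x) + (d) / 2) / (d))

	int main(void)
	{
		unsigned long group_load = 500;
		/* 1024 = unpressured; the others mimic slight rt/IRQ pressure */
		unsigned long caps[] = { 1024, 1020, 1010 };

		for (int i = 0; i < 3; i++) {
			/* update_sg_lb_stats(): truncating division */
			unsigned long avg_load =
				group_load * SCHED_CAPACITY_SCALE / caps[i];
			/* check_asym_packing(): rounding division */
			unsigned long imbalance =
				DIV_ROUND_CLOSEST(avg_load * caps[i],
						  SCHED_CAPACITY_SCALE);

			printf("capacity=%lu avg_load=%lu imbalance=%lu\n",
			       caps[i], avg_load, imbalance);
		}
		return 0;
	}

With those made-up numbers the printed imbalance is 500, 499, 499: it
drops by one as soon as the capacity dips below 1024, even though
group_load never changed.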


In the end the problem is that in update_sg_lb_stats() we do:

	sgs->avg_load = (sgs->group_load*SCHED_CAPACITY_SCALE) / sgs->group_capacity;

and in check_asym_packing() we do:

	env->imbalance = DIV_ROUND_CLOSEST(
		sds->busiest_stat.avg_load * sds->busiest_stat.group_capacity,
		SCHED_CAPACITY_SCALE);

So we end up with something like:

		    group_load * SCHED_CAPACITY_SCALE * group_capacity
	imbalance = --------------------------------------------------
			    group_capacity * SCHED_CAPACITY_SCALE

Which you'd hope to reduce down to:

	imbalance = group_load

but the integer divisions introduce rounding errors along the way (see
the worked numbers after the diff). So actually, the fix we'd want
would be:

-----8<-----
@@ -8463,9 +8463,7 @@ static int check_asym_packing(struct lb_env *env, struct sd_lb_stats *sds)
        if (sched_asym_prefer(busiest_cpu, env->dst_cpu))
                return 0;
 
-       env->imbalance = DIV_ROUND_CLOSEST(
-               sds->busiest_stat.avg_load * sds->busiest_stat.group_capacity,
-               SCHED_CAPACITY_SCALE);
+       env->imbalance = sds->busiest_stat.group_load;
 
        return 1;
 }
----->8-----
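
To put some made-up numbers on that rounding loss: take group_load = 300
and group_capacity = 1015 (say, a CPU under a bit of rt/IRQ pressure).
Then:

	avg_load  = 300 * 1024 / 1015                   = 302 (truncated from ~302.66)
	imbalance = DIV_ROUND_CLOSEST(302 * 1015, 1024) = 299

so we end up one short of the actual group_load of 300 purely because the
two divisions don't cancel. With the one-liner above, env->imbalance is
simply the group_load and no longer wobbles with small capacity changes.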
