Message-ID: <83A9BEDF-20BB-4BAD-AABD-0EECB92BF8DF@fb.com>
Date:   Mon, 26 Oct 2020 11:05:35 -0400
From:   "Chris Mason" <clm@...com>
To:     Vincent Guittot <vincent.guittot@...aro.org>
CC:     Peter Zijlstra <peterz@...radead.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Rik van Riel <riel@...riel.com>,
        linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] fix scheduler regression from "sched/fair: Rework load_balance()"



On 26 Oct 2020, at 10:24, Vincent Guittot wrote:

> On Monday, 26 Oct 2020 at 08:45:27 (-0400), Chris Mason wrote:
>> On 26 Oct 2020, at 4:39, Vincent Guittot wrote:
>>
>>> Hi Chris
>>>
>>> On Sat, 24 Oct 2020 at 01:49, Chris Mason <clm@...com> wrote:
>>>>
>>>> Hi everyone,
>>>>
>>>> We’re validating a new kernel in the fleet, and compared with 
>>>> v5.2,
>>>
>>> Which version are you using ?
>>> several improvements have been added since v5.5 and the rework of
>>> load_balance
>>
>> We’re validating v5.6, but all of the numbers referenced in this
>> patch are against v5.9.  I usually try to backport my way to victory
>> on this kind of thing, but mainline seems to behave exactly the same
>> as 0b0695f2b34a w.r.t. this benchmark.
>
> ok. Thanks for the confirmation
>
> I have been able to reproduce the problem on my setup.

Thanks for taking a look!  Can I ask what parameters you used on
schbench, and what kind of results you saw?  Mostly I’m trying to make
sure it’s a useful tool, but also because the patch didn’t change
things here.
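
For reference, my runs here look something like the line below (the
values are illustrative rather than the ones from the report; -m sets
the number of message threads, -t the worker threads per message
thread, and -r the runtime in seconds):

	./schbench -m 2 -t 16 -r 30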

>
> Could you try the fix below ?
>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9049,7 +9049,8 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
>          * emptying busiest.
>          */
>         if (local->group_type == group_has_spare) {
> -               if (busiest->group_type > group_fully_busy) {
> +               if ((busiest->group_type > group_fully_busy) &&
> +                   (busiest->group_weight > 1)) {
>                         /*
>                          * If busiest is overloaded, try to fill spare
>                          * capacity. This might end up creating spare capacity
>
>
> When we calculate an imbalance at the smallest level, i.e. between
> CPUs (group_weight == 1), we should try to spread tasks on CPUs
> instead of trying to fill spare capacity.
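
If I’m reading the intent right, the decision boils down to something
like the sketch below (my paraphrase in freestanding C, not the kernel
code; the struct and helper are made up to isolate the check):

/*
 * Simplified stand-in for the migration-type choice that the fix
 * changes in calculate_imbalance().  With the fix, a busiest "group"
 * that is a single CPU (group_weight == 1) is balanced by moving
 * whole tasks instead of by filling the local group's spare capacity.
 */
enum migration_type { migrate_task, migrate_util };

struct busiest_stats {
	int group_weight;	/* number of CPUs in the group */
	int overloaded;		/* group_type > group_fully_busy */
};

static enum migration_type pick_migration(const struct busiest_stats *b)
{
	/*
	 * Filling spare capacity only makes sense when the busiest
	 * group spans several CPUs; a single overloaded CPU has no
	 * internal imbalance to absorb, so spread its tasks instead.
	 */
	if (b->overloaded && b->group_weight > 1)
		return migrate_util;
	return migrate_task;
}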

With this patch on top of v5.9, my latencies are unchanged.  I’m 
building against current Linus now just in case I’m missing other 
fixes.

-chris
