Message-ID: <83A9BEDF-20BB-4BAD-AABD-0EECB92BF8DF@fb.com>
Date: Mon, 26 Oct 2020 11:05:35 -0400
From: "Chris Mason" <clm@...com>
To: Vincent Guittot <vincent.guittot@...aro.org>
CC: Peter Zijlstra <peterz@...radead.org>,
Johannes Weiner <hannes@...xchg.org>,
Rik van Riel <riel@...riel.com>,
linux-kernel <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] fix scheduler regression from "sched/fair: Rework load_balance()"
On 26 Oct 2020, at 10:24, Vincent Guittot wrote:
> On Monday 26 Oct 2020 at 08:45:27 (-0400), Chris Mason wrote:
>> On 26 Oct 2020, at 4:39, Vincent Guittot wrote:
>>
>>> Hi Chris
>>>
>>> On Sat, 24 Oct 2020 at 01:49, Chris Mason <clm@...com> wrote:
>>>>
>>>> Hi everyone,
>>>>
>>>> We’re validating a new kernel in the fleet, and compared with
>>>> v5.2,
>>>
>>> Which version are you using?
>>> Several improvements have been added since v5.5 and the rework of
>>> load_balance.
>>
>> We’re validating v5.6, but all of the numbers referenced in this patch
>> are against v5.9. I usually try to backport my way to victory on this
>> kind of thing, but mainline seems to behave exactly the same as
>> 0b0695f2b34a wrt this benchmark.
>
> OK, thanks for the confirmation.
>
> I have been able to reproduce the problem on my setup.
Thanks for taking a look! Can I ask what parameters you used on schbench,
and what kind of results you saw? Mostly I’m trying to make sure schbench
is a useful reproduction tool, but also because the patch didn’t change
things here.
>
> Could you try the fix below?
>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -9049,7 +9049,8 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
>  	 * emptying busiest.
>  	 */
>  	if (local->group_type == group_has_spare) {
> -		if (busiest->group_type > group_fully_busy) {
> +		if ((busiest->group_type > group_fully_busy) &&
> +		    (busiest->group_weight > 1)) {
>  			/*
>  			 * If busiest is overloaded, try to fill spare
>  			 * capacity. This might end up creating spare capacity
>
>
> When we calculate an imbalance at the smallest level, i.e. between CPUs
> (group_weight == 1), we should try to spread tasks on the CPUs instead
> of trying to fill spare capacity.
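
That logic makes sense. For the record, here is roughly what that branch of
calculate_imbalance() ends up doing with your check in place. This is
paraphrased from kernel/sched/fair.c rather than the exact v5.9 source, so
treat it as a sketch:

	if (local->group_type == group_has_spare) {
		if (busiest->group_type > group_fully_busy &&
		    busiest->group_weight > 1) {	/* <-- the new check */
			/*
			 * busiest is an overloaded group of several CPUs:
			 * pull utilization to fill local's spare capacity.
			 */
			env->migration_type = migrate_util;
			env->imbalance = max(local->group_capacity, local->group_util) -
					 local->group_util;
			return;
		}

		/*
		 * busiest is a single overloaded CPU (group_weight == 1):
		 * balance the number of running tasks instead, so the
		 * waiting task actually gets pulled off the busy CPU.
		 */
		env->migration_type = migrate_task;
		env->imbalance = (busiest->sum_nr_running - local->sum_nr_running) >> 1;
	}
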
With this patch on top of v5.9, my latencies are unchanged. I’m
building against current Linus now just in case I’m missing other
fixes.
-chris