Message-ID: <4d3a67f5-c9c4-6397-7405-6f0efbd49d5c@arm.com>
Date: Fri, 26 Jul 2019 11:41:18 +0100
From: Valentin Schneider <valentin.schneider@....com>
To: Vincent Guittot <vincent.guittot@...aro.org>
Cc: linux-kernel <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Quentin Perret <quentin.perret@....com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Morten Rasmussen <Morten.Rasmussen@....com>,
Phil Auld <pauld@...hat.com>
Subject: Re: [PATCH 3/5] sched/fair: rework load_balance
On 26/07/2019 10:01, Vincent Guittot wrote:
>> Huh, interesting. Why go for utilization?
>
> Mainly because that's what is used to detect a misfit task, not the load.
>
>>
>> Right now we store the load of the task and use it to pick the "biggest"
>> misfit (in terms of load) when there is more than one misfit task to
>> choose from:
>
> But having a big load doesn't mean that you have a big utilization.
>
> So you can trigger the misfit case because of task A, which has a big
> utilization that doesn't fit on its local CPU, but then select a task
> B in detach_tasks() that has a small utilization but a big weight and,
> as a result, a higher load.
> And task B will never trigger the misfit use case by itself, so it
> should not steal the pulling opportunity of task A.
>
We can avoid this entirely by going straight for an active balance when
we are balancing misfit tasks (which we really should be doing TBH).
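Totally untested sketch of what I mean - need_active_balance() already
returns 1 for group_misfit_task, the missing bit is skipping detach_tasks()
so we don't pull an unrelated task first (the label name is made up, and I'm
assuming env.src_grp_type survives the rework):

  /* In load_balance(), before the detach_tasks() loop: */
  if (env.src_grp_type == group_misfit_task) {
          /*
           * The misfit task is currently running, so detach_tasks()
           * can't move it and may instead detach an unrelated task
           * (e.g. big load but small utilization). Go straight for
           * the active balance of the running task.
           */
          goto active_balance; /* hypothetical label */
  }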
If we *really* want to be surgical about misfit migration, we could track
the task itself via a pointer to its task_struct, but IIRC Morten
purposely avoided this due to all the fun synchronization issues that
come with it.
With that out of the way, I still believe we should maximize the migrated
load when dealing with several misfit tasks - there's not much else you can
look at anyway to make a decision.
It sort of makes sense when e.g. you have two misfit tasks stuck on LITTLE
CPUs and a big CPU finally becomes free: it would seem fair to pick the one
that's been "throttled" the longest, and at equal niceness that would be
the one with the highest load.
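(With PELT, a task's load_avg is roughly its weight scaled by the fraction
of time it's been runnable - running *or* waiting on the rq. So at nice 0
(weight 1024), a task that's been stuck runnable for the whole window
converges towards ~1024, whereas one runnable only 60% of the time sits
around ~614. At equal weight, the longest-throttled task really is the one
with the biggest load.)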
>>
>> update_sd_pick_busiest():
>> ,----
>> | /*
>> | * If we have more than one misfit sg go with the biggest misfit.
>> | */
>> | if (sgs->group_type == group_misfit_task &&
>> | sgs->group_misfit_task_load < busiest->group_misfit_task_load)
>> | return false;
>> `----
>>
>> I don't think it makes much sense to maximize utilization for misfit tasks:
>> they're over the capacity margin, which exactly means "I can't really tell
>> you much on that utilization other than it doesn't fit".
>>
>> At the very least, this rq field should be renamed "misfit_task_util".
>
> Yes, I agree that I should rename the field.
>
>>
>> [...]
>>
>>> @@ -7060,12 +7048,21 @@ static unsigned long __read_mostly max_load_balance_interval = HZ/10;
>>> enum fbq_type { regular, remote, all };
>>>
>>> enum group_type {
>>> - group_other = 0,
>>> + group_has_spare = 0,
>>> + group_fully_busy,
>>> group_misfit_task,
>>> + group_asym_capacity,
>>> group_imbalanced,
>>> group_overloaded,
>>> };
>>>
>>> +enum group_migration {
>>> + migrate_task = 0,
>>> + migrate_util,
>>> + migrate_load,
>>> + migrate_misfit,
>>
>> Can't we have only 3 imbalance types (task, util, load), and make misfit
>> fall in that first one? Arguably it is a special kind of task balance,
>> since it would go straight for the active balance, but it would fit a
>> `migrate_task` imbalance with a "go straight for active balance" flag
>> somewhere.
>
> migrate_misfit uses its own special condition to detect which task can
> be pulled, compared to the other migration types.
>
Since misfit is about migrating running tasks, a `migrate_task` imbalance
with a flag that goes straight to active balancing should work, no?
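Roughly something like this (sketch only - LBF_ACTIVE_LB is a flag name I'm
inventing next to the existing LBF_* bits, and I'm guessing the lb_env field
ends up being called migration_type):

  enum group_migration {
          migrate_task = 0,
          migrate_util,
          migrate_load,
  };

  #define LBF_ACTIVE_LB 0x40 /* invented: go straight for active balance */

  /* In calculate_imbalance(), for the misfit case: */
  if (busiest->group_type == group_misfit_task) {
          env->migration_type = migrate_task;
          env->flags |= LBF_ACTIVE_LB;
          env->imbalance = 1; /* move (at most) one task */
  }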
[...]
>> Rather than filling the local group, shouldn't we follow the same strategy
>> as for load, IOW try to reach an average without pushing local above it
>> nor busiest below it?
>
> But we don't know if this will be enough to make the busiest group not
> overloaded anymore.
>
> This is a transient state:
> one group is overloaded, another one has spare capacity.
> How to balance the system will depend on how much overload is in the
> group, and we don't know this value.
> The only solution is to:
> - try to pull as many tasks as possible to fill the spare capacity
> - Is the group still overloaded? Use avg_load to balance the system,
>   because both groups will be overloaded.
> - Is the group no longer overloaded? Balance the number of idle CPUs.
>
>>
>> We could build an sds->avg_util similar to sds->avg_load.
>
> When there is spare capacity, we balance the number of idle CPUs.
>
What if there is spare capacity but no idle CPUs? In scenarios like this
we should balance utilization. We could wait for a newidle balance to
happen, but it'd be a shame to repeatedly rely on that when we could
preemptively balance utilization.
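To make it concrete, the group_has_spare handling in calculate_imbalance()
could look something like this (sketch; sds->avg_util doesn't exist yet and
would be built like sds->avg_load, and the rest of the field names are my
guesses based on this series):

  if (busiest->group_type == group_has_spare) {
          if (local->idle_cpus || busiest->idle_cpus) {
                  /* What the series does: even out the nr of idle CPUs */
                  env->migration_type = migrate_task;
                  env->imbalance = max_t(long, 0,
                          (local->idle_cpus - busiest->idle_cpus) >> 1);
          } else {
                  /*
                   * No idle CPU anywhere, but utilization is uneven:
                   * pull utilization until busiest is back down to the
                   * domain average, mirroring what avg_load does for
                   * the overloaded case.
                   */
                  unsigned long target = (sds->avg_util *
                          busiest->group_capacity) >> SCHED_CAPACITY_SHIFT;

                  env->migration_type = migrate_util;
                  env->imbalance = max_t(long, 0,
                                         busiest->group_util - target);
          }
  }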