Message-ID: <CAKfTPtBZhJToDAvre1Gvz1oBUZy893_p44JpDPMOWWk7-SXAyQ@mail.gmail.com>
Date: Fri, 7 Sep 2018 14:55:54 +0200
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...nel.org>,
linux-kernel <linux-kernel@...r.kernel.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
jhugo@...eaurora.org
Subject: Re: [PATCH] sched/fair: fix load_balance redo for null imbalance
On Fri, 7 Sep 2018 at 14:35, Vincent Guittot <vincent.guittot@...aro.org> wrote:
>
> On Friday 07 Sep 2018 at 13:37:49 (+0200), Peter Zijlstra wrote:
> > On Fri, Sep 07, 2018 at 09:51:04AM +0200, Vincent Guittot wrote:
> > > It can happen that load_balance finds a busiest group and then a busiest rq
> > > but the calculated imbalance is in fact null.
> >
> > Cute. Does that happen often?
>
> I have a use case with RT tasks that reproduces the problem regularly.
> It happens at least when we have CPUs with different capacities, either because
> of heterogeneous CPUs or because of RT/DL tasks reducing the capacity available
> for CFS. I have put the call path that triggers the problem below, and according
> to the comment it seems that we can reach a similar state when playing with priority.
>
> >
> > > If the calculated imbalance is null, it's useless to try to find a busiest
> > > rq as no task will be migrated and we can return immediately.
> > >
> > > This situation can happen on a heterogeneous system, or on an SMP system when
> > > RT tasks are decreasing the capacity of some CPUs.
> >
> > Is it the result of one of those "force_balance" conditions in
> > find_busiest_group() ? Should we not fix that to then return NULL
> > instead?
>
> The use case is:
> We have a newly_idle load balance that is triggered when an RT task becomes idle
> (but I think that I have seen it with idle load balance too).
>
> We trigger:
> 	if (env->idle != CPU_NOT_IDLE && group_has_capacity(env, local) &&
> 	    busiest->group_no_capacity)
> 		goto force_balance;
>
> In calculate_imbalance() we use the path:
> 	/*
> 	 * Avg load of busiest sg can be less and avg load of local sg can
> 	 * be greater than avg load across all sgs of sd because avg load
> 	 * factors in sg capacity and sgs with smaller group_type are
> 	 * skipped when updating the busiest sg:
> 	 */
> 	if (busiest->avg_load <= sds->avg_load ||
> 	    local->avg_load >= sds->avg_load) {
> 		env->imbalance = 0;
> 		return fix_small_imbalance(env, sds);
> 	}
>
> but fix_small_imbalance() eventually decides to return without modifying the
> imbalance, like here:
> 	if (busiest->avg_load + scaled_busy_load_per_task >=
> 	    local->avg_load + (scaled_busy_load_per_task * imbn)) {
> 		env->imbalance = busiest->load_per_task;
> 		return;
> 	}
>
> Besides this patch, I'm preparing another patch in fix_small_imbalance() to
> ensure 1 task per CPU in a similar situation, but according to the comment above,
> we can also reach this situation because of task priority.
I have just done a quick test on my SMP hikey board (dual quad-core
arm64) by adding a log in dmesg each time we hit the condition
busiest != NULL and imbalance == 0. The log shows up from time to time
when I generate some activity on the board, like syncing the filesystem
before running a test, but I don't have the details. The logs appear
with and without the next patch that I mentioned above, so it probably
means that we can trigger this situation with other use cases.
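
For illustration only, the condition being counted can be modelled in user space as below. This is a rough sketch, not the kernel code and not the patch itself: struct lb_env_sketch, null_imbalance_hits and should_search_busiest_rq() are made-up names, and the real struct lb_env carries many more fields. It only shows the idea that, once a busiest group has been found but the computed imbalance is 0, there is nothing to migrate and the search for a busiest rq can be skipped.

```c
#include <stdbool.h>

/*
 * Hypothetical, heavily simplified stand-in for the scheduler's
 * load-balance environment. Names are assumptions for this sketch.
 */
struct lb_env_sketch {
	bool found_busiest_group;	/* models find_busiest_group() != NULL */
	unsigned long imbalance;	/* load to move; 0 means nothing to migrate */
};

/* Counts how often "busiest group found, imbalance == 0" is seen. */
static unsigned long null_imbalance_hits;

/*
 * Returns true only when it is worth looking for a busiest rq:
 * a busiest group exists and there is a non-zero imbalance to fix.
 */
static bool should_search_busiest_rq(const struct lb_env_sketch *env)
{
	if (!env->found_busiest_group)
		return false;

	if (env->imbalance == 0) {
		/* The case discussed in this thread: bail out early. */
		null_imbalance_hits++;
		return false;
	}

	return true;
}
```

In this model, a caller would check should_search_busiest_rq() right after computing the imbalance, and null_imbalance_hits plays the role of the dmesg log used in the test above.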
>