Message-ID: <CAKfTPtBTH10k56s-sU3TX+vL6Xas-QArLW-CBz7ZeqU0BNzMQA@mail.gmail.com>
Date: Fri, 26 Jul 2019 16:47:30 +0200
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Valentin Schneider <valentin.schneider@....com>
Cc: linux-kernel <linux-kernel@...r.kernel.org>,
Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Quentin Perret <quentin.perret@....com>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Morten Rasmussen <Morten.Rasmussen@....com>,
Phil Auld <pauld@...hat.com>
Subject: Re: [PATCH 3/5] sched/fair: rework load_balance
On Fri, 26 Jul 2019 at 16:01, Valentin Schneider
<valentin.schneider@....com> wrote:
>
> On 26/07/2019 13:30, Vincent Guittot wrote:
> >> We can avoid this entirely by going straight for an active balance when
> >> we are balancing misfit tasks (which we really should be doing TBH).
> >
> > but your misfit task might not be the running one anymore by the time
> > load_balance actually happens
> >
>
> We could add a check in the active balance bits to make sure the current
> task is still a misfit task (albeit not necessarily the one we wanted to
> migrate, since we can't really differentiate them).
>
> Misfit migration shouldn't go through detach_tasks() - if the misfit task
> is still the running task, we want to go for active balance anyway, and if
> it's not the running task anymore then we should try to detect it and give
> up - there's not much else we can do. From a rq's perspective, a task can
> only ever be misfit if it's currently running.
>
> The current code can totally active balance the wrong task if the load
> balancer saw a misfit task in update_sd_lb_stats() but it moved away in the
> meantime, so making misfit balancing skip detach_tasks() would be a straight
> improvement IMO: we can still get some active balance collateral, but at
> least we don't wrongfully detach a non-running task that happened to have
> the right load shape.
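If I follow, the check could be as simple as something like the below
(rough sketch only; the helper name is made up, rq->misfit_task_load is
the field updated by update_misfit_status() for the currently running
task, and the exact hook point in the active balance path is just an
assumption):

        /*
         * Sketch: before kicking the stopper for a misfit migration,
         * make sure the busiest rq still reports a misfit running task.
         * rq->misfit_task_load is only non-zero while the current task
         * does not fit the CPU's capacity.
         */
        static bool misfit_task_still_present(struct rq *busiest)
        {
                return READ_ONCE(busiest->misfit_task_load) != 0;
        }

That still can't tell us it's the same task we saw in
update_sd_lb_stats(), only that *a* misfit task is still running there.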
>
> >>
> >> If we *really* want to be surgical about misfit migration, we could track
> >> the task itself via a pointer to its task_struct, but IIRC Morten
> >
> > I thought about this, but the task may have already died by then and
> > the pointer would no longer be valid.
> > Or we would have to walk the list of tasks still attached to the CPU
> > and compare them with the saved pointer, but that doesn't scale and
> > would consume a lot of time
> >
> >> purposely avoided this due to all the fun synchronization issues that
> >> come with it.
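Just to make the cost of the pointer idea concrete, it would need
something along these lines (sketch only; rq->misfit_task is a made-up
field, get_task_struct()/put_task_struct(), task_cpu() and
task_running() are the real primitives it would rely on, and the caller
is assumed to hold the rq lock):

        /*
         * Sketch: pin the task when it is recorded as misfit so the
         * pointer can't point at freed memory, and re-check at balance
         * time that it is still running on that CPU.
         */
        static void cache_misfit_task(struct rq *rq, struct task_struct *p)
        {
                if (rq->misfit_task)
                        put_task_struct(rq->misfit_task);
                get_task_struct(p);             /* keep p alive until the ref is dropped */
                rq->misfit_task = p;            /* hypothetical field */
        }

        static bool cached_misfit_still_here(struct rq *rq)
        {
                struct task_struct *p = rq->misfit_task;

                /* it may have migrated, gone to sleep or exited since */
                return p && task_cpu(p) == cpu_of(rq) && task_running(rq, p);
        }

and even then we would only know the task is still there, not that it
is still misfit, so we would have to re-check its utilization anyway.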
> >>
> >> With that out of the way, I still believe we should maximize the migrated
> >> load when dealing with several misfit tasks - there's not much else you can
> >> look at anyway to make a decision.
> >
> > But you can easily select a task that is not misfit at all, so which is best/worst:
> > selecting a completely wrong task, or at least one of the real misfit tasks?
> >
>
> Utilization can't help you select a "best" misfit task amongst several
> since the utilization of misfit tasks is by definition meaningless.
>
> I do agree that looking at utilization when detaching the task prevents
> picking a non-misfit task, but those are two different issues:
>
> 1) Among several rqs/groups with misfit tasks, pick the busiest one
> (this is where I'm arguing we should use load)
> 2) When detaching a task, make sure it's a misfit task (this is where
> you're arguing we should use utilization).
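For 2), what I have in mind at detach time is roughly the below (a
sketch against the migration_type introduced by this series;
task_fits_capacity() and capacity_of() as in fair.c, and the exact
placement in detach_tasks() is an assumption):

        /*
         * Sketch: when the balance was triggered for misfit, skip any
         * task that (still) fits the capacity of the source CPU, so we
         * never detach a task that merely has the right load shape.
         */
        if (env->migration_type == migrate_misfit &&
            task_fits_capacity(p, capacity_of(env->src_cpu)))
                goto next;

That only addresses the detach side in 2), not how the busiest group is
picked in 1).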
>
> > I'm fine with going back to using load instead of util, but it's not robust IMO.
> >
>
> [...]
> >> What if there is spare capacity but no idle CPUs? In scenarios like this
> >> we should balance utilization. We could wait for a newidle balance to
> >
> > why should we balance anything? All tasks already get enough running time.
> > It's better to wait for a CPU to become idle instead of trying to
> > predict which one will become idle first and migrating tasks uselessly,
> > because other tasks can easily wake up in the meantime
> >
>
> I probably need to play with this and create some synthetic use cases.
>
> What I had in mind is something like 2 CPUs, CPU0 running a 20% task and
> CPU1 running six 10% tasks.
>
> If CPU0 runs the load balancer, balancing utilization would mean pulling
> 2 tasks from CPU1 to reach the domain-average of 40%. The good side of this
> is that we could save ourselves from running some newidle balances, but
> I'll admit that's all quite "finger in the air".
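(Spelling the numbers out: 20% + 6 * 10% = 80% total utilization, i.e.
a 40% average per CPU, so pulling two 10% tasks would indeed leave both
CPUs at 40%.)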
Don't forget that the scheduler also selects a CPU at task wakeup and
has to cope with such a situation.
>
> >> happen, but it'd be a shame to repeatedly do this when we could
> >> preemptively balance utilization.
> >>