[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20220708123511.GA22615@vingu-book>
Date: Fri, 8 Jul 2022 14:35:11 +0200
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Dietmar Eggemann <dietmar.eggemann@....com>
Cc: mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
bristot@...hat.com, vschneid@...hat.com,
linux-kernel@...r.kernel.org, david.chen@...anix.com,
zhangqiao22@...wei.com
Subject: Re: [PATCH] sched/fair: fix case with reduced capacity CPU
Le vendredi 08 juil. 2022 à 12:10:40 (+0200), Dietmar Eggemann a écrit :
> On 08/07/2022 09:17, Vincent Guittot wrote:
> > On Thu, 7 Jul 2022 at 18:43, Dietmar Eggemann <dietmar.eggemann@....com> wrote:
> >>
> >> On 02/07/2022 06:52, Vincent Guittot wrote:
>
> [...]
>
> >>> The rework of the load balance has filterd the case when the CPU is
>
> s/filterd/filtered
>
> >>> classified to be fully busy but its capacity is reduced.
> >>>
> >>> Check if CPU's capacity is reduced while gathering load balance statistics
> >>> and classify it group_misfit_task instead of group_fully_busy so we can
>
> enum group_type {
>
> ...
> /*
> * SD_ASYM_CPUCAPACITY only: One task doesn't fit with CPU's capacity
> * and must be migrated to a more powerful CPU.
> */
> group_misfit_task
> ...
>
> This `SD_ASYM_CPUCAPACITY only:` should be removed now.
Yes
>
> [...]
>
> >>> @@ -8798,6 +8798,19 @@ sched_asym(struct lb_env *env, struct sd_lb_stats *sds, struct sg_lb_stats *sgs
> >>> return sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu);
> >>> }
> >>>
> >>> +static inline bool
> >>> +sched_reduced_capacity(struct rq *rq, struct sched_domain *sd)
>
> minor: Why not `static inline int check_reduced_capacity()` ? All
> similar functions like check_cpu_capacity(), check_cpu_capacity() follow
> this approach.
Mainly because it aims to return true or false.
IIRC check_cpu_capacity has replaced fix_small_capacity which was not a bool but the
number of task a group could handle but kept the int
>
> [...]
>
> >>> @@ -8851,11 +8865,17 @@ static inline void update_sg_lb_stats(struct lb_env *env,
> >>> if (local_group)
> >>> continue;
> >>>
> >>> - /* Check for a misfit task on the cpu */
> >>> - if (env->sd->flags & SD_ASYM_CPUCAPACITY &&
> >>> - sgs->group_misfit_task_load < rq->misfit_task_load) {
> >>> - sgs->group_misfit_task_load = rq->misfit_task_load;
> >>> - *sg_status |= SG_OVERLOAD;
> >>> + if (env->sd->flags & SD_ASYM_CPUCAPACITY) {
> >>> + /* Check for a misfit task on the cpu */
> >>> + if (sgs->group_misfit_task_load < rq->misfit_task_load) {
> >>> + sgs->group_misfit_task_load = rq->misfit_task_load;
> >>> + *sg_status |= SG_OVERLOAD;
> >>> + }
> >>> + } else if ((env->idle != CPU_NOT_IDLE) &&
> >>> + sched_reduced_capacity(rq, env->sd) &&
> >>> + (sgs->group_misfit_task_load < load)) {
> >>> + /* Check for a task running on a CPU with reduced capacity */
> >>> + sgs->group_misfit_task_load = load;
> >>> }
>
> Minor:
>
> This now has if(A)
> if(B)
> else if(C && B')
>
> little bit harder to read.
>
yeah I started with the below but then optimized it. I can come back to the
version below if it's easier to read
if (env->sd->flags & SD_ASYM_CPUCAPACITY) {
/* Check for a misfit task on the cpu */
if (sgs->group_misfit_task_load < rq->misfit_task_load) {
sgs->group_misfit_task_load = rq->misfit_task_load;
*sg_status |= SG_OVERLOAD;
}
} else if ((env->idle != CPU_NOT_IDLE) &&
sched_reduced_capacity(rq, env->sd)) {
/* Check for a task running on a CPU with reduced capacity */
if (sgs->group_misfit_task_load < load))
sgs->group_misfit_task_load = load;
}
> [...]
>
> >> I'm wondering why you've chosen that hybrid approach `group_misfit_task
> >> -> migrate_load` and not `group_misfit_task -> migrate_misfit`.
> >
> > because, it means enabling the tracking of misfit task on rq at each
> > task enqueue/dequeue/tick ... Then mistfit for heterogeneous platform
> > checks max_cpu_capacity what we don't care and will trigger unwanted
> > misfit migration for smp
>
> Agreed, rq->misfit_task_load can't be used here.
>
> >> It looks like this `rq->cfs.h_nr_running = 1` case almost (since we
> >> check `busiest->nr_running > 1`) always ends up in the load_balance()
> >> `if (!ld_moved)` condition and need_active_balance() can return 1 in
> >> case `if ((env->idle != CPU_NOT_IDLE) && ...` condition. This leads to
> >> active load_balance and this
> >>
> >> IMHO, the same you can achieve when you would stay with
> >> `group_misfit_task -> migrate_misfit`.
> >>
> >> I think cpu_load(rq) can be used instead of `rq->misfit_task_load` in
> >> the migrate_misfit case of find_busiest_queue() too.
> >
> > I don't think because you can have a higher cpu_load() but not being misfit
>
> You're right, I forgot about this. Essentially we would need extra state
> (e.g. in lb_env) to save which CPU in the busiest group has the misfit.
Powered by blists - more mailing lists