linux-kernel - Re: [PATCH v2] sched/fair: fix case with reduced capacity CPU

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CAKfTPtBDYqXP4gx9_QFK-ibrC=FynufbXAiup6hxsmBT2AxOQQ@mail.gmail.com>
Date:   Mon, 11 Jul 2022 18:42:25 +0200
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Qais Yousef <qais.yousef@....com>
Cc:     mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
        dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
        mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
        linux-kernel@...r.kernel.org, david.chen@...anix.com,
        zhangqiao22@...wei.com
Subject: Re: [PATCH v2] sched/fair: fix case with reduced capacity CPU

On Mon, 11 Jul 2022 at 18:03, Qais Yousef <qais.yousef@....com> wrote:
>
> Hi Vincent
>
> On 07/08/22 17:44, Vincent Guittot wrote:
> > The capacity of the CPU available for CFS tasks can be reduced because of
> > other activities running on the latter. In such case, it's worth trying to
> > move CFS tasks on a CPU with more available capacity.
> >
> > The rework of the load balance has filtered the case when the CPU is
> > classified to be fully busy but its capacity is reduced.
> >
> > Check if CPU's capacity is reduced while gathering load balance statistic
> > and classify it group_misfit_task instead of group_fully_busy so we can
> > try to move the load on another CPU.
> >
> > Reported-by: David Chen <david.chen@...anix.com>
> > Reported-by: Zhang Qiao <zhangqiao22@...wei.com>
> > Signed-off-by: Vincent Guittot <vincent.guittot@...aro.org>
> > Tested-by: David Chen <david.chen@...anix.com>
> > Tested-by: Zhang Qiao <zhangqiao22@...wei.com>
> > ---
>
> [...]
>
> > @@ -8820,8 +8833,9 @@ static inline void update_sg_lb_stats(struct lb_env *env,
> >
> >       for_each_cpu_and(i, sched_group_span(group), env->cpus) {
> >               struct rq *rq = cpu_rq(i);
> > +             unsigned long load = cpu_load(rq);
> >
> > -             sgs->group_load += cpu_load(rq);
> > +             sgs->group_load += load;
> >               sgs->group_util += cpu_util_cfs(i);
> >               sgs->group_runnable += cpu_runnable(rq);
> >               sgs->sum_h_nr_running += rq->cfs.h_nr_running;
> > @@ -8851,11 +8865,17 @@ static inline void update_sg_lb_stats(struct lb_env *env,
> >               if (local_group)
> >                       continue;
> >
> > -             /* Check for a misfit task on the cpu */
> > -             if (env->sd->flags & SD_ASYM_CPUCAPACITY &&
> > -                 sgs->group_misfit_task_load < rq->misfit_task_load) {
> > -                     sgs->group_misfit_task_load = rq->misfit_task_load;
> > -                     *sg_status |= SG_OVERLOAD;
> > +             if (env->sd->flags & SD_ASYM_CPUCAPACITY) {
> > +                     /* Check for a misfit task on the cpu */
> > +                     if (sgs->group_misfit_task_load < rq->misfit_task_load) {
> > +                             sgs->group_misfit_task_load = rq->misfit_task_load;
> > +                             *sg_status |= SG_OVERLOAD;
> > +                     }
> > +             } else if ((env->idle != CPU_NOT_IDLE) &&
> > +                        sched_reduced_capacity(rq, env->sd)) {
> > +                     /* Check for a task running on a CPU with reduced capacity */
> > +                     if (sgs->group_misfit_task_load < load)
> > +                             sgs->group_misfit_task_load = load;
> >               }
> >       }
>
> Small questions mostly for my education purposes.
>
> The new condition only applies for SMP systems. The reason asym systems don't
> care is because misfit check already considers capacity pressure when checking
> that the task fits_capacity()?

Yes

>
> It **seems** to me that the migration margin in fits_capacity() acts like the
> sd->imbalance_pct when check_cpu_capacity() is called by
> sched_reduced_capacity(), did I get it right?

Yes

>
> If I got it right, if the migration margin ever tweaked, could we potentially
> start seeing this kind of reported issue on asym systems then? I guess not. It
> just seems to me for asym systems tweaking the migration margin is similar to
> tweaking imbalance_pct for smp ones. But the subtlety is greater as
> imbalance_pct is still used in asym systems.

You should not because the task will end up being misfit whatever the
margin. The only change would be how fast you will detect and migrate


>
>
> Thanks
>
> --
> Qais Yousef