lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 11 Jul 2022 18:42:25 +0200
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Qais Yousef <qais.yousef@....com>
Cc:     mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
        dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
        mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
        linux-kernel@...r.kernel.org, david.chen@...anix.com,
        zhangqiao22@...wei.com
Subject: Re: [PATCH v2] sched/fair: fix case with reduced capacity CPU

On Mon, 11 Jul 2022 at 18:03, Qais Yousef <qais.yousef@....com> wrote:
>
> Hi Vincent
>
> On 07/08/22 17:44, Vincent Guittot wrote:
> > The capacity of the CPU available for CFS tasks can be reduced because of
> > other activities running on the latter. In such case, it's worth trying to
> > move CFS tasks on a CPU with more available capacity.
> >
> > The rework of the load balance has filtered the case when the CPU is
> > classified to be fully busy but its capacity is reduced.
> >
> > Check if CPU's capacity is reduced while gathering load balance statistic
> > and classify it group_misfit_task instead of group_fully_busy so we can
> > try to move the load on another CPU.
> >
> > Reported-by: David Chen <david.chen@...anix.com>
> > Reported-by: Zhang Qiao <zhangqiao22@...wei.com>
> > Signed-off-by: Vincent Guittot <vincent.guittot@...aro.org>
> > Tested-by: David Chen <david.chen@...anix.com>
> > Tested-by: Zhang Qiao <zhangqiao22@...wei.com>
> > ---
>
> [...]
>
> > @@ -8820,8 +8833,9 @@ static inline void update_sg_lb_stats(struct lb_env *env,
> >
> >       for_each_cpu_and(i, sched_group_span(group), env->cpus) {
> >               struct rq *rq = cpu_rq(i);
> > +             unsigned long load = cpu_load(rq);
> >
> > -             sgs->group_load += cpu_load(rq);
> > +             sgs->group_load += load;
> >               sgs->group_util += cpu_util_cfs(i);
> >               sgs->group_runnable += cpu_runnable(rq);
> >               sgs->sum_h_nr_running += rq->cfs.h_nr_running;
> > @@ -8851,11 +8865,17 @@ static inline void update_sg_lb_stats(struct lb_env *env,
> >               if (local_group)
> >                       continue;
> >
> > -             /* Check for a misfit task on the cpu */
> > -             if (env->sd->flags & SD_ASYM_CPUCAPACITY &&
> > -                 sgs->group_misfit_task_load < rq->misfit_task_load) {
> > -                     sgs->group_misfit_task_load = rq->misfit_task_load;
> > -                     *sg_status |= SG_OVERLOAD;
> > +             if (env->sd->flags & SD_ASYM_CPUCAPACITY) {
> > +                     /* Check for a misfit task on the cpu */
> > +                     if (sgs->group_misfit_task_load < rq->misfit_task_load) {
> > +                             sgs->group_misfit_task_load = rq->misfit_task_load;
> > +                             *sg_status |= SG_OVERLOAD;
> > +                     }
> > +             } else if ((env->idle != CPU_NOT_IDLE) &&
> > +                        sched_reduced_capacity(rq, env->sd)) {
> > +                     /* Check for a task running on a CPU with reduced capacity */
> > +                     if (sgs->group_misfit_task_load < load)
> > +                             sgs->group_misfit_task_load = load;
> >               }
> >       }
>
> Small questions mostly for my education purposes.
>
> The new condition only applies for SMP systems. The reason asym systems don't
> care is because misfit check already considers capacity pressure when checking
> that the task fits_capacity()?

Yes

>
> It **seems** to me that the migration margin in fits_capacity() acts like the
> sd->imbalance_pct when check_cpu_capacity() is called by
> sched_reduced_capacity(), did I get it right?

Yes

>
> If I got it right, if the migration margin ever tweaked, could we potentially
> start seeing this kind of reported issue on asym systems then? I guess not. It
> just seems to me for asym systems tweaking the migration margin is similar to
> tweaking imbalance_pct for smp ones. But the subtlety is greater as
> imbalance_pct is still used in asym systems.

You should not because the task will end up being misfit whatever the
margin. The only change would be how fast you will detect and migrate


>
>
> Thanks
>
> --
> Qais Yousef

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ