linux-kernel - Re: [PATCH 2/2] sched/fair: Relax task

Message-ID: <CAKfTPtA+w1Euepw+MP0o1EGYhoohpQm3jJP+WFQrmZ6Zet+F3g@mail.gmail.com>
Date:   Fri, 30 Apr 2021 08:58:42 +0200
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Valentin Schneider <valentin.schneider@....com>
Cc:     linux-kernel <linux-kernel@...r.kernel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...nel.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Morten Rasmussen <morten.rasmussen@....com>,
        Qais Yousef <qais.yousef@....com>,
        Quentin Perret <qperret@...gle.com>,
        Pavan Kondeti <pkondeti@...eaurora.org>,
        Rik van Riel <riel@...riel.com>,
        Lingutla Chandrasekhar <clingutla@...eaurora.org>
Subject: Re: [PATCH 2/2] sched/fair: Relax task_hot() for misfit tasks

On Wed, 21 Apr 2021 at 12:52, Valentin Schneider
<valentin.schneider@....com> wrote:
>
> On 20/04/21 16:33, Vincent Guittot wrote:
> > On Mon, 19 Apr 2021 at 19:13, Valentin Schneider
> > <valentin.schneider@....com> wrote:
> >>
> >> On 16/04/21 15:51, Vincent Guittot wrote:
> >> > On Thursday 15 April 2021 at 18:58:46 (+0100), Valentin Schneider wrote:
> >> >> +
> >> >> +/*
> >> >> + * What does migrating this task do to our capacity-aware scheduling criterion?
> >> >> + *
> >> >> + * Returns 1, if the task needs more capacity than the dst CPU can provide.
> >> >> + * Returns 0, if the task needs the extra capacity provided by the dst CPU
> >> >> + * Returns -1, if the task isn't impacted by the migration wrt capacity.
> >> >> + */
> >> >> +static int migrate_degrades_capacity(struct task_struct *p, struct lb_env *env)
> >> >> +{
> >> >> +    if (!(env->sd->flags & SD_ASYM_CPUCAPACITY))
> >> >> +            return -1;
> >> >> +
> >> >> +    if (!task_fits_capacity(p, capacity_of(env->src_cpu))) {
> >> >> +            if (cpu_capacity_greater(env->dst_cpu, env->src_cpu))
> >> >> +                    return 0;
> >> >> +            else if (cpu_capacity_greater(env->src_cpu, env->dst_cpu))
> >> >> +                    return 1;
> >> >> +            else
> >> >> +                    return -1;
> >> >> +    }
> >> >
> >> > Reaching this point means the task fits src_cpu's capacity, so why test p against dst_cpu?
> >> >
> >>
> >> Because if p fits on src_cpu, we don't want to move it to a dst_cpu on
> >> which it *doesn't* fit.
> >
> > > OK. I was confused because I thought that this was only to force
> > > migration in the group_misfit_task case, but you tried to extend it to
> > > other cases... I'm not convinced that you succeeded in covering all cases.
> > >
> > > Also, I found this function returning 3 values a bit disturbing.
> > > IIUC you tried to align with migrate_degrades_locality, but you would
> > > have done better to align with task_hot and return only 0 or 1; -1 is not used.
> >
>
> Ack, will do.
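
For illustration only, a two-valued variant in the spirit of the suggestion above could look like the following sketch. It is a standalone model, not kernel code: `struct lb_env_sketch`, `task_fits()` and `migrate_improves_capacity()` are hypothetical names, the ~20% headroom in the fit check is an assumption, and any margin the real capacity comparison helper applies is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields of struct lb_env used here. */
struct lb_env_sketch {
	bool asym_cpucapacity;      /* models sd->flags & SD_ASYM_CPUCAPACITY */
	unsigned long src_capacity; /* models capacity_of(env->src_cpu) */
	unsigned long dst_capacity; /* models capacity_of(env->dst_cpu) */
};

/* Assumed fit rule: the task's utilization plus ~20% headroom must fit. */
static bool task_fits(unsigned long task_util, unsigned long capacity)
{
	return task_util * 1280 < capacity * 1024;
}

/*
 * Returns 1 if the task is misfit on the source CPU and the destination
 * CPU offers more capacity, 0 in every other case (task_hot() style).
 */
static int migrate_improves_capacity(unsigned long task_util,
				     const struct lb_env_sketch *env)
{
	assert(env != NULL);

	if (!env->asym_cpucapacity)
		return 0;

	if (task_fits(task_util, env->src_capacity))
		return 0;

	return env->dst_capacity > env->src_capacity;
}
```

With only two return values the caller can treat the result as a plain boolean override of the cache-hotness check, which is the simplification being asked for.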
>
> >> >> @@ -7672,6 +7698,15 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
> >> >>      if (tsk_cache_hot == -1)
> >> >>              tsk_cache_hot = task_hot(p, env);
> >> >>
> >> >> +    /*
> >> >> +     * On a (sane) asymmetric CPU capacity system, the increase in compute
> >> >> +     * capacity should offset any potential performance hit caused by a
> >> >> +     * migration.
> >> >> +     */
> >> >> +    if ((env->dst_grp_type == group_has_spare) &&
> >> >
> >> > Shouldn't it be env->src_grp_type == group_misfit_task, to only handle the misfit task case as
> >> > stated in $subject?
> >> >
> >>
> >> Previously this was env->idle != CPU_NOT_IDLE, but I figured dst_grp_type
> >> could give us a better picture. Staring at this some more, this isn't so
> >> true when the group size goes up - there's no guarantee the dst_cpu is the
> >> one that has spare cycles, and the other CPUs might not be able to grant
> >> the capacity uplift dst_cpu can.
> >
> > > Yeah, you have to keep checking env->idle != CPU_NOT_IDLE.
> >
> >>
> >> As for not using src_grp_type == group_misfit_task, this is pretty much the
> >> same as [1]. CPU-bound (misfit) task + some other task on the same rq
> >> implies group_overloaded classification when balancing at MC level (no SMT,
> >> so one group per CPU).
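
The classification described above can be modelled roughly as follows. This is a simplified sketch, not the scheduler's actual logic: the real code distinguishes more group types and applies further checks, and `struct sg_stats_sketch`, `classify()` and the numbers used with it are hypothetical. The only point it makes is that the overloaded test is evaluated before the misfit test, so a second runnable task on a single-CPU group hides the misfit state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced set of group types, ordered from least to most loaded. */
enum group_type_sketch {
	GROUP_HAS_SPARE,
	GROUP_MISFIT_TASK,
	GROUP_OVERLOADED,
};

/* Hypothetical per-group statistics; one group per CPU at MC level. */
struct sg_stats_sketch {
	unsigned int nr_running;   /* runnable tasks in the group */
	unsigned int group_weight; /* number of CPUs in the group */
	unsigned long capacity;    /* compute capacity of the group */
	unsigned long util;        /* summed utilization of the group */
	bool has_misfit;           /* a task does not fit its CPU */
};

static enum group_type_sketch classify(const struct sg_stats_sketch *s,
				       unsigned int imbalance_pct)
{
	assert(s != NULL && s->group_weight > 0);

	/* More tasks than CPUs and utilization above capacity: overloaded. */
	if (s->nr_running > s->group_weight &&
	    s->capacity * 100 < s->util * imbalance_pct)
		return GROUP_OVERLOADED;

	/* Only reached when the group is not already overloaded. */
	if (s->has_misfit)
		return GROUP_MISFIT_TASK;

	return GROUP_HAS_SPARE;
}
```

A LITTLE CPU running one CPU-bound task is classified as misfit; enqueue one more task on the same runqueue and the same CPU reports overloaded instead, so a check keyed on the misfit type alone no longer matches.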
> >
> > > Is this something that happens often, or just a sporadic/transient
> > > state? I mean, is it really worth the extra complexity, and do you see
> > > a performance improvement?
> >
>
> "Unfortunately" yes, this is a relatively common scenario when running "1
> big task per CPU" types of workloads. The expected behaviour for big.LITTLE
> systems is to upmigrate tasks stuck on the LITTLE CPUs as soon as a big CPU
> becomes free, usually via newidle balance (which, since they process work
> faster than the LITTLEs, is bound to happen), and an extra task being
> enqueued at "the wrong time" can prevent this from happening.
>
> This usually means a misfit task can take a few dozen ms longer than it

A few dozen ms is quite long. With a big core being idle, it should retry
every 8ms on a quad x quad system, and I suspect the next attempt will
happen at the next tick. It would be good to understand why it has to wait
so long.
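
As a rough illustration of where the 8ms figure comes from, the sketch below assumes that the base balance interval of a sched domain, in milliseconds, equals the number of CPUs it spans, and that a busy CPU stretches that interval by a busy factor. The function name and the factor of 16 used in the usage note are assumptions for this example; conversion to jiffies, clamping and interval back-off after failed attempts are all left out.

```c
#include <assert.h>

/*
 * Hypothetical model of the periodic load-balance interval, in ms.
 * sd_weight:   number of CPUs spanned by the sched domain (assumed base)
 * cpu_busy:    non-zero if the balancing CPU is not idle
 * busy_factor: multiplier applied when the CPU is busy (assumed value)
 */
static unsigned int balance_interval_ms(unsigned int sd_weight, int cpu_busy,
					unsigned int busy_factor)
{
	unsigned int interval;

	assert(sd_weight > 0 && busy_factor > 0);

	interval = sd_weight;
	if (cpu_busy)
		interval *= busy_factor;

	return interval;
}
```

Under these assumptions, `balance_interval_ms(8, 0, 16)` gives 8 for an idle CPU balancing across all eight CPUs of a quad x quad system, while the same domain seen from a busy CPU gives 128.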

> should to be migrated - in the tests I run (which are pretty much this 1
> hog per CPU workload) this happens ~20% of the time.
>
> > > You would do better to focus on fixing the simple group_misfit_task
> > > case. These other cases look far more complex, with a lot of corner cases.
> >
> >>
> >> [1]: http://lore.kernel.org/r/jhjblcuv2mo.mognet@arm.com