linux-kernel - Re: [PATCH v8 1/9] sched/fair: fix unfairness at wakeup

Message-ID: <CAKfTPtC5f7jfz+=rLQp_gjaEqGQ=9B-4aX-4urZP6CPVEf1LwA@mail.gmail.com>
Date:   Tue, 15 Nov 2022 08:26:05 +0100
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Dietmar Eggemann <dietmar.eggemann@....com>
Cc:     mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
        rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
        bristot@...hat.com, vschneid@...hat.com,
        linux-kernel@...r.kernel.org, parth@...ux.ibm.com,
        qyousef@...alina.io, chris.hyser@...cle.com,
        patrick.bellasi@...bug.net, David.Laight@...lab.com,
        pjt@...gle.com, pavel@....cz, tj@...nel.org, qperret@...gle.com,
        tim.c.chen@...ux.intel.com, joshdon@...gle.com, timj@....org,
        kprateek.nayak@....com, yu.c.chen@...el.com,
        youssefesmat@...omium.org, joel@...lfernandes.org
Subject: Re: [PATCH v8 1/9] sched/fair: fix unfairness at wakeup

On Mon, 14 Nov 2022 at 20:13, Dietmar Eggemann <dietmar.eggemann@....com> wrote:
>
> On 10/11/2022 18:50, Vincent Guittot wrote:
> > At wakeup, the vruntime of a task is updated to be no more than a
> > sched_latency period behind min_vruntime. This prevents a long-sleeping
> > task from getting unlimited credit at wakeup.
> > Such a waking task should preempt the current one to use its CPU
> > bandwidth, but wakeup_gran() can be larger than sched_latency, filter out
> > the wakeup preemption and, as a result, steal some CPU bandwidth from
> > the waking task.
> >
> > Make sure that a task whose vruntime has been capped will preempt the
> > current task and use its CPU bandwidth even if wakeup_gran() is in the
> > same range as sched_latency.
>
> Looks like gran can be much higher than sched_latency in extreme
> cases?

It's not that extreme: all tasks with nice priority 5 and above will face
the problem.

>
> >
> > If the waking task fails to preempt current, it could have to wait up to
> > sysctl_sched_min_granularity before preempting it at the next tick.
> >
> > Strictly speaking, we should use cfs->min_vruntime instead of
> > curr->vruntime, but it isn't worth the additional overhead and complexity,
> > as the vruntime of current should be close to, if not equal to, min_vruntime.
>
> ^^^ Does this relate to the `if (vdiff > gran) return 1` condition in
> wakeup_preempt_entity()?

yes

>
> [...]
>
> > @@ -7187,6 +7171,18 @@ wakeup_preempt_entity(struct sched_entity *curr, struct sched_entity *se)
> >               return -1;
> >
> >       gran = wakeup_gran(se);
> > +
> > +     /*
> > +      * At wakeup, the vruntime of a task is capped to be no older than
> > +      * a sched_latency period compared to min_vruntime. This prevents a long
> > +      * sleeping task from getting unlimited credit at wakeup. Such a waking
> > +      * task has to preempt current in order not to lose its share of CPU
> > +      * bandwidth, but wakeup_gran() can become higher than the scheduling
> > +      * period for a low priority task. Make sure that a long sleeping task gets a
>
> low priority task or taskgroup with low cpu.shares, right?

yes

>
> 6 CPUs
>
> sysctl_sched
>   .sysctl_sched_latency              : 18.000000
>   .sysctl_sched_min_granularity      : 2.250000
>   .sysctl_sched_idle_min_granularity : 0.750000
>   .sysctl_sched_wakeup_granularity   : 3.000000
>   ...
>
> p1 & p2 affine to CPUX
>
>      '/'
>      /\
>    p1  p2
>
> p1 & p2 nice=0        - vdiff=9ms gran=3ms lat_max=6.75ms
> p1 & p2 nice=4        - vdiff=9ms gran=7.26ms lat_max=6.75ms

p1 & p2 nice=5        - vdiff=9ms gran=9.17ms lat_max=6.75ms

> p1 & p2 nice=19       - vdiff=9ms gran=204.79ms lat_max=6.75ms
>
>
>      '/'
>      /\
>     A  B
>    /    \
>   p1    p2
>
> A & B cpu.shares=1024 - vdiff=9ms gran=3ms lat_max=6.75ms
> A & B cpu.shares=448  - vdiff=9ms gran=6.86ms lat_max=6.75ms
> A & B cpu.shares=2    - vdiff=9ms gran=1536ms lat_max=6.75ms
>
> > +      * chance to preempt current.
> > +      */
> > +     gran = min_t(s64, gran, get_latency_max());
> > +
>
> [...]
>
> > @@ -2448,6 +2448,34 @@ extern unsigned int sysctl_numa_balancing_scan_period_max;
> >  extern unsigned int sysctl_numa_balancing_scan_size;
> >  #endif
> >
> > +static inline unsigned long  get_sched_latency(bool idle)
>                               ^^
> 2 white-spaces

ok

>
> [...]
>
> > +
> > +static inline unsigned long  get_latency_max(void)
>                               ^^

ok

>
> [...]