linux-kernel - Re: [PATCH v4 5/8] sched/fair: Take into account latency priority at wakeup

Message-ID: <CAKfTPtCWE5O4TeTBG8hgar8w56-WzvmX7aR9D7dXN_vJ5LCLyQ@mail.gmail.com>
Date:   Mon, 19 Sep 2022 17:39:26 +0200
From:   Vincent Guittot <vincent.guittot@...aro.org>
To:     Dietmar Eggemann <dietmar.eggemann@....com>
Cc:     mingo@...hat.com, peterz@...radead.org, juri.lelli@...hat.com,
        rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
        bristot@...hat.com, vschneid@...hat.com,
        linux-kernel@...r.kernel.org, parth@...ux.ibm.com,
        qais.yousef@....com, chris.hyser@...cle.com,
        valentin.schneider@....com, patrick.bellasi@...bug.net,
        David.Laight@...lab.com, pjt@...gle.com, pavel@....cz,
        tj@...nel.org, qperret@...gle.com, tim.c.chen@...ux.intel.com,
        joshdon@...gle.com
Subject: Re: [PATCH v4 5/8] sched/fair: Take into account latency priority at wakeup

On Mon, 19 Sept 2022 at 12:05, Dietmar Eggemann
<dietmar.eggemann@....com> wrote:
>
> On 16/09/2022 10:03, Vincent Guittot wrote:
>
> [...]
>
> > @@ -4512,7 +4519,7 @@ int sched_fork(unsigned long clone_flags, struct task_struct *p)
> >       p->prio = current->normal_prio;
> >
> >       /* Propagate the parent's latency requirements to the child as well */
> > -     p->latency_nice = current->latency_nice;
> > +     p->latency_prio = current->latency_prio;
>
> Isn't a `set_latency_offset(p)` missing here?

Hmm, I think it's the opposite: the line above has been a nop from the
beginning (i.e. since patch 2).

>
> >
> >       uclamp_fork(p);
> >
>
> [...]
>
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index e8c1b889dcbb..a20eadb0af97 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -4574,6 +4574,8 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
> >               update_idle_cfs_rq_clock_pelt(cfs_rq);
> >  }
> >
> > +static long wakeup_latency_gran(struct sched_entity *curr, struct sched_entity *se);
>
> minor: `struct sched_entity *curr` ... doesn't have to be current
> (cfs_rq->curr). Isn't this more like `struct sched_entity *sea, struct
> sched_entity *seb`? Anyway, it's already the case for
> `wakeup_preempt_entity`.
>
> [...]
>
> > @@ -5732,6 +5735,35 @@ static int sched_idle_cpu(int cpu)
> >  }
> >  #endif
> >
> > +static void set_next_buddy(struct sched_entity *se);
> > +
> > +static void check_preempt_from_others(struct cfs_rq *cfs, struct sched_entity *se)
>
> minor: Why `struct cfs_rq *cfs` and not `struct cfs_rq *cfs_rq` ?
>
> Using `cfs_rq` would make it more consistent when looking for things
> like `cfs_rq->nr_running` for example.
>
> > +{
> > +     struct sched_entity *next;
> > +
> > +     if (se->latency_offset >= 0)
> > +             return;
> > +
> > +     if (cfs->nr_running <= 1)
> > +             return;
> > +     /*
> > +      * When waking from idle, we don't need to check to preempt at wakeup
>
> s/idle/others ?

yes, I forgot to update the comment

>
> > +      * the idle thread and don't set next buddy as a candidate for being
> > +      * picked in priority.
> > +      * In case of simultaneous wakeup from idle, the latency sensitive tasks
> > +      * lost opportunity to preempt non sensitive tasks which woke up
> > +      * simultaneously.
> > +      */
>
> The position of this comment block within this function is somewhat
> misleading since it describes the reason for the function rather than a
> particular condition within this function. Wouldn't it be more readable
> as a function header comment instead?

I placed it after the usual early-return tests to keep the comment close
to the useful part: the use of next buddy and __pick_first_entity().

>
> [...]
>
> > @@ -7148,6 +7181,22 @@ balance_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
> >  }
> >  #endif /* CONFIG_SMP */
> >
> > +static long wakeup_latency_gran(struct sched_entity *curr, struct sched_entity *se)
> > +{
> > +     long latency_offset = se->latency_offset;
> > +
> > +     /*
> > +      * A negative latency weigth means that the sched_entity has latency
>
> s/weigth/latency_offset ?

yes

>
>
> > +      * requirement that needs to be evaluated versus other entity.
> > +      * Otherwise, use the latency weight to evaluate how much scheduling
> > +      * delay is acceptable by se.
> > +      */
> > +     if ((se->latency_offset < 0) || (curr->latency_offset < 0))
> > +             latency_offset -= curr->latency_offset;
>
> I still don't get the rationale for using the diff between se and curr
> when either of their latency_nice values is negative, but only se's
> value otherwise. Why don't you always use the diff between se and curr?
> Since we have a range [-20 ... 19], why shouldn't we use the difference
> between, let's say, se = 19 and curr = 5?
> You discussed this with Tao Zhou on v1, but I didn't fully understand it.

Let's say that current has a latency nice prio of 19 and a task A with a
latency nice of 10 wakes up. Neither task cares about scheduling
latency (current even less than task A). If we used the diff, the output
of wakeup_latency_gran() would be negative (-10ms), which would indicate
that the waking task is sensitive to latency and wants to preempt
current even if its vruntime is later. But obviously neither current nor
task A cares about preempting at wakeup.
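
To make the example above concrete, here is a minimal userspace sketch
of just the quoted condition. It is not the kernel code: `struct entity`,
`gated_offset()` and `always_diff_offset()` are hypothetical names, and
the latency nice values are used directly as offsets, since the real
conversion to a time value is not shown in the quoted hunks.

```c
#include <assert.h>

/* Hypothetical stand-in for the one sched_entity field used here. */
struct entity {
	long latency_offset;	/* < 0: latency sensitive; >= 0: not */
};

/* Gated form, mirroring the condition quoted from the patch. */
static long gated_offset(const struct entity *curr, const struct entity *se)
{
	long latency_offset = se->latency_offset;

	/* Compare against curr only if at least one side is sensitive. */
	if (se->latency_offset < 0 || curr->latency_offset < 0)
		latency_offset -= curr->latency_offset;

	return latency_offset;
}

/* Alternative raised in the review: always use the difference. */
static long always_diff_offset(const struct entity *curr,
			       const struct entity *se)
{
	return se->latency_offset - curr->latency_offset;
}

/* Illustrative values only: current at 19, waking task A at 10. */
static void self_check(void)
{
	struct entity curr = { .latency_offset = 19 };
	struct entity task_a = { .latency_offset = 10 };

	/* Gated: stays positive, so task A does not look latency sensitive. */
	assert(gated_offset(&curr, &task_a) == 10);

	/* Always-diff: goes negative, as if task A wanted to preempt. */
	assert(always_diff_offset(&curr, &task_a) == -9);
}
```

With both offsets non-negative, the gated form leaves task A's own value
untouched, while the always-diff form yields a negative result that
would be read as a latency requirement neither task has.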

>
> [...]