Message-ID: <CAKfTPtCvsy9rUXiAQ=zm=5DiAgJ1EegEkJ5iOkgro5Mnwsvfog@mail.gmail.com>
Date: Tue, 6 Jun 2023 16:54:13 +0200
From: Vincent Guittot <vincent.guittot@...aro.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: mingo@...nel.org, linux-kernel@...r.kernel.org,
juri.lelli@...hat.com, dietmar.eggemann@....com,
rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
bristot@...hat.com, corbet@....net, qyousef@...alina.io,
chris.hyser@...cle.com, patrick.bellasi@...bug.net, pjt@...gle.com,
pavel@....cz, qperret@...gle.com, tim.c.chen@...ux.intel.com,
joshdon@...gle.com, timj@....org, kprateek.nayak@....com,
yu.c.chen@...el.com, youssefesmat@...omium.org,
joel@...lfernandes.org, efault@....de, tglx@...utronix.de
Subject: Re: [RFC][PATCH 13/15] sched/fair: Implement latency-nice
On Wed, 31 May 2023 at 14:47, Peter Zijlstra <peterz@...radead.org> wrote:
>
> Implement latency-nice as a modulation of the EEVDF r_i parameter,
> specifically apply the inverse sched_prio_to_weight[] relation on
> base_slice.
>
> Given a base slice of 3 [ms], this gives a range of:
>
> latency-nice 19: 3*1024 / 15 ~= 204.8 [ms]
> latency-nice -20: 3*1024 / 88761 ~= 0.034 [ms]
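Spelling out the arithmetic above in a quick user-space sketch (not
the kernel code; 15 and 88761 are the two extreme entries of
sched_prio_to_weight[] and 1024 is 1 << SCHED_FIXEDPOINT_SHIFT):

    #include <stdio.h>

    int main(void)
    {
            unsigned long long base_ns = 3000000ULL;      /* 3 [ms] base slice */
            unsigned int weights[] = { 15, 1024, 88761 }; /* latency-nice 19, 0, -20 */

            /* slice = (base << SCHED_FIXEDPOINT_SHIFT) / weight */
            for (int i = 0; i < 3; i++)
                    printf("weight %6u -> slice %8.3f [ms]\n", weights[i],
                           (double)base_ns * 1024 / weights[i] / 1e6);
            return 0;
    }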
I have reread the publication and I have a question about

Theorem 1: The lag of any active client k in a steady system is
bounded as follows:

    -r_max < lag_k(d) < max(r_max, q)

and

Corollary 2: Consider a steady system and a client k such that no
request of client k is larger than a time quantum. Then at any time t,
the lag of client k is bounded as follows:

    -q < lag_k(t) < q

where q is the time quantum a task can run for and r_max is the
maximum request (slice) of any active task.
I wonder how this applies to us. What is our time quantum q? I guess
that it's the tick, because IIUC the algorithm is assumed to reevaluate
which task should run next every q interval in order to provide those
fairness bounds. So I don't think that we can assume a q shorter than
the tick (at least with the current implementation) unless we trigger
some additional interrupts.
Then, asking for a request shorter than the tick also means that the
scheduler must enqueue a new request (on behalf of the task) during
the tick and evaluate whether that task is still the one that should
be scheduled now. So, similarly to q, the request size r should be at
least a tick long so that we reevaluate which task will run next once
a request ends. In fact, the real limit is:

    r_i / w_i >= tick / (Sum_j w_j)
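To put rough numbers on that limit (my reading of it, not something
from the patch): with nr runnable tasks of equal weight the inequality
reduces to r_i >= tick / nr, so the more competing tasks there are,
the shorter the request a task can meaningfully ask for. A quick
user-space sketch, assuming a 4ms tick:

    #include <stdio.h>

    int main(void)
    {
            double tick_ms = 4.0;   /* assumed CONFIG_HZ=250 tick */

            /* r_i / w_i >= tick / Sum(w_j); equal weights: r_i >= tick / nr */
            for (int nr = 1; nr <= 8; nr *= 2)
                    printf("%d equal-weight tasks -> min r_i = %.2f [ms]\n",
                           nr, tick_ms / nr);
            return 0;
    }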
On Arm64 systems the tick is 4ms long, and on arm32 it rises to 10ms.
We can always choose not to follow these assumptions made in the
publication, but then I wonder how we can still rely on its theorems
and corollaries.
>
> (which might not make sense)
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
> Tested-by: K Prateek Nayak <kprateek.nayak@....com>
> ---
> kernel/sched/core.c | 14 ++++++++++----
> kernel/sched/fair.c | 22 +++++++++++++++-------
> kernel/sched/sched.h | 2 ++
> 3 files changed, 27 insertions(+), 11 deletions(-)
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1305,6 +1305,12 @@ static void set_load_weight(struct task_
> }
> }
>
> +static inline void set_latency_prio(struct task_struct *p, int prio)
> +{
> + p->latency_prio = prio;
> + set_latency_fair(&p->se, prio - MAX_RT_PRIO);
> +}
> +
> #ifdef CONFIG_UCLAMP_TASK
> /*
> * Serializes updates of utilization clamp values
> @@ -4464,9 +4470,10 @@ static void __sched_fork(unsigned long c
> p->se.nr_migrations = 0;
> p->se.vruntime = 0;
> p->se.vlag = 0;
> - p->se.slice = sysctl_sched_base_slice;
> INIT_LIST_HEAD(&p->se.group_node);
>
> + set_latency_prio(p, p->latency_prio);
> +
> #ifdef CONFIG_FAIR_GROUP_SCHED
> p->se.cfs_rq = NULL;
> #endif
> @@ -4718,8 +4725,7 @@ int sched_fork(unsigned long clone_flags
>
> p->prio = p->normal_prio = p->static_prio;
> set_load_weight(p, false);
> -
> - p->latency_prio = NICE_TO_PRIO(0);
> + set_latency_prio(p, NICE_TO_PRIO(0));
>
> /*
> * We don't need the reset flag anymore after the fork. It has
> @@ -7507,7 +7513,7 @@ static void __setscheduler_latency(struc
> const struct sched_attr *attr)
> {
> if (attr->sched_flags & SCHED_FLAG_LATENCY_NICE)
> - p->latency_prio = NICE_TO_PRIO(attr->sched_latency_nice);
> + set_latency_prio(p, NICE_TO_PRIO(attr->sched_latency_nice));
> }
>
> /*
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -952,6 +952,21 @@ int sched_update_scaling(void)
> }
> #endif
>
> +void set_latency_fair(struct sched_entity *se, int prio)
> +{
> + u32 weight = sched_prio_to_weight[prio];
> + u64 base = sysctl_sched_base_slice;
> +
> + /*
> + * For EEVDF the virtual time slope is determined by w_i (iow.
> + * nice) while the request time r_i is determined by
> + * latency-nice.
> + *
> + * Smaller request gets better latency.
> + */
> + se->slice = div_u64(base << SCHED_FIXEDPOINT_SHIFT, weight);
> +}
> +
> static void clear_buddies(struct cfs_rq *cfs_rq, struct sched_entity *se);
>
> /*
> @@ -964,13 +979,6 @@ static void update_deadline(struct cfs_r
> return;
>
> /*
> - * For EEVDF the virtual time slope is determined by w_i (iow.
> - * nice) while the request time r_i is determined by
> - * sysctl_sched_base_slice.
> - */
> - se->slice = sysctl_sched_base_slice;
> -
> - /*
> * EEVDF: vd_i = ve_i + r_i / w_i
> */
> se->deadline = se->vruntime + calc_delta_fair(se->slice, se);
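
A side note to make the interplay here concrete: calc_delta_fair()
scales by NICE_0_LOAD / w_i, so this is vd_i = ve_i + r_i / w_i, with
nice selecting w_i and latency-nice selecting r_i. A toy user-space
sketch with made-up values (the slices are the changelog's extremes):

    #include <stdio.h>

    struct entity {
            const char *name;
            double vruntime, slice_ms;
            unsigned int weight;
    };

    int main(void)
    {
            /* Two nice-0 tasks with equal vruntime; illustrative values only. */
            struct entity a = { "latency-nice -20", 100.0, 0.034, 1024 };
            struct entity b = { "latency-nice  19", 100.0, 204.8, 1024 };

            /* vd_i = ve_i + r_i * NICE_0_LOAD / w_i */
            printf("%s: vd_i = %.3f\n", a.name,
                   a.vruntime + a.slice_ms * 1024 / a.weight);
            printf("%s: vd_i = %.3f\n", b.name,
                   b.vruntime + b.slice_ms * 1024 / b.weight);

            /* EEVDF picks the eligible task with the earliest vd_i: a here. */
            return 0;
    }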
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -2495,6 +2495,8 @@ extern unsigned int sysctl_numa_balancin
> extern unsigned int sysctl_numa_balancing_hot_threshold;
> #endif
>
> +extern void set_latency_fair(struct sched_entity *se, int prio);
> +
> #ifdef CONFIG_SCHED_HRTICK
>
> /*
>
>