Message-ID: <20200422131138.GL17661@paulmck-ThinkPad-P72>
Date: Wed, 22 Apr 2020 06:11:38 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: mingo@...nel.org, linux-kernel@...r.kernel.org, tglx@...utronix.de,
rostedt@...dmis.org, qais.yousef@....com, juri.lelli@...hat.com,
vincent.guittot@...aro.org, dietmar.eggemann@....com,
bsegall@...gle.com, mgorman@...e.de, airlied@...hat.com,
alexander.deucher@....com, awalls@...metrocast.net,
axboe@...nel.dk, broonie@...nel.org, daniel.lezcano@...aro.org,
gregkh@...uxfoundation.org, hannes@...xchg.org,
herbert@...dor.apana.org.au, hverkuil@...all.nl,
john.stultz@...aro.org, nico@...xnic.net,
rafael.j.wysocki@...el.com, rmk+kernel@....linux.org.uk,
sudeep.holla@....com, ulf.hansson@...aro.org,
wim@...ux-watchdog.org
Subject: Re: [PATCH 01/23] sched: Provide sched_set_fifo()
On Wed, Apr 22, 2020 at 01:27:20PM +0200, Peter Zijlstra wrote:
> SCHED_FIFO (or any static priority scheduler) is a broken scheduler
> model; it is fundamentally incapable of resource management, the one
> thing an OS is actually supposed to do.
>
> It is impossible to compose static priority workloads. One cannot take
> two well designed and functional static priority workloads and mash
> them together and still expect them to work.
>
> Therefore it doesn't make sense to expose the priority field; the
> kernel is fundamentally incapable of setting a sensible value, it
> needs systems knowledge that it doesn't have.
>
> Take away sched_setscheduler() / sched_setattr() from modules and
> replace them with:
>
> - sched_set_fifo(p); create a FIFO task (at prio 50)
> - sched_set_fifo_low(p); create a task higher than NORMAL,
> which ends up being a FIFO task at prio 1.
> - sched_set_normal(p, nice); (re)set the task to normal
>
> This stops the proliferation of randomly chosen, and irrelevant, FIFO
> priorities that don't really mean anything anyway.
>
> The system administrator/integrator, whoever has insight into the
> actual system design and requirements (userspace) can set up
> appropriate priorities if and when needed.
The sched_setscheduler_nocheck() calls in rcu_spawn_gp_kthread(),
rcu_cpu_kthread_setup(), and rcu_spawn_one_boost_kthread() all stay as
is because they all use the rcutree.kthread_prio boot parameter, which is
set at boot time by the system administrator (or {who,what}ever), correct?
Or did my email reader eat a patch or two?
Thanx, Paul
> Cc: airlied@...hat.com
> Cc: alexander.deucher@....com
> Cc: awalls@...metrocast.net
> Cc: axboe@...nel.dk
> Cc: broonie@...nel.org
> Cc: daniel.lezcano@...aro.org
> Cc: gregkh@...uxfoundation.org
> Cc: hannes@...xchg.org
> Cc: herbert@...dor.apana.org.au
> Cc: hverkuil@...all.nl
> Cc: john.stultz@...aro.org
> Cc: nico@...xnic.net
> Cc: paulmck@...nel.org
> Cc: rafael.j.wysocki@...el.com
> Cc: rmk+kernel@....linux.org.uk
> Cc: sudeep.holla@....com
> Cc: tglx@...utronix.de
> Cc: ulf.hansson@...aro.org
> Cc: wim@...ux-watchdog.org
> Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
> Reviewed-by: Ingo Molnar <mingo@...nel.org>
> ---
> include/linux/sched.h | 3 +++
> kernel/sched/core.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 50 insertions(+)
>
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1631,6 +1631,9 @@ extern int idle_cpu(int cpu);
> extern int available_idle_cpu(int cpu);
> extern int sched_setscheduler(struct task_struct *, int, const struct sched_param *);
> extern int sched_setscheduler_nocheck(struct task_struct *, int, const struct sched_param *);
> +extern int sched_set_fifo(struct task_struct *p);
> +extern int sched_set_fifo_low(struct task_struct *p);
> +extern int sched_set_normal(struct task_struct *p, int nice);
> extern int sched_setattr(struct task_struct *, const struct sched_attr *);
> extern int sched_setattr_nocheck(struct task_struct *, const struct sched_attr *);
> extern struct task_struct *idle_task(int cpu);
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5055,6 +5055,8 @@ static int _sched_setscheduler(struct ta
> * @policy: new policy.
> * @param: structure containing the new RT priority.
> *
> + * Use sched_set_fifo(), read its comment.
> + *
> * Return: 0 on success. An error code otherwise.
> *
> * NOTE that the task may be already dead.
> @@ -5097,6 +5099,51 @@ int sched_setscheduler_nocheck(struct ta
> }
> EXPORT_SYMBOL_GPL(sched_setscheduler_nocheck);
>
> +/*
> + * SCHED_FIFO is a broken scheduler model; that is, it is fundamentally
> + * incapable of resource management, which is the one thing an OS really should
> + * be doing.
> + *
> + * This is of course the reason it is limited to privileged users only.
> + *
> + * Worse still; it is fundamentally impossible to compose static priority
> + * workloads. You cannot take two correctly working static prio workloads
> + * and smash them together and still expect them to work.
> + *
> + * For this reason 'all' FIFO tasks the kernel creates are basically at:
> + *
> + * MAX_RT_PRIO / 2
> + *
> + * The administrator _MUST_ configure the system, the kernel simply doesn't
> + * know enough information to make a sensible choice.
> + */
> +int sched_set_fifo(struct task_struct *p)
> +{
> + struct sched_param sp = { .sched_priority = MAX_RT_PRIO / 2 };
> + return sched_setscheduler_nocheck(p, SCHED_FIFO, &sp);
> +}
> +EXPORT_SYMBOL_GPL(sched_set_fifo);
> +
> +/*
> + * For when you don't much care about FIFO, but want to be above SCHED_NORMAL.
> + */
> +int sched_set_fifo_low(struct task_struct *p)
> +{
> + struct sched_param sp = { .sched_priority = 1 };
> + return sched_setscheduler_nocheck(p, SCHED_FIFO, &sp);
> +}
> +EXPORT_SYMBOL_GPL(sched_set_fifo_low);
> +
> +int sched_set_normal(struct task_struct *p, int nice)
> +{
> + struct sched_attr attr = {
> + .sched_policy = SCHED_NORMAL,
> + .sched_nice = nice,
> + };
> + return sched_setattr_nocheck(p, &attr);
> +}
> +EXPORT_SYMBOL_GPL(sched_set_normal);
> +
> static int
> do_sched_setscheduler(pid_t pid, int policy, struct sched_param __user *param)
> {
>
>
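For readers who want to poke at the policy/priority mapping without building a kernel, here is a minimal userspace model of the three helpers from the patch above. It is only a sketch: every model_* and MODEL_* name is hypothetical, nothing here calls into the scheduler, MODEL_MAX_RT_PRIO is assumed to be 100 (which matches the "prio 50" in the changelog), and the nice range check is an assumption of the model rather than a statement about what sched_setattr_nocheck() does.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value, chosen so that MODEL_MAX_RT_PRIO / 2 == 50. */
#define MODEL_MAX_RT_PRIO 100
#define MODEL_MIN_NICE    (-20)
#define MODEL_MAX_NICE    19

enum model_policy { MODEL_SCHED_NORMAL, MODEL_SCHED_FIFO };

/* Hypothetical stand-in for the scheduling fields of a task. */
struct model_task {
	enum model_policy policy;
	int rt_priority;	/* meaningful only for MODEL_SCHED_FIFO */
	int nice;		/* meaningful only for MODEL_SCHED_NORMAL */
};

/* Models sched_set_fifo(): FIFO in the middle of the RT range. */
int model_set_fifo(struct model_task *p)
{
	assert(p != NULL);
	p->policy = MODEL_SCHED_FIFO;
	p->rt_priority = MODEL_MAX_RT_PRIO / 2;
	return 0;
}

/* Models sched_set_fifo_low(): just above normal tasks, FIFO prio 1. */
int model_set_fifo_low(struct model_task *p)
{
	assert(p != NULL);
	p->policy = MODEL_SCHED_FIFO;
	p->rt_priority = 1;
	return 0;
}

/*
 * Models sched_set_normal(): back to the normal policy with a nice value.
 * Rejecting out-of-range nice values is a choice made by this model.
 */
int model_set_normal(struct model_task *p, int nice)
{
	assert(p != NULL);
	if (nice < MODEL_MIN_NICE || nice > MODEL_MAX_NICE)
		return -1;
	p->policy = MODEL_SCHED_NORMAL;
	p->rt_priority = 0;
	p->nice = nice;
	return 0;
}
```

The point the model makes is the same one the changelog makes: callers pick one of three intents and never see a priority number, so any finer-grained choice is left to whoever configures the system.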