linux-kernel - Re: [PATCH 01/23] sched: Provide sched_set

Message-ID: <20200422131138.GL17661@paulmck-ThinkPad-P72>
Date:   Wed, 22 Apr 2020 06:11:38 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     mingo@...nel.org, linux-kernel@...r.kernel.org, tglx@...utronix.de,
        rostedt@...dmis.org, qais.yousef@....com, juri.lelli@...hat.com,
        vincent.guittot@...aro.org, dietmar.eggemann@....com,
        bsegall@...gle.com, mgorman@...e.de, airlied@...hat.com,
        alexander.deucher@....com, awalls@...metrocast.net,
        axboe@...nel.dk, broonie@...nel.org, daniel.lezcano@...aro.org,
        gregkh@...uxfoundation.org, hannes@...xchg.org,
        herbert@...dor.apana.org.au, hverkuil@...all.nl,
        john.stultz@...aro.org, nico@...xnic.net,
        rafael.j.wysocki@...el.com, rmk+kernel@....linux.org.uk,
        sudeep.holla@....com, ulf.hansson@...aro.org,
        wim@...ux-watchdog.org
Subject: Re: [PATCH 01/23] sched: Provide sched_set_fifo()

On Wed, Apr 22, 2020 at 01:27:20PM +0200, Peter Zijlstra wrote:
> SCHED_FIFO (or any static priority scheduler) is a broken scheduler
> model; it is fundamentally incapable of resource management, the one
> thing an OS is actually supposed to do.
> 
> It is impossible to compose static priority workloads. One cannot take
> two well designed and functional static priority workloads and mash
> them together and still expect them to work.
> 
> Therefore it doesn't make sense to expose the priority field; the
> kernel is fundamentally incapable of setting a sensible value, it
> needs systems knowledge that it doesn't have.
> 
> Take away sched_setscheduler() / sched_setattr() from modules and
> replace them with:
> 
>   - sched_set_fifo(p); create a FIFO task (at prio 50)
>   - sched_set_fifo_low(p); create a task higher than NORMAL,
> 	which ends up being a FIFO task at prio 1.
>   - sched_set_normal(p, nice); (re)set the task to normal
> 
> This stops the proliferation of randomly chosen, and irrelevant, FIFO
> priorities that don't really mean anything anyway.
> 
> The system administrator/integrator, whoever has insight into the
> actual system design and requirements (userspace) can set up
> appropriate priorities if and when needed.

The sched_setscheduler_nocheck() calls in rcu_spawn_gp_kthread(),
rcu_cpu_kthread_setup(), and rcu_spawn_one_boost_kthread() all stay as
is because they all use the rcutree.kthread_prio boot parameter, which is
set at boot time by the system administrator (or {who,what}ever), correct?

Or did my email reader eat a patch or two?

							Thanx, Paul

> Cc: airlied@...hat.com
> Cc: alexander.deucher@....com
> Cc: awalls@...metrocast.net
> Cc: axboe@...nel.dk
> Cc: broonie@...nel.org
> Cc: daniel.lezcano@...aro.org
> Cc: gregkh@...uxfoundation.org
> Cc: hannes@...xchg.org
> Cc: herbert@...dor.apana.org.au
> Cc: hverkuil@...all.nl
> Cc: john.stultz@...aro.org
> Cc: nico@...xnic.net
> Cc: paulmck@...nel.org
> Cc: rafael.j.wysocki@...el.com
> Cc: rmk+kernel@....linux.org.uk
> Cc: sudeep.holla@....com
> Cc: tglx@...utronix.de
> Cc: ulf.hansson@...aro.org
> Cc: wim@...ux-watchdog.org
> Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
> Reviewed-by: Ingo Molnar <mingo@...nel.org>
> ---
>  include/linux/sched.h |    3 +++
>  kernel/sched/core.c   |   47 +++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 50 insertions(+)
> 
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1631,6 +1631,9 @@ extern int idle_cpu(int cpu);
>  extern int available_idle_cpu(int cpu);
>  extern int sched_setscheduler(struct task_struct *, int, const struct sched_param *);
>  extern int sched_setscheduler_nocheck(struct task_struct *, int, const struct sched_param *);
> +extern int sched_set_fifo(struct task_struct *p);
> +extern int sched_set_fifo_low(struct task_struct *p);
> +extern int sched_set_normal(struct task_struct *p, int nice);
>  extern int sched_setattr(struct task_struct *, const struct sched_attr *);
>  extern int sched_setattr_nocheck(struct task_struct *, const struct sched_attr *);
>  extern struct task_struct *idle_task(int cpu);
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5055,6 +5055,8 @@ static int _sched_setscheduler(struct ta
>   * @policy: new policy.
>   * @param: structure containing the new RT priority.
>   *
> + * Use sched_set_fifo(), read its comment.
> + *
>   * Return: 0 on success. An error code otherwise.
>   *
>   * NOTE that the task may be already dead.
> @@ -5097,6 +5099,51 @@ int sched_setscheduler_nocheck(struct ta
>  }
>  EXPORT_SYMBOL_GPL(sched_setscheduler_nocheck);
>  
> +/*
> + * SCHED_FIFO is a broken scheduler model; that is, it is fundamentally
> + * incapable of resource management, which is the one thing an OS really should
> + * be doing.
> + *
> + * This is of course the reason it is limited to privileged users only.
> + *
> + * Worse still; it is fundamentally impossible to compose static priority
> + * workloads. You cannot take two correctly working static prio workloads
> + * and smash them together and still expect them to work.
> + *
> + * For this reason 'all' FIFO tasks the kernel creates are basically at:
> + *
> + *   MAX_RT_PRIO / 2
> + *
> + * The administrator _MUST_ configure the system, the kernel simply doesn't
> + * know enough information to make a sensible choice.
> + */
> +int sched_set_fifo(struct task_struct *p)
> +{
> +	struct sched_param sp = { .sched_priority = MAX_RT_PRIO / 2 };
> +	return sched_setscheduler_nocheck(p, SCHED_FIFO, &sp);
> +}
> +EXPORT_SYMBOL_GPL(sched_set_fifo);
> +
> +/*
> + * For when you don't much care about FIFO, but want to be above SCHED_NORMAL.
> + */
> +int sched_set_fifo_low(struct task_struct *p)
> +{
> +	struct sched_param sp = { .sched_priority = 1 };
> +	return sched_setscheduler_nocheck(p, SCHED_FIFO, &sp);
> +}
> +EXPORT_SYMBOL_GPL(sched_set_fifo_low);
> +
> +int sched_set_normal(struct task_struct *p, int nice)
> +{
> +	struct sched_attr attr = {
> +		.sched_policy = SCHED_NORMAL,
> +		.sched_nice = nice,
> +	};
> +	return sched_setattr_nocheck(p, &attr);
> +}
> +EXPORT_SYMBOL_GPL(sched_set_normal);
> +
>  static int
>  do_sched_setscheduler(pid_t pid, int policy, struct sched_param __user *param)
>  {
> 
>
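
For illustration only, and not part of the patch: the behaviour of the three
helpers quoted above can be modeled in plain userspace C. Everything prefixed
model_/MODEL_ below is a hypothetical stand-in for the kernel types and
functions, so this is a sketch of the policy/priority choices rather than
kernel code. MODEL_MAX_RT_PRIO mirrors the kernel's MAX_RT_PRIO of 100, which
is what makes sched_set_fifo() land at prio 50. The last function sketches the
case raised in the reply, where the priority comes from an administrator-set
value such as rcutree.kthread_prio; treating 0 as "leave the task alone" is an
assumption of this model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for kernel constants. */
#define MODEL_SCHED_NORMAL 0
#define MODEL_SCHED_FIFO   1
#define MODEL_MAX_RT_PRIO  100	/* mirrors MAX_RT_PRIO in the kernel */

/* Hypothetical stand-in for the scheduling fields of struct task_struct. */
struct model_task {
	int policy;
	int rt_priority;
	int nice;
};

/* Stand-in for sched_setscheduler_nocheck(): only records the request. */
int model_setscheduler_nocheck(struct model_task *p, int policy, int prio)
{
	assert(p != NULL);
	p->policy = policy;
	p->rt_priority = prio;
	return 0;
}

/* Models sched_set_fifo(): FIFO at MAX_RT_PRIO / 2. */
int model_set_fifo(struct model_task *p)
{
	return model_setscheduler_nocheck(p, MODEL_SCHED_FIFO,
					  MODEL_MAX_RT_PRIO / 2);
}

/* Models sched_set_fifo_low(): FIFO at prio 1, just above NORMAL tasks. */
int model_set_fifo_low(struct model_task *p)
{
	return model_setscheduler_nocheck(p, MODEL_SCHED_FIFO, 1);
}

/* Models sched_set_normal(): back to NORMAL with the given nice value. */
int model_set_normal(struct model_task *p, int nice)
{
	assert(p != NULL);
	p->policy = MODEL_SCHED_NORMAL;
	p->rt_priority = 0;
	p->nice = nice;
	return 0;
}

/*
 * Models a caller whose priority is supplied by the administrator
 * (for example via a boot parameter). Assumption of this sketch:
 * a value of 0 means "do not change the task".
 */
int model_set_fifo_admin(struct model_task *p, int admin_prio)
{
	if (admin_prio == 0)
		return 0;
	return model_setscheduler_nocheck(p, MODEL_SCHED_FIFO, admin_prio);
}
```

The point of the split is that the first three helpers hide the priority from
the caller entirely, while the last one only forwards a number that somebody
with knowledge of the whole system picked.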