linux-kernel - Re: [PATCH 4/6] sched/isolation: Residual 1Hz scheduler tick offload

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Fri, 9 Feb 2018 08:16:12 +0100
From:   Ingo Molnar <mingo@...nel.org>
To:     Frederic Weisbecker <frederic@...nel.org>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Chris Metcalf <cmetcalf@...lanox.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Luiz Capitulino <lcapitulino@...hat.com>,
        Christoph Lameter <cl@...ux.com>,
        "Paul E . McKenney" <paulmck@...ux.vnet.ibm.com>,
        Wanpeng Li <kernellwp@...il.com>,
        Mike Galbraith <efault@....de>, Rik van Riel <riel@...hat.com>
Subject: Re: [PATCH 4/6] sched/isolation: Residual 1Hz scheduler tick offload


* Frederic Weisbecker <frederic@...nel.org> wrote:

> When a CPU runs in full dynticks mode, a 1Hz tick remains in order to
> keep the scheduler stats alive. However this residual tick is a burden
> for bare metal tasks that can't stand any interruption at all, or want
> to minimize them.
> 
> The usual boot parameters "nohz_full=" or "isolcpus=nohz" will now
> outsource these scheduler ticks to the global workqueue so that a
> housekeeping CPU handles those remotely. The sched_class::task_tick()
> implementations have been audited and look safe to be called remotely
> as the target runqueue and its current task are passed in parameter
> and don't seem to be accessed locally.
> 
> Note that in the case of using isolcpus, it's still up to the user to
> affine the global workqueues to the housekeeping CPUs through
> /sys/devices/virtual/workqueue/cpumask or domains isolation
> "isolcpus=nohz,domain".
> 
> Signed-off-by: Frederic Weisbecker <frederic@...nel.org>
> Cc: Chris Metcalf <cmetcalf@...lanox.com>
> Cc: Christoph Lameter <cl@...ux.com>
> Cc: Luiz Capitulino <lcapitulino@...hat.com>
> Cc: Mike Galbraith <efault@....de>
> Cc: Paul E. McKenney <paulmck@...ux.vnet.ibm.com>
> Cc: Peter Zijlstra <peterz@...radead.org>
> Cc: Rik van Riel <riel@...hat.com>
> Cc: Thomas Gleixner <tglx@...utronix.de>
> Cc: Wanpeng Li <kernellwp@...il.com>
> Cc: Ingo Molnar <mingo@...nel.org>
> ---
>  kernel/sched/core.c      | 91 +++++++++++++++++++++++++++++++++++++++++++++++-
>  kernel/sched/isolation.c |  4 +++
>  kernel/sched/sched.h     |  2 ++
>  3 files changed, 96 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index fc9fa25..5c0e8b6 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3120,7 +3120,94 @@ u64 scheduler_tick_max_deferment(void)
>  
>  	return jiffies_to_nsecs(next - now);
>  }
> -#endif
> +
> +struct tick_work {
> +	int			cpu;
> +	struct delayed_work	work;
> +};
> +
> +static struct tick_work __percpu *tick_work_cpu;
> +
> +static void sched_tick_remote(struct work_struct *work)
> +{
> +	struct delayed_work *dwork = to_delayed_work(work);
> +	struct tick_work *twork = container_of(dwork, struct tick_work, work);
> +	int cpu = twork->cpu;
> +	struct rq *rq = cpu_rq(cpu);
> +	struct rq_flags rf;
> +
> +	/*
> +	 * Handle the tick only if it appears the remote CPU is running
> +	 * in full dynticks mode. The check is racy by nature, but
> +	 * missing a tick or having one too much is no big deal.

I'd suggest pointing out why it's no big deal:

	 * missing a tick or having one too much is no big deal,
         * because the scheduler tick updates statistics and checks
	 * timeslices in a time-independent way, regardless of when
	 * exactly it is running.

> +	 */
> +	if (!idle_cpu(cpu) && tick_nohz_tick_stopped_cpu(cpu)) {
> +		struct task_struct *curr;
> +		u64 delta;
> +
> +		rq_lock_irq(rq, &rf);
> +		update_rq_clock(rq);
> +		curr = rq->curr;
> +		delta = rq_clock_task(rq) - curr->se.exec_start;
> +		/* Make sure we tick in a reasonable amount of time */
> +		WARN_ON_ONCE(delta > (u64)NSEC_PER_SEC * 3);


Please add a newline before the comment, and I'd also suggest this wording:

		/* Make sure the next tick runs within a reasonable amount of time: */

> +	/*
> +	 * Perform remote tick every second. The arbitrary frequence is
> +	 * large enough to avoid overload and short enough to keep sched
> +	 * internal stats alive.
> +	 */
> +	queue_delayed_work(system_unbound_wq, dwork, HZ);
> +}

Typo. I'd also suggest somewhat clearer wording:

	/*
	 * Run the remote tick once per second (1Hz). This arbitrary
	 * frequency is large enough to avoid overload but short enough
	 * to keep scheduler internal stats reasonably up to date.
	 */

> +#ifdef CONFIG_HOTPLUG_CPU
> +static void sched_tick_stop(int cpu)
> +{
> +	struct tick_work *twork;
> +
> +	if (housekeeping_cpu(cpu, HK_FLAG_TICK))
> +		return;
> +
> +	WARN_ON_ONCE(!tick_work_cpu);
> +
> +	twork = per_cpu_ptr(tick_work_cpu, cpu);
> +	cancel_delayed_work_sync(&twork->work);
> +}
> +#endif /* CONFIG_HOTPLUG_CPU */
> +
> +int __init sched_tick_offload_init(void)
> +{
> +	tick_work_cpu = alloc_percpu(struct tick_work);
> +	if (!tick_work_cpu) {
> +		pr_err("Can't allocate remote tick struct\n");
> +		return -ENOMEM;

Printing a warning is not enough. If tick_work_cpu ends up being NULL, then the 
tick will crash AFAICS, due to:

  > +	twork = per_cpu_ptr(tick_work_cpu, cpu);
  > +	cancel_delayed_work_sync(&twork->work);

... it's much better to crash straight away - i.e. we should use panic().

> +#else
> +static void sched_tick_start(int cpu) { }
> +static void sched_tick_stop(int cpu) { }
> +#endif /* CONFIG_NO_HZ_FULL */

So if we are using #if/else/endif markers, please use them in the #else branch 
when it's so short, where they are actually useful:

> +#else /* !CONFIG_NO_HZ_FULL: */
> +static void sched_tick_start(int cpu) { }
> +static void sched_tick_stop(int cpu) { }
> +#endif

(also note the inversion)

Thanks,

	Ingo