Message-ID: <Y8PEM9TK4vTPWuxH@gmail.com>
Date: Sun, 15 Jan 2023 10:15:31 +0100
From: Ingo Molnar <mingo@...nel.org>
To: Daniel Bristot de Oliveira <bristot@...nel.org>
Cc: linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
Daniel Bristot de Oliveira <bristot@...hat.com>,
Valentin Schneider <vschneid@...hat.com>,
Joe Mario <jmario@...hat.com>
Subject: Re: [PATCH] sched/idle: Make idle poll dynamic per-cpu
* Daniel Bristot de Oliveira <bristot@...nel.org> wrote:
> idle=poll is frequently used on ultra-low-latency systems. Examples of
> such systems are high-performance trading and 5G RAN applications. The
> performance gain comes from avoiding the idle driver machinery and from
> keeping the CPU always in an active state - avoiding (odd) hardware
> heuristics that are outside the control of the OS.
>
> Currently, idle=poll is an all-or-nothing static option defined at
> boot time. The motivation for making this option dynamic and per-cpu
> is twofold:
>
> 1) Reduce the power usage/heat by allowing only selected CPUs to
> do idle polling;
> 2) Allow multi-tenant systems (e.g., Kubernetes) to enable idle
> polling only when ultra-low-latency applications are present
> on specific CPUs.
>
> Joe Mario did some experiments with this option enabled, and the results
> were significant. For example, by using dynamic idle polling on
> selected CPUs, cyclictest performance is optimal (like when using
> idle=poll), but CPU power consumption drops from 381 to 233 watts.
>
> Also, limiting idle=poll to the set of CPUs that benefit from
> it allows other CPUs to benefit from frequency boosts. Joe also
> measured round-trip improvements on the order of 80 nsec compared
> to running with system-wide idle=poll.
>
> The user can enable idle polling with this command:
> # echo 1 > /sys/devices/system/cpu/cpu{CPU_ID}/idle_poll
>
> And disable it via:
> # echo 0 > /sys/devices/system/cpu/cpu{CPU_ID}/idle_poll
>
> By default, all CPUs have idle polling disabled (the current behavior).
> A static key avoids the overhead of checking the CPU mask when idle
> polling is not enabled on any CPU.
Sounds useful in general.

A couple of observations:

ABI: how about putting the new file into the existing
/sys/devices/system/cpu/cpuidle/ directory - the sysfs space of cpuidle?
Arguably this flag is an extension of it.
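
The attribute itself wouldn't have to change much for that. Rough,
untested sketch of the read side (whether it then lands in the per-CPU
cpuN/cpuidle/ directories or in the global one is exactly the ABI
question - the show() helper below is illustrative only):

/*
 * Illustrative sketch only: read side of the per-CPU flag, intended
 * to be registered under the cpuidle sysfs space rather than as a
 * new top-level per-CPU file. Registration details left open.
 */
static ssize_t idle_poll_show(struct device *dev,
                              struct device_attribute *attr, char *buf)
{
        return sysfs_emit(buf, "%d\n",
                          cpumask_test_cpu(dev->id, &cpu_poll_mask));
}
static DEVICE_ATTR_RW(idle_poll);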
> extern char __cpuidle_text_start[], __cpuidle_text_end[];
>
> +/*
> + * per-cpu idle polling selector.
> + */
> +static struct cpumask cpu_poll_mask;
> +DEFINE_STATIC_KEY_FALSE(cpu_poll_enabled);
> +
> +/*
> + * Protects the mask/static key relation.
> + */
> +DEFINE_MUTEX(cpu_poll_mutex);
> +
> +static ssize_t idle_poll_store(struct device *dev, struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + int cpu = dev->id;
> + int retval, set;
> + bool val;
> +
> + retval = kstrtobool(buf, &val);
> + if (retval)
> + return retval;
> +
> + mutex_lock(&cpu_poll_mutex);
> +
> + if (val) {
> + set = cpumask_test_and_set_cpu(cpu, &cpu_poll_mask);
> +
> + /*
> + * If the CPU was already on, do not increase the static key usage.
> + */
> + if (!set)
> + static_branch_inc(&cpu_poll_enabled);
> + } else {
> + set = cpumask_test_and_clear_cpu(cpu, &cpu_poll_mask);
> +
> + /*
> + * If the CPU was already off, do not decrease the static key usage.
> + */
> + if (set)
> + static_branch_dec(&cpu_poll_enabled);
> + }
Nit: I think 'old_bit' or so is easier to read than a generic 'set'?
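
I.e. something like:

        bool old_bit;

        old_bit = cpumask_test_and_set_cpu(cpu, &cpu_poll_mask);

        /* Only bump the key on an actual 0 -> 1 transition: */
        if (!old_bit)
                static_branch_inc(&cpu_poll_enabled);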
> +
> + mutex_unlock(&cpu_poll_mutex);
Also, is cpu_poll_mutex locking really necessary, given that these bitops
methods are atomic, and CPUs observe cpu_poll_enabled without taking any
locks?
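
I.e. something like this might be sufficient (untested):

/*
 * Lockless sketch: cpumask_test_and_set_cpu()/_clear_cpu() are atomic,
 * so only one writer can observe a given 0->1 (or 1->0) transition for
 * a CPU, which keeps the static_branch_inc()/dec() counts balanced
 * even without the mutex - and readers never took it anyway.
 */
if (val) {
        if (!cpumask_test_and_set_cpu(cpu, &cpu_poll_mask))
                static_branch_inc(&cpu_poll_enabled);
} else {
        if (cpumask_test_and_clear_cpu(cpu, &cpu_poll_mask))
                static_branch_dec(&cpu_poll_enabled);
}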
> +static int is_cpu_idle_poll(int cpu)
> +{
> + if (static_branch_unlikely(&cpu_poll_enabled))
> + return cpumask_test_cpu(cpu, &cpu_poll_mask);
> +
> + return 0;
> +}
static inline might be justified in this case I guess.
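
I.e.:

static inline int is_cpu_idle_poll(int cpu)
{
        /* The static key keeps the mask check off the common idle path: */
        if (static_branch_unlikely(&cpu_poll_enabled))
                return cpumask_test_cpu(cpu, &cpu_poll_mask);

        return 0;
}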
> @@ -51,18 +136,21 @@ __setup("hlt", cpu_idle_nopoll_setup);
>
> static noinline int __cpuidle cpu_idle_poll(void)
> {
> - trace_cpu_idle(0, smp_processor_id());
> + int cpu = smp_processor_id();
> +
> + trace_cpu_idle(0, cpu);
> stop_critical_timings();
> ct_idle_enter();
> local_irq_enable();
>
> while (!tif_need_resched() &&
> - (cpu_idle_force_poll || tick_check_broadcast_expired()))
> + (cpu_idle_force_poll || tick_check_broadcast_expired()
> + || is_cpu_idle_poll(cpu)))
> cpu_relax();
>
> ct_idle_exit();
> start_critical_timings();
> - trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
> + trace_cpu_idle(PWR_EVENT_EXIT, cpu);
>
> return 1;
So I think the introduction of the 'cpu' local variable to clean up the
flow of cpu_idle_poll() should be a separate preparatory patch, which will
make the addition of the is_cpu_idle_poll() call a bit easier to read in
the second patch.
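
I.e. the preparatory patch would be just this (sketch, no change in
behavior):

static noinline int __cpuidle cpu_idle_poll(void)
{
        int cpu = smp_processor_id();

        /* Pure cleanup: cache smp_processor_id() in a local variable. */
        trace_cpu_idle(0, cpu);
        stop_critical_timings();
        ct_idle_enter();
        local_irq_enable();

        while (!tif_need_resched() &&
               (cpu_idle_force_poll || tick_check_broadcast_expired()))
                cpu_relax();

        ct_idle_exit();
        start_critical_timings();
        trace_cpu_idle(PWR_EVENT_EXIT, cpu);

        return 1;
}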
> }
> @@ -296,7 +384,8 @@ static void do_idle(void)
> * broadcast device expired for us, we don't want to go deep
> * idle as we know that the IPI is going to arrive right away.
> */
> - if (cpu_idle_force_poll || tick_check_broadcast_expired()) {
> + if (cpu_idle_force_poll || tick_check_broadcast_expired()
> + || is_cpu_idle_poll(cpu)) {
> tick_nohz_idle_restart_tick();
> cpu_idle_poll();
Shouldn't we check is_cpu_idle_poll() right after the cpu_idle_force_poll
check, and before the tick_check_broadcast_expired() check?
Shouldn't matter to the outcome, but for consistency's sake.
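
I.e.:

        if (cpu_idle_force_poll || is_cpu_idle_poll(cpu) ||
            tick_check_broadcast_expired()) {

(and the same ordering in the cpu_idle_poll() polling loop.)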
Plus, if we are doing this anyway, maybe cpu_idle_force_poll could now be
implemented as 0/all setting of cpu_poll_mask, eliminating the
cpu_idle_force_poll flag? As a third patch on top.
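
Roughly like this (completely untested sketch - and note that
cpu_idle_poll_ctrl() also drives cpu_idle_force_poll as a counter, so
its callers would need an equivalent mask-based treatment that I
haven't thought through here):

static int __init cpu_idle_poll_setup(char *__unused)
{
        /* 'Force poll' would simply mean: every bit set in the mask. */
        cpumask_setall(&cpu_poll_mask);
        static_branch_inc(&cpu_poll_enabled);

        return 1;
}
__setup("nohlt", cpu_idle_poll_setup);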
Thanks,

        Ingo