Message-ID: <20260118231009.3194039-1-atomlin@atomlin.com>
Date: Sun, 18 Jan 2026 18:10:09 -0500
From: Aaron Tomlin <atomlin@...mlin.com>
To: corbet@....net,
anna-maria@...utronix.de,
frederic@...nel.org,
tglx@...nel.org,
mingo@...hat.com,
bp@...en8.de,
dave.hansen@...ux.intel.com
Cc: x86@...nel.org,
hpa@...or.com,
akpm@...ux-foundation.org,
pawan.kumar.gupta@...ux.intel.com,
feng.tang@...ux.alibaba.com,
kees@...nel.org,
elver@...gle.com,
arnd@...db.de,
fvdl@...gle.com,
lirongqing@...du.com,
bhelgaas@...gle.com,
peterz@...radead.org,
brgerst@...il.com,
kai.huang@...el.com,
benjamin.berg@...el.com,
andrew.cooper3@...rix.com,
oleg@...hat.com,
neelx@...e.com,
atomlin@...mlin.com,
sean@...e.io,
mproche@...il.com,
chjohnst@...il.com,
linux-doc@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: [PATCH] x86/idle: Mark "idle=poll" as deprecated
The "idle=poll" boot parameter is a blunt instrument that forces all
CPUs in the system into a continuous "polling" state. While effective
at eliminating wake-up latency, this global override is architecturally
obsolete and inefficient on modern multicore systems.
It suffers from several significant drawbacks:
1. Lack of Granularity: It prevents "housekeeping" CPUs from
entering power-saving states, leading to unnecessary energy
waste and thermal throttling that can negatively impact the very
latency-sensitive tasks it aims to protect.
2. Resource Contention: On SMT systems, a polling sibling thread
actively consumes execution resources, potentially degrading the
performance of the primary thread on the same physical core.
The Power Management Quality of Service (PM QoS) subsystem now provides
a superior, granular alternative.
By writing the special value "n/a" to the per-CPU sysfs node
/sys/devices/system/cpu/cpuN/power/pm_qos_resume_latency_us, userspace
can force a specific CPU to poll without imposing this cost globally.
Writing "0" to the same file removes the constraint, allowing the
governor to freely select the deepest applicable C-state.
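As a rough illustration of the per-CPU interface described above, the
following shell sketch wraps the two writes in helper functions. The
function names and the SYSFS_CPU_ROOT override are hypothetical and not
part of this patch; only the sysfs node and the "n/a" and "0" values
come from the text above.

```shell
#!/bin/sh
# Sketch only: helper names and SYSFS_CPU_ROOT are illustrative assumptions.
# SYSFS_CPU_ROOT lets the helpers be pointed at a scratch directory for testing.
SYSFS_CPU_ROOT="${SYSFS_CPU_ROOT:-/sys/devices/system/cpu}"

# set_resume_latency VALUE CPU...
# Write VALUE to pm_qos_resume_latency_us for each listed logical CPU.
set_resume_latency() {
    value="$1"
    shift
    for cpu in "$@"; do
        node="$SYSFS_CPU_ROOT/cpu$cpu/power/pm_qos_resume_latency_us"
        if [ ! -w "$node" ]; then
            echo "cpu$cpu: $node is not writable" >&2
            return 1
        fi
        printf '%s\n' "$value" > "$node" || return 1
    done
}

# "n/a": no resume latency is acceptable, so the CPU polls when idle.
force_poll() { set_resume_latency "n/a" "$@"; }

# "0": remove the constraint so the governor may pick any C-state.
clear_constraint() { set_resume_latency "0" "$@"; }
```

Run as root, "force_poll 2 3" would confine polling to CPUs 2 and 3 while
the remaining CPUs stay free to enter deeper C-states, and
"clear_constraint 2 3" would undo it at runtime.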
This patch marks "idle=poll" as deprecated. A warning is issued at
boot time if the parameter is used, guiding users toward the PM QoS
interface.
Signed-off-by: Aaron Tomlin <atomlin@...mlin.com>
---
Documentation/admin-guide/kernel-parameters.txt | 9 +++++----
Documentation/timers/no_hz.rst | 9 +++++++++
arch/x86/kernel/process.c | 1 +
3 files changed, 15 insertions(+), 4 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 1058f2a6d6a8..6a3d6bd0746c 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2242,10 +2242,11 @@ Kernel parameters
idle= [X86,EARLY]
Format: idle=poll, idle=halt, idle=nomwait
- idle=poll: Don't do power saving in the idle loop
- using HLT, but poll for rescheduling event. This will
- make the CPUs eat a lot more power, but may be useful
- to get slightly better performance in multiprocessor
+ idle=poll: [Deprecated - use PM QoS]
+ Don't do power saving in the idle loop using HLT, but
+ poll for rescheduling event. This will make the CPUs
+ eat a lot more power, but may be useful to get
+ slightly better performance in multiprocessor
benchmarks. It also makes some profiling using
performance counters more accurate. Please note that
on systems with MONITOR/MWAIT support (like Intel
diff --git a/Documentation/timers/no_hz.rst b/Documentation/timers/no_hz.rst
index 7fe8ef9718d8..ca7918cc169e 100644
--- a/Documentation/timers/no_hz.rst
+++ b/Documentation/timers/no_hz.rst
@@ -234,6 +234,15 @@ Known Issues
a. Use PMQOS from userspace to inform the kernel of your
latency requirements (preferred).
+ This interface offers a superior and more flexible alternative to
+ global boot parameters such as "idle=poll", as it can be adjusted
+ at runtime with per-CPU granularity.
+
+ To force a specific CPU (where N is the logical CPU number) to poll
+ on idle, write "n/a" (no resume latency tolerated; "0" removes the limit):
+
+ echo "n/a" > /sys/devices/system/cpu/cpuN/power/pm_qos_resume_latency_us
+
b. On x86 systems, use the "idle=mwait" boot parameter.
c. On x86 systems, use the "intel_idle.max_cstate=" to limit
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 4c718f8adc59..359b57f6272b 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -1000,6 +1000,7 @@ static int __init idle_setup(char *str)
if (!strcmp(str, "poll")) {
pr_info("using polling idle threads\n");
+ pr_warn("idle=poll is deprecated. Use the PM QoS interface instead via /sys/devices/system/cpu/cpuN/power/\n");
boot_option_idle_override = IDLE_POLL;
cpu_idle_poll_ctrl(true);
} else if (!strcmp(str, "halt")) {
--
2.51.0