Message-ID: <7n6zmi3aaxrwfpvkzbugt3e3274zw3qb2kci4yyq2q6gojb3ku@zh3g4rvnyqzi>
Date: Mon, 23 Jun 2025 22:49:59 +0100
From: Mel Gorman <mgorman@...hsingularity.net>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Aaron Tomlin <atomlin@...mlin.com>, tglx@...utronix.de, 
	mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com, x86@...nel.org, 
	juri.lelli@...hat.com, vincent.guittot@...aro.org, hpa@...or.com, oleg@...hat.com, 
	dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de, 
	vschneid@...hat.com, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] sched: idle: Introduce CPU-specific idle=poll

On Mon, Jun 23, 2025 at 12:23:34PM +0200, Peter Zijlstra wrote:
> On Sat, Jun 21, 2025 at 07:57:45PM -0400, Aaron Tomlin wrote:
> > Currently, the idle=poll kernel boot parameter applies globally, forcing
> > all CPUs into a shallow polling idle state to ensure ultra-low latency
> > responsiveness. While this is beneficial for extremely latency-sensitive
> > workloads, this global application lacks flexibility and can lead to
> > significant power inefficiency. This is particularly evident in systems
> > with a high CPU count, such as those utilising the
> > Full Dynticks/Adaptive Tick feature (i.e., nohz_full). In such
> > environments, only a subset of CPUs might genuinely require
> > sub-microsecond responsiveness, while others, though active, could
> > benefit from entering deeper idle states to conserve power.
> 
> Can't we already do this at runtime with pmqos? If you set your latency
> demand very low, it should end up picking the poll state, no? And you
> can do this per-cpu.

Yes, we can. idle=poll can be hazardous in weird ways, and it's not as if
PM QoS is hard to use. For example, let's say you had an RT application with
latency constraints running on isolated CPUs while leaving the housekeeping
CPUs alone; then it's simply a case of:

        for CPU in $ISOLATED_CPUS; do
                SYSFS_PARAM="/sys/devices/system/cpu/cpu$CPU/power/pm_qos_resume_latency_us"
                if [ ! -e "$SYSFS_PARAM" ]; then
                        echo "WARNING: Unable to set PM QOS max latency for CPU $CPU" >&2
                        continue
                fi
                echo "$MAX_EXIT_LATENCY" > "$SYSFS_PARAM"
                echo "Set PM QOS maximum resume latency on CPU $CPU to ${MAX_EXIT_LATENCY}us"
        done
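
And unlike idle=poll, it can be undone at runtime. A minimal teardown
sketch, assuming writing 0 restores the default "no requirement" setting
(per Documentation/ABI/testing/sysfs-devices-power):

        # Sketch: drop the per-CPU resume latency constraint again,
        # 0 being the documented default meaning "no requirement".
        for CPU in $ISOLATED_CPUS; do
                SYSFS_PARAM="/sys/devices/system/cpu/cpu$CPU/power/pm_qos_resume_latency_us"
                [ -e "$SYSFS_PARAM" ] && echo 0 > "$SYSFS_PARAM"
        done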
 

In too many cases I've seen idle=poll being used when the user didn't know
PM QOS existed. The most common response I've received is that the latency
requirements were unknown, resulting in much headbanging off the table.
Don't get me started on the hazards of limiting c-states by index without
checking what those c-states actually are, or of splitting
isolated/housekeeping across SMT siblings.
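
If you must reason about individual c-states, at least read back what they
are first rather than assuming a fixed index-to-state mapping. A rough
sketch using the cpuidle sysfs layout (latency is the exit latency in
microseconds):

        # Sketch: list each idle state's name and exit latency for CPU 0.
        for STATE in /sys/devices/system/cpu/cpu0/cpuidle/state*; do
                printf "%s: %s, exit latency %sus\n" \
                        "$(basename "$STATE")" "$(cat "$STATE/name")" \
                        "$(cat "$STATE/latency")"
        done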

-- 
Mel Gorman
SUSE Labs
