Message-ID: <7n6zmi3aaxrwfpvkzbugt3e3274zw3qb2kci4yyq2q6gojb3ku@zh3g4rvnyqzi>
Date: Mon, 23 Jun 2025 22:49:59 +0100
From: Mel Gorman <mgorman@...hsingularity.net>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Aaron Tomlin <atomlin@...mlin.com>, tglx@...utronix.de,
mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com, x86@...nel.org,
juri.lelli@...hat.com, vincent.guittot@...aro.org, hpa@...or.com, oleg@...hat.com,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com, mgorman@...e.de,
vschneid@...hat.com, linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH] sched: idle: Introduce CPU-specific idle=poll
On Mon, Jun 23, 2025 at 12:23:34PM +0200, Peter Zijlstra wrote:
> On Sat, Jun 21, 2025 at 07:57:45PM -0400, Aaron Tomlin wrote:
> > Currently, the idle=poll kernel boot parameter applies globally, forcing
> > all CPUs into a shallow polling idle state to ensure ultra-low latency
> > responsiveness. While this is beneficial for extremely latency-sensitive
> > workloads, this global application lacks flexibility and can lead to
> > significant power inefficiency. This is particularly evident in systems
> > with a high CPU count, such as those utilising the
> > Full Dynticks/Adaptive Tick feature (i.e., nohz_full). In such
> > environments, only a subset of CPUs might genuinely require
> > sub-microsecond responsiveness, while others, though active, could
> > benefit from entering deeper idle states to conserve power.
>
> Can't we already do this at runtime with pmqos? If you set your latency
> demand very low, it should end up picking the poll state, no? And you
> can do this per-cpu.
Yes, we can. idle=poll can be hazardous in weird ways and it's not like
pmqos is hard to use. For example, let's say you had an RT application with
latency constraints running on isolated CPUs while leaving housekeeping
CPUs alone. Then it's simply a case of:
    # ISOLATED_CPUS: space-separated CPU numbers; MAX_EXIT_LATENCY: microseconds
    for CPU in $ISOLATED_CPUS; do
        SYSFS_PARAM="/sys/devices/system/cpu/cpu$CPU/power/pm_qos_resume_latency_us"
        if [ ! -e "$SYSFS_PARAM" ]; then
            echo "WARNING: Unable to set PM QOS max latency for CPU $CPU"
            continue
        fi
        echo "$MAX_EXIT_LATENCY" > "$SYSFS_PARAM"
        echo "Set PM QOS maximum resume latency on CPU $CPU to ${MAX_EXIT_LATENCY}us"
    done
In too many cases I've seen idle=poll being used when the user didn't know
PM QOS existed. The most common response I've received is that the latency
requirements were unknown, resulting in much headbanging off the table.
Don't get me started on the hazards of limiting c-states by index without
checking what the c-states are, or splitting isolated/housekeeping across
SMT siblings.
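As a rough sketch of what checking first might look like, assuming the
usual cpuidle and topology sysfs layout, something like the following
prints each idle state's name and exit latency for a CPU, and that CPU's
SMT siblings. The function names and the SYSFS_CPU_ROOT override are made
up for illustration and are not part of any existing tool.

```shell
# SYSFS_CPU_ROOT is a hypothetical override, used only so the functions can
# be pointed at a fake tree for testing; it defaults to the real sysfs path.

# Print "stateN name exit_latency_us" for each idle state of one CPU.
list_idle_states() {
    root="${SYSFS_CPU_ROOT:-/sys/devices/system/cpu}"
    for state in "$root/cpu$1/cpuidle"/state*; do
        # An unmatched glob stays literal, so skip anything unreadable.
        [ -r "$state/name" ] && [ -r "$state/latency" ] || continue
        printf '%s %s %s\n' "${state##*/}" \
            "$(cat "$state/name")" "$(cat "$state/latency")"
    done
}

# Print the SMT siblings of one CPU (the list includes the CPU itself).
smt_siblings() {
    root="${SYSFS_CPU_ROOT:-/sys/devices/system/cpu}"
    cat "$root/cpu$1/topology/thread_siblings_list"
}
```

Running "list_idle_states 3; smt_siblings 3" for each isolated CPU shows
whether a given state index means the same thing on that machine and
whether a sibling thread has ended up on the housekeeping side.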
--
Mel Gorman
SUSE Labs