[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ce8b4488-a165-4847-8f2b-e2ee65746e00@arm.com>
Date: Sun, 31 Aug 2025 22:30:05 +0100
From: Christian Loehle <christian.loehle@....com>
To: "Rafael J. Wysocki" <rafael@...nel.org>,
Linux PM <linux-pm@...r.kernel.org>
Cc: Frederic Weisbecker <frederic@...nel.org>,
LKML <linux-kernel@...r.kernel.org>, Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH v1] cpuidle: governors: teo: Special-case nohz_full CPUs
On 8/29/25 20:37, Rafael J. Wysocki wrote:
> On Thu, Aug 28, 2025 at 10:16 PM Rafael J. Wysocki <rafael@...nel.org> wrote:
>>
>> From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
>>
>> This change follows an analogous modification of the menu governor [1].
>>
>> Namely, when the governor runs on a nohz_full CPU and there are no user
>> space timers in the workload on that CPU, it ends up selecting idle
>> states with target residency values above TICK_NSEC, or the deepest
>> enabled idle state in the absence of any of those, all the time due to
>> a tick_nohz_tick_stopped() check designed for running on CPUs where the
>> tick is not permanently disabled. In that case, the fact that the tick
>> has been stopped means that the CPU was expected to be idle sufficiently
>> long previously, so it is not unreasonable to expect it to be idle
>> sufficiently long again, but this inference does not apply to nohz_full
>> CPUs.
>>
>> In some cases, latency in the workload grows undesirably as a result of
>> selecting overly deep idle states, and the workload may also consume
>> more energy than necessary if the CPU does not spend enough time in the
>> selected deep idle state.
>>
>> Address this by amending the tick_nohz_tick_stopped() check in question
>> with a tick_nohz_full_cpu() one to avoid effectively ignoring all
>> shallow idle states on nohz_full CPUs. While doing so introduces a risk
>> of getting stuck in a shallow idle state for a long time, that only
>> affects energy efficiently, but the current behavior potentially hurts
>> both energy efficiency and performance that is arguably the priority for
>> nohz_full CPUs.
>
> This change is likely to break the use case in which CPU isolation is
> used for power management reasons, to prevent CPUs from running any
> code and so to save energy.
>
> In that case, going into the deepest state every time on nohz_full
> CPUs is a feature, so it can't be changed unconditionally.
>
> For this reason, I'm not going to apply this patch and I'm going to
> drop the menu governor one below.
>
> The only way to allow everyone to do what they want/need I can see
> would be to add a control knob for adjusting the behavior of cpuidle
> governors regarding the handling of nohz_full CPUs.
But then what's the advantage instead of just using
/sys/devices/system/cpu/cpuX/power/latency
for the nohz_full CPUs (if you don't want the current 'over-eagerly
selecting deepest state on nohz_full')?
Powered by blists - more mailing lists