Message-ID: <eac3541b-f22f-4cd9-a31e-4841e4fad5a1@arm.com>
Date: Wed, 21 Jan 2026 11:49:19 +0000
From: Christian Loehle <christian.loehle@....com>
To: "Ionut Nechita (Sunlight Linux)" <sunlightlinux@...il.com>,
rafael@...nel.org
Cc: ionut_n2001@...oo.com, daniel.lezcano@...aro.org,
linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/1] cpuidle: menu: Fix high wakeup latency on modern
Intel server platforms

On 1/20/26 21:17, Ionut Nechita (Sunlight Linux) wrote:
> From: Ionut Nechita <ionut_n2001@...oo.com>
>
> Hi,
Hi Ionut,
>
> This patch addresses a performance regression in the menu cpuidle governor
> affecting modern Intel server platforms (Sapphire Rapids, Granite Rapids,
> and newer).
I'll take a look at the patch later, but just to be clear, this isn't a
performance regression, right? There's no kernel version that behaved
better here, is there?
If there is, it needs to be stated, and a Fixes tag may be applicable.
>
> == Problem Description ==
>
> On Intel server platforms from 2022 onwards, we observe excessive wakeup
> latencies (~150us) in network-sensitive workloads when using the menu
> governor with NOHZ_FULL enabled.
>
> Measurement with qperf tcp_lat shows:
> - Sapphire Rapids (SPR): 151us latency
> - Ice Lake (ICL): 12us latency
> - Skylake (SKL): 21us latency
>
> The 12x latency regression on SPR compared to Ice Lake is unacceptable for
> latency-sensitive applications (HPC, real-time, financial trading, etc.).
So this is just a newer generation having higher latency.
TBF the applications you mentioned should really keep their latency
requirements under control themselves and not rely on menu guesstimating
what's needed here.
>
> == Root Cause ==
>
> The issue stems from menu.c:294-295:
>
>     if (tick_nohz_tick_stopped() && predicted_ns < TICK_NSEC)
>             predicted_ns = data->next_timer_ns;
>
> When the tick is already stopped and the predicted idle duration is short
> (<2ms), the governor switches to using next_timer_ns directly (often
> 10ms+). This causes the selection of very deep package C-states (PC6).
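Just to make sure I follow the mechanism, the effect is roughly the toy model
below; this is not the real menu code, and the residency values are invented
stand-ins rather than your actual state tables:

    #include <stdio.h>

    /* Invented stand-in values, not a real SPR table. */
    struct state {
            const char *name;
            long long target_residency_ns;
    };

    static const struct state states[] = {
            { "C1",     2000 },
            { "C6",   120000 },
            { "PC6",  600000 },
    };

    /* Deepest state whose target residency still fits the predicted idle time. */
    static const char *pick(long long predicted_ns)
    {
            const char *best = states[0].name;

            for (unsigned int i = 0; i < sizeof(states) / sizeof(states[0]); i++)
                    if (states[i].target_residency_ns <= predicted_ns)
                            best = states[i].name;
            return best;
    }

    int main(void)
    {
            /* The governor's own estimate for the upcoming idle period... */
            printf("predicted 500us -> %s\n", pick(500LL * 1000));
            /* ...versus the same decision after the next_timer_ns override. */
            printf("timer at 10ms   -> %s\n", pick(10LL * 1000 * 1000));
            return 0;
    }

So the override, not the actual wakeup pattern, is what pushes the pick into
the package state here.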
>
> Modern server platforms have significantly longer C-state exit latencies
> due to architectural changes:
> - Tile-based architecture with per-tile power gating
> - DDR5 power management overhead
> - CXL link restoration
> - Complex mesh interconnect resynchronization
>
> When a network packet arrives after 500us but the governor selected PC6
> based on a 10ms timer, the 150us exit latency dominates the response time.
>
> On older platforms (Ice Lake, Skylake) with faster C-state transitions
> (12-21us), this issue was less noticeable, but SPR's tile architecture
> makes it critical.
> [snip]
Can you provide idle state tables with residencies and usage?
Ideally idle misses for both as well?
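I.e. something along the lines of the sketch below, per platform (cpu0 only
for brevity, reading the standard cpuidle sysfs attributes; above/below are
what I mean by idle misses):

    #include <stdio.h>
    #include <string.h>

    /* Print one sysfs attribute of cpu0's idle state, minus the newline. */
    static int show(int state, const char *attr)
    {
            char path[128], buf[64];
            FILE *f;

            snprintf(path, sizeof(path),
                     "/sys/devices/system/cpu/cpu0/cpuidle/state%d/%s", state, attr);
            f = fopen(path, "r");
            if (!f)
                    return -1;
            if (fgets(buf, sizeof(buf), f)) {
                    buf[strcspn(buf, "\n")] = '\0';
                    printf("%-12s", buf);
            }
            fclose(f);
            return 0;
    }

    int main(void)
    {
            printf("%-12s%-12s%-12s%-12s%-12s%-12s\n",
                   "name", "residency", "latency", "usage", "above", "below");
            for (int s = 0; show(s, "name") == 0; s++) {
                    show(s, "residency");
                    show(s, "latency");
                    show(s, "usage");
                    show(s, "above");
                    show(s, "below");
                    printf("\n");
            }
            return 0;
    }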
Thanks!