Message-ID: <20260126201943.11505-2-sunlightlinux@gmail.com>
Date: Mon, 26 Jan 2026 22:19:44 +0200
From: "Ionut Nechita (Sunlight Linux)" <sunlightlinux@...il.com>
To: christian.loehle@....com
Cc: daniel.lezcano@...aro.org,
ionut_n2001@...oo.com,
linux-kernel@...r.kernel.org,
linux-pm@...r.kernel.org,
rafael@...nel.org,
sunlightlinux@...il.com,
yumpusamongus@...il.com
Subject: Re: [PATCH v2 0/1] cpuidle: menu: Fix high wakeup latency on modern platforms
From: Ionut Nechita <sunlightlinux@...il.com>
On Thu, Jan 22 2026 at 08:49, Christian Loehle wrote:
> It was more of a question than a suggestion outright... And I still have
> more of them, quoting v1:
Thank you for the detailed feedback. Let me provide more context about
the workload and the platforms where I observed this issue.
> You also measured 150us wakeup latency, does this match the reported exit
> latency for your platform (roughly)?
> What do the platform states look like for you?
Yes, the ~150us I measured is roughly in line with the reported exit
latencies. Here are the platforms I've tested:
1. Intel Xeon Gold 6443N (Sapphire Rapids):
- C6 state: 190us latency, 600us residency target
- C1E state: 2us latency, 4us residency target
- Driver: intel_idle
2. AMD Ryzen 9 5900HS (laptop):
- C3 state: 350us latency, 700us residency target
- C2 state: 18us latency, 36us residency target
- Driver: acpi_idle
The problem manifests primarily on the Sapphire Rapids platform, where
C6 has a 190us exit latency.
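
For reference, the latency/residency numbers above come straight from
the cpuidle sysfs attributes. A minimal reader looks roughly like this
(illustrative only; error handling trimmed, cpu0 hard-coded):

/* Dump name/exit latency/target residency for each cpuidle state of cpu0. */
#include <stdio.h>
#include <string.h>

static int read_line(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, len, f)) {
		fclose(f);
		return -1;
	}
	buf[strcspn(buf, "\n")] = '\0';
	fclose(f);
	return 0;
}

int main(void)
{
	char base[128], path[160], name[64], lat[32], res[32];
	int state;

	for (state = 0; ; state++) {
		snprintf(base, sizeof(base),
			 "/sys/devices/system/cpu/cpu0/cpuidle/state%d", state);

		snprintf(path, sizeof(path), "%s/name", base);
		if (read_line(path, name, sizeof(name)))
			break;	/* no more states */

		snprintf(path, sizeof(path), "%s/latency", base);
		read_line(path, lat, sizeof(lat));

		snprintf(path, sizeof(path), "%s/residency", base);
		read_line(path, res, sizeof(res));

		printf("state%d %-8s latency %sus residency %sus\n",
		       state, name, lat, res);
	}
	return 0;
}
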
> Also regarding NOHZ_FULL, does that make a difference for your workload?
Yes, absolutely. The workload context is:
- PREEMPT_RT kernel (realtime)
- Isolated cores (isolcpus=)
- NOHZ_FULL enabled on isolated cores
- Inter-core communication latency testing with qperf
- kthreads and IRQ affinity set to non-isolated cores
The scenario: Core A (isolated, NOHZ_FULL) sends a message to Core B
(also isolated, NOHZ_FULL, currently idle). Core B has entered C6 while
idle, so when the message arrives the 190us exit latency dominates the
response time. This is unacceptable for realtime workloads.
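
The qperf runs are essentially a cross-core ping-pong. A stripped-down
model of what is being measured looks like this (illustrative only; the
real measurements use qperf, and the core numbers below are placeholders
for the isolated cores):

/*
 * Waker pinned on one isolated core, sleeper pinned on another.  The
 * sleeper blocks in read() and lets its core go idle (deep C-state),
 * the waker sends a timestamp through a pipe, and the sleeper reports
 * the time from write() to its return from read().
 */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static int pfd[2];

static void pin(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

static void *sleeper(void *arg)
{
	struct timespec now;
	uint64_t sent;

	(void)arg;
	pin(3);				/* placeholder: isolated, NOHZ_FULL core */
	while (read(pfd[0], &sent, sizeof(sent)) == sizeof(sent)) {
		clock_gettime(CLOCK_MONOTONIC, &now);
		printf("wakeup latency: %ld ns\n",
		       now.tv_sec * 1000000000L + now.tv_nsec - (long)sent);
	}
	return NULL;
}

int main(void)
{
	struct timespec now;
	pthread_t thread;
	uint64_t sent;
	int i;

	pipe(pfd);
	pthread_create(&thread, NULL, sleeper, NULL);
	pin(2);				/* placeholder: the other isolated core */

	for (i = 0; i < 10; i++) {
		sleep(1);		/* let the sleeper's core go idle */
		clock_gettime(CLOCK_MONOTONIC, &now);
		sent = now.tv_sec * 1000000000L + now.tv_nsec;
		write(pfd[1], &sent, sizeof(sent));
	}
	close(pfd[1]);			/* sleeper's read() returns 0, thread exits */
	pthread_join(thread, NULL);
	return 0;
}
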
> Frankly, if there's relatively strict latency requirements on the system
> you need to let cpuidle know via pm qos or dma_latency....
I considered PM QoS and /dev/cpu_dma_latency, but they have limitations
for this use case:
1. Global PM QoS affects all cores, not just the isolated ones
2. Per-task PM QoS requires application modifications
3. /dev/cpu_dma_latency is system-wide, not per-core
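
For completeness, this is what the cpu_dma_latency route looks like; the
constraint only holds while the fd stays open and it applies to every
CPU, which is limitation 3 above (sketch, the 20us value is
illustrative):

/*
 * Write a latency bound (s32, microseconds) to /dev/cpu_dma_latency and
 * keep the fd open for as long as the constraint should hold.  This is
 * system-wide: it shallows idle on every core, not just the isolated ones.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int32_t latency_us = 20;	/* illustrative value */
	int fd = open("/dev/cpu_dma_latency", O_WRONLY);

	if (fd < 0) {
		perror("open /dev/cpu_dma_latency");
		return 1;
	}
	if (write(fd, &latency_us, sizeof(latency_us)) != sizeof(latency_us)) {
		perror("write");
		return 1;
	}

	pause();	/* constraint is dropped as soon as the fd is closed */
	return 0;
}
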
For isolated cores with NOHZ_FULL in a realtime environment, we want
the governor to make smarter decisions based on the actual predicted
idle time rather than relying on next_timer_ns, which can be
arbitrarily large on tickless cores.
> A trace or cpuidle sysfs dump pre and post workload would really help to
> understand the situation.
I will collect and provide:
- ftrace cpuidle event traces
- Complete sysfs cpuidle dumps pre/post workload
- C-state residency and usage statistics
- Detailed qperf latency measurements
Regarding the safety margin question from v1: you're right that I need
to clarify the logic. The goal is to clamp the upper bound on the idle
duration used for state selection, so that we avoid unnecessarily deep
states when the prediction suggests a short idle period, while still
respecting the prediction when checking target residency.
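
To make the intent concrete, here is a userspace toy model of the idea
(this is only a sketch of the behaviour described above, not the actual
diff; the C1 row and all numbers are illustrative, loosely based on the
SPR table earlier in this mail):

/*
 * Filter states by exit latency against the latency limit, and by target
 * residency against the *clamped* duration, i.e. the shorter of the timer
 * horizon and the predicted idle time.  On a tickless core the next timer
 * can be seconds away, but a short prediction should still keep us out of C6.
 */
#include <stdio.h>

struct state {
	const char *name;
	unsigned int exit_latency_us;
	unsigned int target_residency_us;
};

static const struct state states[] = {
	{ "C1",  1,   1 },
	{ "C1E", 2,   4 },
	{ "C6",  190, 600 },
};

static int select_state(unsigned int latency_req_us,
			unsigned int next_timer_us,
			unsigned int predicted_us)
{
	unsigned int limit_us = next_timer_us < predicted_us ?
				next_timer_us : predicted_us;
	int i, idx = 0;

	for (i = 0; i < 3; i++) {
		if (states[i].exit_latency_us > latency_req_us)
			break;
		if (states[i].target_residency_us > limit_us)
			break;
		idx = i;
	}
	return idx;
}

int main(void)
{
	/* Tickless core: next timer ~4s away, prediction says ~50us idle. */
	int idx = select_state(~0U, 4000000, 50);

	printf("selected %s\n", states[idx].name);	/* C1E, not C6 */
	return 0;
}
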
I'll send a follow-up with the detailed trace data and measurements.
Thanks for your patience and valuable feedback,
Ionut