Message-ID: <20260116063234.24084-1-kprateek.nayak@amd.com>
Date: Fri, 16 Jan 2026 06:32:32 +0000
From: K Prateek Nayak <kprateek.nayak@....com>
To: Huang Rui <ray.huang@....com>, "Gautham R. Shenoy"
<gautham.shenoy@....com>, Mario Limonciello <mario.limonciello@....com>,
"Rafael J. Wysocki" <rafael@...nel.org>, Viresh Kumar
<viresh.kumar@...aro.org>, Srinivas Pandruvada
<srinivas.pandruvada@...ux.intel.com>, Len Brown <lenb@...nel.org>,
"Sebastian Andrzej Siewior" <bigeasy@...utronix.de>, Clark Williams
<clrkwllms@...nel.org>, Bert Karwatzki <spasswolf@....de>,
<linux-pm@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<linux-rt-devel@...ts.linux.dev>, <rust-for-linux@...r.kernel.org>, "Ingo
Molnar" <mingo@...hat.com>, Peter Zijlstra <peterz@...radead.org>, Juri Lelli
<juri.lelli@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>,
"Miguel Ojeda" <ojeda@...nel.org>
CC: Perry Yuan <perry.yuan@....com>, K Prateek Nayak <kprateek.nayak@....com>,
Dietmar Eggemann <dietmar.eggemann@....com>, Steven Rostedt
<rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel Gorman
<mgorman@...e.de>, Valentin Schneider <vschneid@...hat.com>, Boqun Feng
<boqun.feng@...il.com>, Gary Guo <gary@...yguo.net>,
Björn Roy Baron <bjorn3_gh@...tonmail.com>, Benno Lossin
<lossin@...nel.org>, Andreas Hindborg <a.hindborg@...nel.org>, Alice Ryhl
<aliceryhl@...gle.com>, Trevor Gross <tmgross@...ch.edu>, Danilo Krummrich
<dakr@...nel.org>
Subject: [PATCH v3 0/2] cpufreq/amd-pstate: Prevent scheduling when atomic on PREEMPT_RT
Bert reported hitting "BUG: scheduling while atomic" when running
amd-pstate-ut on a PREEMPT_RT kernel [1].
Since reader-writer locks turn sleepable on PREEMPT_RT, they are not
suitable to be used in the scheduler hot-path under rq_lock to grab the
cpufreq policy object.
Unfortunately, the amd-pstate driver has a tight coupling between the
cpufreq_policy object and the cpudata stored in it as the driver_data.
Trying to grab a read reference on PREEMPT_RT can cause "scheduling
while atomic" if a concurrent writer is active, and trying to grab a
nested reference in the presence of a writer can cause a deadlock
(which manifests as a lockup) since the reader fast-path is disabled on
PREEMPT_RT to prevent write-side starvation.
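As a rough illustration of the nested-reference deadlock described
above, here is a single-threaded toy model in C. It is not the kernel's
rwlock implementation: the toy_rwlock type and all toy_* helpers are
hypothetical names made up for this sketch, and "would block" is
modelled as a false return value instead of actually sleeping.

```c
#include <stdbool.h>

/* Hypothetical toy lock: models only "readers held" and "writer queued". */
struct toy_rwlock {
	int readers;          /* number of read references currently held */
	bool writer_waiting;  /* a writer has queued up behind the readers */
};

/*
 * Returns true if the read side is acquired without blocking.
 * Assumption of this model: once a writer is queued, new readers
 * (nested ones included) must wait, which is what prevents
 * write-side starvation.
 */
static bool toy_read_trylock(struct toy_rwlock *l)
{
	if (l->writer_waiting)
		return false;     /* a real reader would block here */
	l->readers++;
	return true;
}

/* Writer announces itself; it can only proceed once all readers are gone. */
static bool toy_write_trylock(struct toy_rwlock *l)
{
	l->writer_waiting = true;
	return l->readers == 0;
}

/* Returns true if the outer-read / writer / nested-read sequence is stuck. */
static bool toy_nested_read_deadlocks(void)
{
	struct toy_rwlock l = { 0, false };

	bool outer  = toy_read_trylock(&l);   /* outer read reference: succeeds */
	bool writer = toy_write_trylock(&l);  /* writer waits for the outer reader */
	bool inner  = toy_read_trylock(&l);   /* nested read waits for the writer */

	/* The reader waits on the writer, the writer waits on the reader. */
	return outer && !writer && !inner;
}
```

In this model the outer reader can never drop its reference because it
is stuck acquiring the nested one, so the writer never makes progress
either.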
The two patches in this series, respectively, remove the cases where
amd-pstate grabs a nested read reference to the cpufreq policy, and
modify the cpufreq_driver->adjust_perf() callback to take the raw policy
reference cached by the schedutil governor.
The policy object outlives the governor and the driver, making it safe
to use this cached reference from the sugov data. Any changes to the
policy will end up calling cpufreq_driver->set_policy() or
governor->set_limits() once the policy is modified, which should ensure
eventual consistency despite not holding the read-side lock.
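The change in callback shape can be sketched in plain C as below. This
is only a userspace illustration under stated assumptions, not the
actual kernel code: struct toy_policy, struct toy_sugov and every toy_*
function are hypothetical stand-ins, and the "locked lookup" is reduced
to a counter so the difference between the two shapes is observable.

```c
#include <stddef.h>

#define TOY_NR_CPUS 4

/* Hypothetical stand-in for the policy object; fields are illustrative. */
struct toy_policy {
	unsigned int cpu;
	unsigned long last_perf;   /* last performance target applied */
};

/* Hypothetical stand-in for the governor data caching the policy pointer. */
struct toy_sugov {
	struct toy_policy *policy;
};

static struct toy_policy *toy_policy_table[TOY_NR_CPUS];
static int toy_locked_lookups;     /* counts lookups that would take the lock */

/* Old shape: CPU number in, policy looked up under a (modelled) lock. */
static struct toy_policy *toy_lookup_policy(unsigned int cpu)
{
	toy_locked_lookups++;      /* the real lookup may sleep on PREEMPT_RT */
	return cpu < TOY_NR_CPUS ? toy_policy_table[cpu] : NULL;
}

static void toy_adjust_perf_by_cpu(unsigned int cpu, unsigned long target)
{
	struct toy_policy *policy = toy_lookup_policy(cpu);

	if (policy)
		policy->last_perf = target;
}

/* New shape: the caller hands over the policy pointer it already caches. */
static void toy_adjust_perf_by_policy(struct toy_policy *policy,
				      unsigned long target)
{
	policy->last_perf = target;   /* no lookup, no lock in the hot path */
}
```

A caller in the position of the governor would invoke
toy_adjust_perf_by_policy(sg.policy, target) with its cached pointer,
leaving toy_locked_lookups untouched on that path.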
The series has been tested with amd-pstate-ut on a PREEMPT_RT kernel,
where it passes without any splats on a LOCKDEP + DEBUG_ATOMIC_SLEEP
config. Additionally, the driver switch test from Gautham [2] was run
for 10min on the same config without observing any splats.
[1] https://lore.kernel.org/all/20250731092316.3191-1-spasswolf@web.de/
[2] https://lore.kernel.org/all/aJRN2wMLAnhDFykv@BLRRASHENOY1.amd.com/
Patches are based on:
git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git bleeding-edge
at commit d22b6c061c99 ("Merge branch 'pm-runtime-cleanup' into
bleeding-edge") (2026-01-16).
---
Changelog v2..v3:
o Fixed the Rust bindings for adjust_perf_callback in Patch 2 (kernel
test robot).
o Tested PREEMPT_RT + CONFIG_RUST builds using amd-pstate-ut.
o Viresh's ack on Patch 2 was retained since the incremental diff for
Rust bindings was discussed on v2. (Viresh let me know in case you
have additional comments and if I should drop the tag.)
o Adding the Rust and sched folks to Cc for awareness on rust bindings
and the schedutil bits respectively.
v2: https://lore.kernel.org/lkml/20260114085113.21378-1-kprateek.nayak@amd.com/
Changelog rfc v1..v2:
o Updated the kdoc comment above cpufreq_driver_adjust_perf() in Patch 2
to reflect that cpufreq_policy is passed as an argument now instead of
the target CPU. (kernel test robot)
o Added "Reported-by:" and "Closes:" tags to Patch 2. (Mario)
o Collected tags from v1. (Thank you Mario, Viresh)
o Dropped the RFC tag.
v1: https://lore.kernel.org/lkml/20260106073608.278644-1-kprateek.nayak@amd.com/
---
K Prateek Nayak (2):
cpufreq/amd-pstate: Pass the policy to amd_pstate_update()
cpufreq: Pass the policy to cpufreq_driver->adjust_perf()
drivers/cpufreq/amd-pstate.c | 14 +++++---------
drivers/cpufreq/cpufreq.c | 6 +++---
drivers/cpufreq/intel_pstate.c | 4 ++--
include/linux/cpufreq.h | 4 ++--
kernel/sched/cpufreq_schedutil.c | 5 +++--
rust/kernel/cpufreq.rs | 13 ++++++-------
6 files changed, 21 insertions(+), 25 deletions(-)
base-commit: d22b6c061c9911a8d0f76c6e902c951455c8c4ba
--
2.34.1