Message-ID: <aAplED3IA_J0eZN0@linaro.org>
Date: Thu, 24 Apr 2025 18:21:36 +0200
From: Stephan Gerhold <stephan.gerhold@...aro.org>
To: "Rafael J. Wysocki" <rjw@...ysocki.net>
Cc: Linux PM <linux-pm@...r.kernel.org>,
Christian Loehle <christian.loehle@....com>,
LKML <linux-kernel@...r.kernel.org>,
Viresh Kumar <viresh.kumar@...aro.org>,
Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
Mario Limonciello <mario.limonciello@....com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Sultan Alsawaf <sultan@...neltoast.com>,
Peter Zijlstra <peterz@...radead.org>,
Valentin Schneider <vschneid@...hat.com>,
Ingo Molnar <mingo@...hat.com>, regressions@...ts.linux.dev,
Johan Hovold <johan@...nel.org>
Subject: Re: [PATCH v3] cpufreq: Avoid using inconsistent policy->min and
policy->max
Hi Rafael,
On Wed, Apr 16, 2025 at 04:12:37PM +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
>
> Since cpufreq_driver_resolve_freq() can run in parallel with
> cpufreq_set_policy() and there is no synchronization between them,
> the former may access policy->min and policy->max while the latter
> is updating them and it may see intermediate values of them due
> to the way the update is carried out. Also the compiler is free
> to apply any optimizations it wants both to the stores in
> cpufreq_set_policy() and to the loads in cpufreq_driver_resolve_freq()
> which may result in additional inconsistencies.
>
> To address this, use WRITE_ONCE() when updating policy->min and
> policy->max in cpufreq_set_policy() and use READ_ONCE() for reading
> them in cpufreq_driver_resolve_freq(). Moreover, rearrange the update
> in cpufreq_set_policy() to avoid storing intermediate values in
> policy->min and policy->max with the help of the observation that
> their new values are expected to be properly ordered upfront.
>
> Also modify cpufreq_driver_resolve_freq() to take the possible reverse
> ordering of policy->min and policy->max, which may happen depending on
> the ordering of operations when this function and cpufreq_set_policy()
> run concurrently, into account by always honoring the max when it
> turns out to be less than the min (in case it comes from thermal
> throttling or similar).
>
> Fixes: 151717690694 ("cpufreq: Make policy min/max hard requirements")
> Cc: 5.16+ <stable@...r.kernel.org> # 5.16+
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> ---
>
> This replaces the last 3 patches in
>
> https://lore.kernel.org/linux-pm/6171293.lOV4Wx5bFT@rjwysocki.net/
>
> v2 -> v3:
> * Fold 3 patches into one.
> * Drop an unrelated white space fixup change.
> * Fix a typo in a comment (Christian).
>
> v1 -> v2: Cosmetic changes
>
> ---
> drivers/cpufreq/cpufreq.c | 32 +++++++++++++++++++++++++-------
> 1 file changed, 25 insertions(+), 7 deletions(-)
>
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -495,8 +495,6 @@
> {
> unsigned int idx;
>
> - target_freq = clamp_val(target_freq, policy->min, policy->max);
> -
> if (!policy->freq_table)
> return target_freq;
>
> @@ -520,7 +518,22 @@
> unsigned int cpufreq_driver_resolve_freq(struct cpufreq_policy *policy,
> unsigned int target_freq)
> {
> - return __resolve_freq(policy, target_freq, CPUFREQ_RELATION_LE);
> + unsigned int min = READ_ONCE(policy->min);
> + unsigned int max = READ_ONCE(policy->max);
> +
> + /*
> + * If this function runs in parallel with cpufreq_set_policy(), it may
> + * read policy->min before the update and policy->max after the update
> + * or the other way around, so there is no ordering guarantee.
> + *
> + * Resolve this by always honoring the max (in case it comes from
> + * thermal throttling or similar).
> + */
> + if (unlikely(min > max))
> + min = max;
> +
> + return __resolve_freq(policy, clamp_val(target_freq, min, max),
> + CPUFREQ_RELATION_LE);
> }
> EXPORT_SYMBOL_GPL(cpufreq_driver_resolve_freq);
>
> @@ -2338,6 +2351,7 @@
> if (cpufreq_disabled())
> return -ENODEV;
>
> + target_freq = clamp_val(target_freq, policy->min, policy->max);
> target_freq = __resolve_freq(policy, target_freq, relation);
>
> pr_debug("target for CPU %u: %u kHz, relation %u, requested %u kHz\n",
> @@ -2631,11 +2645,15 @@
> * Resolve policy min/max to available frequencies. It ensures
> * no frequency resolution will neither overshoot the requested maximum
> * nor undershoot the requested minimum.
> + *
> + * Avoid storing intermediate values in policy->max or policy->min and
> + * compiler optimizations around them because they may be accessed
> + * concurrently by cpufreq_driver_resolve_freq() during the update.
> */
> - policy->min = new_data.min;
> - policy->max = new_data.max;
> - policy->min = __resolve_freq(policy, policy->min, CPUFREQ_RELATION_L);
> - policy->max = __resolve_freq(policy, policy->max, CPUFREQ_RELATION_H);
> + WRITE_ONCE(policy->max, __resolve_freq(policy, new_data.max, CPUFREQ_RELATION_H));
> + new_data.min = __resolve_freq(policy, new_data.min, CPUFREQ_RELATION_L);
> + WRITE_ONCE(policy->min, new_data.min > policy->max ? policy->max : new_data.min);

I've tested the cpufreq throttling again in 6.15-rc3 to check your fix
for the schedutil CPUFREQ_NEED_UPDATE_LIMITS regression I reported [1].
The CPU frequency is now being throttled correctly when reaching high
temperatures. Thanks for fixing this!

Unfortunately, the opposite case has now regressed with this patch:
After the CPU frequency has been throttled due to high temperature and
the device cools down again, the CPU frequency is stuck at minimum until
you reboot. policy->max will never restore to the maximum frequency.

I've confirmed that this causes unexpected slowness after temperature
throttling on a Qualcomm X1E laptop, and Johan has confirmed that e.g.
the ThinkPad X13s is also affected. I would expect that most devices
using cpufreq cooling in the kernel are affected.

Looking at the code, I think the problem is that __resolve_freq() ->
cpufreq_frequency_table_target() -> cpufreq_table_find_index*() and
cpufreq_is_in_limits() are still using the old policy->min/max value.
This patch only moves the clamp_val() call that sat directly in
__resolve_freq(); the helpers further down that call chain are untouched.
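
As a rough illustration of the suspected mechanism, here is a minimal
user-space sketch. It is not the kernel code: struct toy_policy,
toy_resolve_high(), toy_set_max() and toy_demo_stuck_max() are
hypothetical stand-ins, and the assumption that the table lookup clamps
its target against the stored policy limits is a simplification of the
call chain named above. The frequencies are the ones that appear in the
debug log.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct cpufreq_policy (hypothetical, user space only). */
struct toy_policy {
	unsigned int min;
	unsigned int max;
	const unsigned int *table;	/* ascending frequencies in kHz */
	size_t table_len;
};

/*
 * Toy stand-in for the table lookup behind __resolve_freq().
 * Assumption: the lookup limits its target to the *stored* min/max
 * before picking the highest table entry at or below the target.
 */
static unsigned int toy_resolve_high(const struct toy_policy *p,
				     unsigned int target)
{
	unsigned int best = p->table[0];
	size_t i;

	if (target > p->max)
		target = p->max;
	if (target < p->min)
		target = p->min;

	for (i = 0; i < p->table_len; i++)
		if (p->table[i] <= target)
			best = p->table[i];

	return best;
}

/*
 * Mirrors the update order of the patch: the new max is resolved while
 * the policy still holds the old max, and only then stored.
 */
static void toy_set_max(struct toy_policy *p, unsigned int new_max)
{
	p->max = toy_resolve_high(p, new_max);
}

/* Throttle to the minimum, then try to lift the limit again. */
static unsigned int toy_demo_stuck_max(void)
{
	static const unsigned int table[] = { 710400, 3206400, 3417600 };
	struct toy_policy p = { 710400, 3417600, table, 3 };

	toy_set_max(&p, 710400);	/* thermal throttling kicks in */
	assert(p.max == 710400);

	toy_set_max(&p, 3417600);	/* device has cooled down */
	return p.max;			/* still limited by the old max */
}
```

In this toy model the second toy_set_max() call cannot raise the limit,
because the lookup only ever sees the old, throttled maximum. Whether
the real helpers behave exactly like this is the assumption to verify.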

You can see this in the following debug log. I started a stress test
that increases the device temperature until the CPU frequency was
throttled to minimum. Then I stopped the stress test, the device cooled
down, cpufreq_set_policy() was called with the new max frequency, but
__resolve_freq() still returned the frequency clamped to the old maximum.

[ 149.959693] cpufreq: handle_update for cpu 0 called
[ 149.964782] cpufreq: updating policy for CPU 0
[ 149.969411] cpufreq: setting new policy for CPU 0: 0 - 3206400 kHz
[ 149.975842] cpufreq: new min and max freqs are 710400 - 3206400 kHz
[ 149.982347] cpufreq: governor limits update
[ 149.986715] cpufreq: cpufreq_governor_limits: for CPU 0
[...]
[ 161.219209] cpufreq: handle_update for cpu 0 called
[ 161.224291] cpufreq: updating policy for CPU 0
[ 161.228927] cpufreq: setting new policy for CPU 0: 0 - 710400 kHz
[ 161.235238] cpufreq: new min and max freqs are 710400 - 710400 kHz
[ 161.241635] cpufreq: governor limits update
[ 161.245989] cpufreq: cpufreq_governor_limits: for CPU 0
[ 221.253253] cpufreq: handle_update for cpu 0 called
[ 221.258322] cpufreq: updating policy for CPU 0
[ 221.262946] cpufreq: setting new policy for CPU 0: 0 - 3417600 kHz
[ 221.269418] cpufreq: new min and max freqs are 710400 - 710400 kHz
^ here the new maximum is not being applied
[ 221.275839] cpufreq: governor limits update
[ 221.280195] cpufreq: cpufreq_governor_limits: for CPU 0

Any thoughts on how to fix this properly?

Please Cc me if you send a patch with the fix. For some reason, the fix
for the CPUFREQ_NEED_UPDATE_LIMITS problem was not Cc'd to me even though
it carried my Reported-by: tag, so I only learned about it when Greg
sent out the stable backports yesterday. :-)

Thanks,
Stephan
[1]: https://lore.kernel.org/lkml/Z_Tlc6Qs-tYpxWYb@linaro.org/
#regzbot introduced: 7491cdf46b5cbdf123fc84fbe0a07e9e3d7b7620