Message-ID: <aALQhEi609NQAV7S@sultan-box.localdomain>
Date: Sat, 19 Apr 2025 08:21:56 +1000
From: Sultan Alsawaf <sultan@...neltoast.com>
To: "Rafael J. Wysocki" <rafael@...nel.org>
Cc: "Rafael J. Wysocki" <rjw@...ysocki.net>,
Linux PM <linux-pm@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>,
Viresh Kumar <viresh.kumar@...aro.org>,
Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
Mario Limonciello <mario.limonciello@....com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Christian Loehle <christian.loehle@....com>,
Peter Zijlstra <peterz@...radead.org>,
Valentin Schneider <vschneid@...hat.com>,
Ingo Molnar <mingo@...hat.com>
Subject: Re: [PATCH v2 5/6] cpufreq: Avoid using inconsistent policy->min and
policy->max
On Fri, Apr 18, 2025 at 09:42:15PM +0200, Rafael J. Wysocki wrote:
> On Fri, Apr 18, 2025 at 12:18 PM Sultan Alsawaf <sultan@...neltoast.com> wrote:
> >
> > On Tue, Apr 15, 2025 at 12:04:21PM +0200, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> > >
> > > Since cpufreq_driver_resolve_freq() can run in parallel with
> > > cpufreq_set_policy() and there is no synchronization between them,
> > > the former may access policy->min and policy->max while the latter
> > > is updating them and it may see intermediate values of them due
> > > to the way the update is carried out. Also the compiler is free
> > > to apply any optimizations it wants both to the stores in
> > > cpufreq_set_policy() and to the loads in cpufreq_driver_resolve_freq()
> > > which may result in additional inconsistencies.
> > >
> > > To address this, use WRITE_ONCE() when updating policy->min and
> > > policy->max in cpufreq_set_policy() and use READ_ONCE() for reading
> > > them in cpufreq_driver_resolve_freq(). Moreover, rearrange the update
> > > in cpufreq_set_policy() to avoid storing intermediate values in
> > > policy->min and policy->max with the help of the observation that
> > > their new values are expected to be properly ordered upfront.
> > >
> > > Also modify cpufreq_driver_resolve_freq() to take the possible reverse
> > > ordering of policy->min and policy->max, which may happen depending on
> > > the ordering of operations when this function and cpufreq_set_policy()
> > > run concurrently, into account by always honoring the max when it
> > > turns out to be less than the min (in case it comes from thermal
> > > throttling or similar).
> > >
> > > Fixes: 151717690694 ("cpufreq: Make policy min/max hard requirements")
> > > Cc: 5.16+ <stable@...r.kernel.org> # 5.16+
> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@...el.com>
> > > ---
> > >
> > > v1 -> v2: Minor edit in the subject
> > >
> > > ---
> > > drivers/cpufreq/cpufreq.c | 46 ++++++++++++++++++++++++++++++++++++----------
> > > 1 file changed, 36 insertions(+), 10 deletions(-)
> > >
> > > --- a/drivers/cpufreq/cpufreq.c
> > > +++ b/drivers/cpufreq/cpufreq.c
> > > @@ -490,14 +490,12 @@
> > > }
> > > EXPORT_SYMBOL_GPL(cpufreq_disable_fast_switch);
> > >
> > > -static unsigned int clamp_and_resolve_freq(struct cpufreq_policy *policy,
> > > - unsigned int target_freq,
> > > - unsigned int relation)
> > > +static unsigned int __resolve_freq(struct cpufreq_policy *policy,
> > > + unsigned int target_freq,
> > > + unsigned int relation)
> > > {
> > > unsigned int idx;
> > >
> > > - target_freq = clamp_val(target_freq, policy->min, policy->max);
> > > -
> > > if (!policy->freq_table)
> > > return target_freq;
> > >
> > > @@ -507,6 +505,15 @@
> > > return policy->freq_table[idx].frequency;
> > > }
> > >
> > > +static unsigned int clamp_and_resolve_freq(struct cpufreq_policy *policy,
> > > + unsigned int target_freq,
> > > + unsigned int relation)
> > > +{
> > > + target_freq = clamp_val(target_freq, policy->min, policy->max);
> > > +
> > > + return __resolve_freq(policy, target_freq, relation);
> > > +}
> > > +
> > > /**
> > > * cpufreq_driver_resolve_freq - Map a target frequency to a driver-supported
> > > * one.
> > > @@ -521,7 +528,22 @@
> > > unsigned int cpufreq_driver_resolve_freq(struct cpufreq_policy *policy,
> > > unsigned int target_freq)
> > > {
> > > - return clamp_and_resolve_freq(policy, target_freq, CPUFREQ_RELATION_LE);
> > > + unsigned int min = READ_ONCE(policy->min);
> > > + unsigned int max = READ_ONCE(policy->max);
> > > +
> > > + /*
> > > + * If this function runs in parallel with cpufreq_set_policy(), it may
> > > + * read policy->min before the update and policy->max after the update
> > > + * or the other way around, so there is no ordering guarantee.
> > > + *
> > > + * Resolve this by always honoring the max (in case it comes from
> > > + * thermal throttling or similar).
> > > + */
> > > + if (unlikely(min > max))
> > > + min = max;
> > > +
> > > + return __resolve_freq(policy, clamp_val(target_freq, min, max),
> > > + CPUFREQ_RELATION_LE);
> > > }
> > > EXPORT_SYMBOL_GPL(cpufreq_driver_resolve_freq);
> > >
> > > @@ -2632,11 +2654,15 @@
> > > * Resolve policy min/max to available frequencies. It ensures
> > > * no frequency resolution will neither overshoot the requested maximum
> > > * nor undershoot the requested minimum.
> > > + *
> > > + * Avoid storing intermediate values in policy->max or policy->min and
> > > + * compiler optimizations around them because them may be accessed
> > > + * concurrently by cpufreq_driver_resolve_freq() during the update.
> > > */
> > > - policy->min = new_data.min;
> > > - policy->max = new_data.max;
> > > - policy->min = clamp_and_resolve_freq(policy, policy->min, CPUFREQ_RELATION_L);
> > > - policy->max = clamp_and_resolve_freq(policy, policy->max, CPUFREQ_RELATION_H);
> > > + WRITE_ONCE(policy->max, __resolve_freq(policy, new_data.max, CPUFREQ_RELATION_H));
> > > + new_data.min = __resolve_freq(policy, new_data.min, CPUFREQ_RELATION_L);
> > > + WRITE_ONCE(policy->min, new_data.min > policy->max ? policy->max : new_data.min);
> >
> > I don't think this is sufficient, because this still permits an incoherent
> > policy->min and policy->max combination, which makes it possible for schedutil
> > to honor the incoherent limits; i.e., schedutil may observe old policy->min and
> > new policy->max or vice-versa.
>
> Yes, it may, as stated in the new comment in cpufreq_driver_resolve_freq().
Thanks for pointing that out; I had skipped that hunk while reviewing.
I skipped it, though, because schedutil still accesses policy->min/max
unprotected via cpufreq_policy_apply_limits() and __cpufreq_driver_target(),
so the race still affects those calls.
> > We also can't permit a wrong freq to be propagated to the driver and then send
> > the _right_ freq afterwards; IOW, we can't let a bogus freq slip through and
> > just correct it later.
>
> The frequency is neither wrong nor bogus, it is only affected by one
> of the limits that were in effect previously or will be in effect
> going forward. They are valid limits in either case.
I would argue that limits only make sense as a pair, not on their own. Checking
for min > max only covers the case where the new min exceeds the old max; this
means that, when min is raised without exceeding the old max, a thermal throttle
attempt could instead result in a raised frequency floor:
1. policy->min == 100000, policy->max == 2500000
2. Policy limit update request: new min of 400000, new max of 500000
3. schedutil observes policy->min == 400000, policy->max == 2500000
Raising the min freq while lowering the max freq can be a valid thermal throttle
scheme. But it only makes sense if both limits are applied simultaneously.
> > How about using a seqlock?
>
> This would mean extra overhead in the scheduler path pretty much for no gain.
Or there's the slightly cursed approach of using a union to facilitate an atomic
64-bit store of policy->min and max at the same time, since min/max are 32 bits.
Sultan