lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJZ5v0iu_hMud6FRg6FiUVQ1cY6oYqrEmwpwPTeYJh5Yzh5Q8A@mail.gmail.com>
Date: Thu, 10 Apr 2025 20:47:38 +0200
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Sultan Alsawaf <sultan@...neltoast.com>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>, Stephan Gerhold <stephan.gerhold@...aro.org>, 
	Viresh Kumar <viresh.kumar@...aro.org>, Ingo Molnar <mingo@...hat.com>, 
	Peter Zijlstra <peterz@...radead.org>, Juri Lelli <juri.lelli@...hat.com>, 
	Vincent Guittot <vincent.guittot@...aro.org>, Dietmar Eggemann <dietmar.eggemann@....com>, 
	Steven Rostedt <rostedt@...dmis.org>, Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>, 
	Valentin Schneider <vschneid@...hat.com>, Christian Loehle <christian.loehle@....com>, linux-pm@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] cpufreq: schedutil: Don't ignore limit changes when util
 is unchanged

On Thu, Apr 10, 2025 at 6:03 PM Sultan Alsawaf <sultan@...neltoast.com> wrote:
>
> On Thu, Apr 10, 2025 at 05:34:39PM +0200, Rafael J. Wysocki wrote:
> > On Thu, Apr 10, 2025 at 4:45 AM Sultan Alsawaf <sultan@...neltoast.com> wrote:
> > >
> > > From: Sultan Alsawaf <sultan@...neltoast.com>
> > >
> > > When utilization is unchanged, a policy limits update is ignored unless
> > > CPUFREQ_NEED_UPDATE_LIMITS is set. This occurs because limits_changed
> > > depends on the old broken behavior of need_freq_update to trigger a call
> > > into cpufreq_driver_resolve_freq() to evaluate the changed policy limits.
> > >
> > > After fixing need_freq_update, limit changes are ignored without
> > > CPUFREQ_NEED_UPDATE_LIMITS, at least until utilization changes enough to
> > > make map_util_freq() return something different.
> > >
> > > Fix the ignored limit changes by preserving the value of limits_changed
> > > until get_next_freq() is called, so limits_changed can trigger a call to
> > > cpufreq_driver_resolve_freq().
> > >
> > > Reported-and-tested-by: Stephan Gerhold <stephan.gerhold@...aro.org>
> > > Link: https://lore.kernel.org/lkml/Z_Tlc6Qs-tYpxWYb@linaro.org
> > > Fixes: 8e461a1cb43d6 ("cpufreq: schedutil: Fix superfluous updates caused by need_freq_update")
> > > Signed-off-by: Sultan Alsawaf <sultan@...neltoast.com>
> > > ---
> > >  kernel/sched/cpufreq_schedutil.c | 5 +++--
> > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > > index 1a19d69b91ed3..f37b999854d52 100644
> > > --- a/kernel/sched/cpufreq_schedutil.c
> > > +++ b/kernel/sched/cpufreq_schedutil.c
> > > @@ -82,7 +82,6 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
> > >                 return false;
> > >
> > >         if (unlikely(sg_policy->limits_changed)) {
> > > -               sg_policy->limits_changed = false;
> > >                 sg_policy->need_freq_update = cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
> > >                 return true;
> > >         }
> > > @@ -171,9 +170,11 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
> > >         freq = get_capacity_ref_freq(policy);
> > >         freq = map_util_freq(util, freq, max);
> > >
> > > -       if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
> > > +       if (freq == sg_policy->cached_raw_freq && !sg_policy->limits_changed &&
> > > +           !sg_policy->need_freq_update)
> > >                 return sg_policy->next_freq;
> > >
> > > +       sg_policy->limits_changed = false;
> >
> > AFAICS, after this code modification, a limit change may be missed due
> > to a possible race with sugov_limits() which cannot happen if
> > sg_policy->limits_changed is only cleared when it is set before
> > updating sg_policy->need_freq_update.
>
> I don't think that's the case because sg_policy->limits_changed is cleared
> before the new policy limits are evaluated in cpufreq_driver_resolve_freq().

sugov_limits() may be triggered by a scaling_max_freq update, for
instance, so it is asynchronous with respect to the usual governor
flow.  It updates sg_policy->limits_changed and assumes that next time
the governor runs, it will call into the driver, for example via
cpufreq_driver_fast_switch(), so the new limits take effect.  This is
not about cpufreq_driver_resolve_freq().

sugov_limits() runs after the driver's ->verify() callback has
returned and it is conditional on that callback's return value, so the
driver already knows the new limits when sugov_limits() runs, but it
may still need to tell the hardware what the new limits are and that's
why cpufreq_driver_fast_switch() may need to run.

Now, if sugov_should_update_freq() sees sg_policy->limits_changed set,
it will set sg_policy->need_freq_update which (for drivers with
CPUFREQ_NEED_UPDATE_LIMITS set) guarantees that the driver will be
invoked and so sg_policy->limits_changed can be cleared.

If a new instance of sugov_limits() runs at this point, there are two
possibilities.  Either it completes before the
sg_policy->limits_changed update in sugov_should_update_freq(), in
which case the driver already knows the new limits as per the above
and so the subsequent invocation of cpufreq_driver_fast_switch() will
pick them up, or it sets sg_policy->limits_changed again and the
governor will see it set next time it runs.  In both cases the new
limits will be picked up unless they are changed again in the
meantime.

After the above change, sg_policy->limits_changed may be cleared even
if it has not been set before and that's problematic.  Namely, say it
is unset when sugov_should_update_freq() runs, after being called by
sugov_update_single_freq() via sugov_update_single_common(), and
returns 'true' without setting sg_policy->need_freq_update.  Next,
sugov_update_single_common() returns 'true' and get_next_freq() is
called.  It sees that freq != sg_policy->cached_raw_freq, so it clears
sg_policy->limits_changed.  If sugov_limits() runs on a different CPU
between the check and the sg_policy->limits_changed update in
get_next_freq(), it may be missed and it is still not guaranteed that
cpufreq_driver_fast_switch() will run because
sg_policy->need_freq_update is unset and sugov_hold_freq() may return
'true'.

For this to work, sg_policy->limits_changed needs to be cleared only
when it is set and sg_policy->need_freq_update needs to be updated
when sg_policy->limits_changed is cleared.

It looks like you really want to set sg_policy->need_freq_update to
'true' in sugov_should_update_freq() when sg_policy->limits_changed is
set, but that would render CPUFREQ_NEED_UPDATE_LIMITS unnecessary.

> Granted, if we wanted to be really certain of this, we'd need release semantics.

I don't think so, but feel free to prove me wrong.

> Looking closer at cpufreq.c actually, isn't there already a race on the updated
> policy limits (policy->min and policy->max) since they can be updated again
> while schedutil reads them via cpufreq_driver_resolve_freq()?

That can happen, but it is a different thing.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ