[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190801075554.ml6m7pqxsfdhiqom@vireshk-i7>
Date: Thu, 1 Aug 2019 13:25:54 +0530
From: Viresh Kumar <viresh.kumar@...aro.org>
To: "Rafael J. Wysocki" <rafael@...nel.org>
Cc: Doug Smythies <dsmythies@...us.net>,
Rafael Wysocki <rjw@...ysocki.net>,
Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Linux PM <linux-pm@...r.kernel.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Joel Fernandes <joel@...lfernandes.org>,
"v4 . 18+" <stable@...r.kernel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] cpufreq: schedutil: Don't skip freq update when limits
change
On 01-08-19, 09:47, Rafael J. Wysocki wrote:
> On Thu, Aug 1, 2019 at 8:17 AM Viresh Kumar <viresh.kumar@...aro.org> wrote:
> >
> > On 31-07-19, 17:20, Doug Smythies wrote:
> > > Hi Viresh,
> > >
> > > Summary:
> > >
> > > The old way, using UINT_MAX had two purposes: first,
> > > as a "need to do a frequency update" flag; but also second, to
> > > force any subsequent old/new frequency comparison to NOT be "the same,
> > > so why bother actually updating" (see: sugov_update_next_freq). All
> > > patches so far have been dealing with the flag, but only partially
> > > the comparisons. In a busy system, and when schedutil.c doesn't actually
> > > know the currently set system limits, the new frequency is dominated by
> > > values the same as the old frequency. So, when sugov_fast_switch calls
> > > sugov_update_next_freq, false is usually returned.
> >
> > And finally we know "Why" :)
> >
> > Good work Doug. Thanks for taking it to the end.
> >
> > > However, if we move the resetting of the flag and add another condition
> > > to the "no need to actually update" decision, then perhaps this patch
> > > version 1 will be O.K. It seems to be. (see way later in this e-mail).
> >
> > > With all this new knowledge, how about going back to
> > > version 1 of this patch, and then adding this:
> > >
> > > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > > index 808d32b..f9156db 100644
> > > --- a/kernel/sched/cpufreq_schedutil.c
> > > +++ b/kernel/sched/cpufreq_schedutil.c
> > > @@ -100,7 +100,12 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
> > > static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
> > > unsigned int next_freq)
> > > {
> > > - if (sg_policy->next_freq == next_freq)
> > > + /*
> > > + * Always force an update if the flag is set, regardless.
> > > + * In some implementations (intel_cpufreq) the frequency is clamped
> > > + * further downstream, and might not actually be different here.
> > > + */
> > > + if (sg_policy->next_freq == next_freq && !sg_policy->need_freq_update)
> > > return false;
> >
> > This is not correct because this is an optimization we have in place
> > to make things more efficient. And it was working by luck earlier and
> > my patch broke it for good :)
>
> OK, so since we know why it was wrong now, why don't we just revert
> it? Plus maybe add some comment explaining the rationale in there?
Because the patch [1] which caused these issues was almost correct,
just that it missed the busy accounting for single CPU case.
The main idea behind the original patch [1] was to avoid any
unwanted/hidden side-affects by overriding the value of next_freq.
What we see above is exactly the case for that. Because we override
the value of next_freq, we made intel-pstate work by chance,
unintentionally. Which is wrong. And who knows what other side affects
it had, we already found two (this one and the one fixed by [1]).
I would strongly suggest that we don't override the value of next_freq
with special meaning, as it is used at so many places we don't know
what it may result in.
--
viresh
[1] ecd288429126 cpufreq: schedutil: Don't set next_freq to UINT_MAX
Powered by blists - more mailing lists