Message-ID: <000001d54892$a25b86b0$e7129410$@net>
Date:   Thu, 1 Aug 2019 10:57:46 -0700
From:   "Doug Smythies" <dsmythies@...us.net>
To:     "'Viresh Kumar'" <viresh.kumar@...aro.org>
Cc:     "'Rafael J. Wysocki'" <rafael@...nel.org>,
        "'Rafael Wysocki'" <rjw@...ysocki.net>,
        "'Ingo Molnar'" <mingo@...hat.com>,
        "'Peter Zijlstra'" <peterz@...radead.org>,
        "'Linux PM'" <linux-pm@...r.kernel.org>,
        "'Vincent Guittot'" <vincent.guittot@...aro.org>,
        "'Joel Fernandes'" <joel@...lfernandes.org>,
        "'v4 . 18+'" <stable@...r.kernel.org>,
        "'Linux Kernel Mailing List'" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH] cpufreq: schedutil: Don't skip freq update when limits change

On 2019.07.31 23:17 Viresh Kumar wrote:
> On 31-07-19, 17:20, Doug Smythies wrote:
>> Summary:
>> 
>> The old way, using UINT_MAX had two purposes: first,
>> as a "need to do a frequency update" flag; but also second, to
>> force any subsequent old/new frequency comparison to NOT be "the same,
>> so why bother actually updating" (see: sugov_update_next_freq). All
>> patches so far have been dealing with the flag, but only partially
>> the comparisons. In a busy system, and when schedutil.c doesn't actually
>> know the currently set system limits, the new frequency is dominated by
>> values the same as the old frequency. So, when sugov_fast_switch calls 
>> sugov_update_next_freq, false is usually returned.
>
> And finally we know "Why" :)
>
> Good work Doug. Thanks for taking it to the end.
>
>> However, if we move the resetting of the flag and add another condition
>> to the "no need to actually update" decision, then perhaps this patch
>> version 1 will be O.K. It seems to be. (see way later in this e-mail).
>
>> With all this new knowledge, how about going back to
>> version 1 of this patch, and then adding this:
>> 
>> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
>> index 808d32b..f9156db 100644
>> --- a/kernel/sched/cpufreq_schedutil.c
>> +++ b/kernel/sched/cpufreq_schedutil.c
>> @@ -100,7 +100,12 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
>>  static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
>>                                    unsigned int next_freq)
>>  {
>> -       if (sg_policy->next_freq == next_freq)
>> +       /*
>> +        * Always force an update if the flag is set, regardless.
>> +        * In some implementations (intel_cpufreq) the frequency is clamped
>> +        * further downstream, and might not actually be different here.
>> +        */
>> +       if (sg_policy->next_freq == next_freq && !sg_policy->need_freq_update)
>>                 return false;
>
> This is not correct because this is an optimization we have in place
> to make things more efficient. And it was working by luck earlier and
> my patch broke it for good :)

Disagree.
All I did was use a flag where next_freq used to be set to UINT_MAX,
which basically implements the same thing.
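
To illustrate the equivalence I am claiming, here is a rough user-space
sketch, not kernel code. The struct and the toy_* function names are made
up for illustration, and clearing the flag inside the update function is
my simplifying assumption, not where any of the patches actually clear it.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical, stripped-down stand-in for struct sugov_policy. */
struct toy_policy {
	unsigned int next_freq;
	bool need_freq_update;
};

/*
 * Old scheme: a limits change stored UINT_MAX in next_freq, so the
 * equality test could never match a real frequency and the next
 * request always went through.
 */
static bool toy_update_sentinel(struct toy_policy *p, unsigned int next_freq)
{
	if (p->next_freq == next_freq)
		return false;

	p->next_freq = next_freq;
	return true;
}

/*
 * Flag scheme with the extra condition: an unchanged frequency is only
 * skipped when no limits change is pending.
 * Assumption: the flag is cleared here, once the update is accepted.
 */
static bool toy_update_flag(struct toy_policy *p, unsigned int next_freq)
{
	if (p->next_freq == next_freq && !p->need_freq_update)
		return false;

	p->next_freq = next_freq;
	p->need_freq_update = false;
	return true;
}
```

In both variants the first request after a limits change is passed on even
if the requested frequency is numerically unchanged, and repeated identical
requests after that are skipped.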

> Things need to get a bit more synchronized and something like this may
> help (completely untested):
>
> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index cc27d4c59dca..2d84361fbebc 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -2314,6 +2314,18 @@ static int intel_cpufreq_target(struct cpufreq_policy *policy,
>        return 0;
> }
> 
> +static unsigned int intel_cpufreq_resolve_freq(struct cpufreq_policy *policy,
> +                                              unsigned int target_freq)
> +{
> +       struct cpudata *cpu = all_cpu_data[policy->cpu];
> +       int target_pstate;
> +
> +       target_pstate = DIV_ROUND_UP(target_freq, cpu->pstate.scaling);
> +       target_pstate = intel_pstate_prepare_request(cpu, target_pstate);
> +
> +       return target_pstate * cpu->pstate.scaling;
> +}
> +
>  static unsigned int intel_cpufreq_fast_switch(struct cpufreq_policy *policy,
>                                               unsigned int target_freq)
>  {
> @@ -2350,6 +2362,7 @@ static struct cpufreq_driver intel_cpufreq = {
>         .verify         = intel_cpufreq_verify_policy,
>         .target         = intel_cpufreq_target,
>         .fast_switch    = intel_cpufreq_fast_switch,
> +       .resolve_freq   = intel_cpufreq_resolve_freq,
>         .init           = intel_cpufreq_cpu_init,
>         .exit           = intel_pstate_cpu_exit,
>         .stop_cpu       = intel_cpufreq_stop_cpu,
> 
> -------------------------8<-------------------------
>
> Please try this with my patch 2.

O.K.

> We need patch 2 instead of 1 because
> of another race condition Rafael noticed.

Disagree.
Notice that my modifications to your patch1 address
that condition by moving the clearing of "need_freq_update"
to sometime later.

> 
> cpufreq_schedutil calls driver specific resolve_freq() to find the new
> target frequency and this is where the limits should get applied IMO.

Oh! I didn't know. But yes, that makes sense.
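
For my own understanding, here is a rough user-space sketch of what I take
the proposed resolve_freq callback to be doing: round the request up to a
whole P-state, clamp it to the current limits, and convert back to kHz.
The toy_* names, the struct and the simple min/max clamp are my
assumptions for illustration; the real intel_pstate_prepare_request() may
well do more than this.

```c
/* Round-up integer division, same idea as the kernel's DIV_ROUND_UP. */
#define TOY_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical stand-in for the per-CPU P-state limits. */
struct toy_cpu {
	unsigned int min_pstate;
	unsigned int max_pstate;
	unsigned int scaling;	/* kHz per P-state step */
};

/*
 * Map a requested frequency to the frequency the driver would really
 * set, so the governor compares like with like.
 */
static unsigned int toy_resolve_freq(const struct toy_cpu *cpu,
				     unsigned int target_freq)
{
	unsigned int pstate = TOY_DIV_ROUND_UP(target_freq, cpu->scaling);

	/* Assumed clamp; stands in for intel_pstate_prepare_request(). */
	if (pstate < cpu->min_pstate)
		pstate = cpu->min_pstate;
	if (pstate > cpu->max_pstate)
		pstate = cpu->max_pstate;

	return pstate * cpu->scaling;
}
```

With limits applied at this stage, a request above the maximum resolves to
the maximum, so once the limits change the resolved value changes too and
the "same frequency, skip it" comparison no longer hides the update.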

>
> Rafael can help with reviewing this diff but it would be great if you
> can give this a try Doug.

Anyway, I added the above code (I am calling it patch3) to patch2, as
you asked, and it does work. I also added it to my modified patch1,
additionally removing the extra condition check that I added
(i.e. all that remains of my patch1 modifications is the moved
clearing of "need_freq_update"). That kernel also worked for both
intel_cpufreq/schedutil and acpi-cpufreq/schedutil.

Again, I do not know how to test the original issue that led
to the change away from UINT_MAX in the first place, commit
ecd2884291261e3fddbc7651ee11a20d596bb514. That case should be
retested in case a regression has been introduced.

... Doug
