linux-kernel - Re: [PATCH] cpufreq: schedutil: set sg_policy->next

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <004001d6ade5$0f8dde00$2ea99a00$@gmail.com>
Date:   Thu, 29 Oct 2020 19:17:11 +0800
From:   "zhuguangqing83" <zhuguangqing83@...il.com>
To:     "'Viresh Kumar'" <viresh.kumar@...aro.org>
Cc:     <rjw@...ysocki.net>, <mingo@...hat.com>, <peterz@...radead.org>,
        <juri.lelli@...hat.com>, <vincent.guittot@...aro.org>,
        <dietmar.eggemann@....com>, <rostedt@...dmis.org>,
        <bsegall@...gle.com>, <mgorman@...e.de>, <bristot@...hat.com>,
        <linux-pm@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
        "'zhuguangqing'" <zhuguangqing@...omi.com>
Subject: Re: [PATCH] cpufreq: schedutil: set sg_policy->next_freq to the final cpufreq

On 10/29/2020 15:19，Viresh Kumar<viresh.kumar@...aro.org> wrote：
> Your mail client is screwing the "In-reply-to" field of the message
> and that prevents it to appear properly in the thread in mailboxes of
> other people, please fix that.
> 

I will try to fix that.

> On 29-10-20, 09:43, zhuguangqing83 wrote:
> > > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > > index 0c5c61a095f6..bf7800e853d3 100644
> > > --- a/kernel/sched/cpufreq_schedutil.c
> > > +++ b/kernel/sched/cpufreq_schedutil.c
> > > @@ -105,7 +105,6 @@ static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
> > >         if (sg_policy->next_freq == next_freq)
> > >                 return false;
> > >
> > > -       sg_policy->next_freq = next_freq;
> > >         sg_policy->last_freq_update_time = time;
> > >
> > >         return true;
> >
> > It's a little strange that sg_policy->next_freq and
> > sg_policy->last_freq_update_time are not updated at the same time.
> >
> > > @@ -115,7 +114,7 @@ static void sugov_fast_switch(struct sugov_policy *sg_policy, u64 time,
> > >                               unsigned int next_freq)
> > >  {
> > >         if (sugov_update_next_freq(sg_policy, time, next_freq))
> > > -               cpufreq_driver_fast_switch(sg_policy->policy, next_freq);
> > > +               sg_policy->next_freq = cpufreq_driver_fast_switch(sg_policy->policy, next_freq);
> > >  }
> > >
> >
> > Great, it also takes into account the issue that 0 is returned by the
> > driver's ->fast_switch() callback to indicate an error condition.
> 
> Yes but even my change wasn't good enough, more on it later.
> 
> > For policy->min/max may be not the real CPU frequency in OPPs, so
> 
> No, that can't happen. If userspace tries to set a value too large or
> too small, we clamp that too to policy->max/min and so the below
> problem shall never occur.
> 
> > next_freq got from get_next_freq() which is after clamping between
> > policy->min and policy->max may be not the real CPU frequency in OPPs.
> > In that case, if we use a real CPU frequency in OPPs returned from
> > cpufreq_driver_fast_switch() to compare with next_freq,
> > "if (sg_policy->next_freq == next_freq)" will never be satisfied, so we
> > change the CPU frequency every time schedutil callback gets called from
> > the scheduler. I see the current code in get_next_freq() as following,
> > the issue mentioned above should not happen. Maybe it's just one of my
> > unnecessary worries.
> 
> Coming back to my patch (and yours too), it only fixes the fast-switch
> case and not the slow path which can also end up clamping the
> frequency. And to be honest, even the drivers can have their own
> clamping code in place, no one is stopping them too.
> 
> And we also need to do something about the cached_raw_freq as well, as
> it will not be in sync with next_freq anymore.
> 
> Here is another attempt from me tackling this problem. The idea is to
> check if the previous freq update went as expected or not. And if not,
> we can't rely on next_freq or cached_raw_freq anymore. For this to
> work properly, we need to make sure policy->cur isn't getting updated
> at the same time when get_next_freq() is running. For that I have
> given a different meaning to work_in_progress flag, which now creates
> a lockless (kind of) critical section where we won't play with
> next_freq while the cpufreq core is updating the frequency.
> 

I think your patch is ok for tackling this problem.

> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 0c5c61a095f6..8991cc31b011 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -121,13 +121,8 @@ static void sugov_fast_switch(struct sugov_policy *sg_policy, u64 time,
>  static void sugov_deferred_update(struct sugov_policy *sg_policy, u64 time,
>                                   unsigned int next_freq)
>  {
> -       if (!sugov_update_next_freq(sg_policy, time, next_freq))
> -               return;
> -
> -       if (!sg_policy->work_in_progress) {
> -               sg_policy->work_in_progress = true;
> +       if (sugov_update_next_freq(sg_policy, time, next_freq))
>                 irq_work_queue(&sg_policy->irq_work);
> -       }
>  }
> 
>  /**
> @@ -159,6 +154,15 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
>         unsigned int freq = arch_scale_freq_invariant() ?
>                                 policy->cpuinfo.max_freq : policy->cur;
> 
> +       /*
> +        * The previous frequency update didn't go as we expected it to be. Lets
> +        * start again to make sure we don't miss any updates.
> +        */
> +       if (unlikely(policy->cur != sg_policy->next_freq)) {
> +               sg_policy->next_freq = 0;
> +               sg_policy->cached_raw_freq = 0;
> +       }
> +
>         freq = map_util_freq(util, freq, max);
> 
>         if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
> @@ -337,8 +341,14 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
> 
>         ignore_dl_rate_limit(sg_cpu, sg_policy);
> 
> +       if (!sg_policy->policy->fast_switch_enabled) {
> +               raw_spin_lock(&sg_policy->update_lock);
> +               if (sg_policy->work_in_progress)
> +                       goto unlock;
> +       }
> +

Maybe it's better to bring the following code before the code above.
         if (!sugov_should_update_freq(sg_policy, time))
               return;

>         if (!sugov_should_update_freq(sg_policy, time))
> -               return;
> +               goto unlock;
> 
>         /* Limits may have changed, don't skip frequency update */
>         busy = !sg_policy->need_freq_update && sugov_cpu_is_busy(sg_cpu);
> @@ -363,13 +373,14 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
>          * concurrently on two different CPUs for the same target and it is not
>          * necessary to acquire the lock in the fast switch case.
>          */
> -       if (sg_policy->policy->fast_switch_enabled) {
> +       if (sg_policy->policy->fast_switch_enabled)
>                 sugov_fast_switch(sg_policy, time, next_f);
> -       } else {
> -               raw_spin_lock(&sg_policy->update_lock);
> +       else
>                 sugov_deferred_update(sg_policy, time, next_f);
> +
> +unlock:
> +       if (!sg_policy->policy->fast_switch_enabled)
>                 raw_spin_unlock(&sg_policy->update_lock);
> -       }
>  }
> 
>  static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
> @@ -405,6 +416,9 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags)
> 
>         raw_spin_lock(&sg_policy->update_lock);
> 
> +       if (sg_policy->work_in_progress)
> +               goto unlock;
> +
>         sugov_iowait_boost(sg_cpu, time, flags);
>         sg_cpu->last_update = time;
> 
> @@ -419,33 +433,30 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags)
>                         sugov_deferred_update(sg_policy, time, next_f);
>         }
> 
> +unlock:
>         raw_spin_unlock(&sg_policy->update_lock);
>  }
> 
>  static void sugov_work(struct kthread_work *work)
>  {
>         struct sugov_policy *sg_policy = container_of(work, struct sugov_policy, work);
> -       unsigned int freq;
>         unsigned long flags;
> 
>         /*
> -        * Hold sg_policy->update_lock shortly to handle the case where:
> -        * incase sg_policy->next_freq is read here, and then updated by
> -        * sugov_deferred_update() just before work_in_progress is set to false
> -        * here, we may miss queueing the new update.
> -        *
> -        * Note: If a work was queued after the update_lock is released,
> -        * sugov_work() will just be called again by kthread_work code; and the
> -        * request will be proceed before the sugov thread sleeps.
> +        * Prevent the schedutil hook to run in parallel while we are updating
> +        * the frequency here and accessing next_freq.
>          */
>         raw_spin_lock_irqsave(&sg_policy->update_lock, flags);
> -       freq = sg_policy->next_freq;
> -       sg_policy->work_in_progress = false;
> +       sg_policy->work_in_progress = true;
>         raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags);
> 
>         mutex_lock(&sg_policy->work_lock);
> -       __cpufreq_driver_target(sg_policy->policy, freq, CPUFREQ_RELATION_L);
> +       __cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq, CPUFREQ_RELATION_L);
>         mutex_unlock(&sg_policy->work_lock);
> +
> +       raw_spin_lock_irqsave(&sg_policy->update_lock, flags);
> +       sg_policy->work_in_progress = false;
> +       raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags);
>  }
> 
>  static void sugov_irq_work(struct irq_work *irq_work)
> 
> --
> viresh