linux-kernel - Re: [PATCH v2] schedutil: Allow cpufreq requests to be made even when kthread kicked

Message-ID: <CAJZ5v0gEop-1Xp81fTuNQ+cxPwfYx-_1MMLyJSR3piCT8ifhFg@mail.gmail.com>
Date:   Wed, 23 May 2018 10:18:25 +0200
From:   "Rafael J. Wysocki" <rafael@...nel.org>
To:     Joel Fernandes <joel@...lfernandes.org>
Cc:     Viresh Kumar <viresh.kumar@...aro.org>,
        "Joel Fernandes (Google.)" <joelaf@...gle.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        "Rafael J . Wysocki" <rafael.j.wysocki@...el.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Patrick Bellasi <patrick.bellasi@....com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Luca Abeni <luca.abeni@...tannapisa.it>,
        Todd Kjos <tkjos@...gle.com>,
        Claudio Scordino <claudio@...dence.eu.com>,
        kernel-team@...roid.com, Linux PM <linux-pm@...r.kernel.org>
Subject: Re: [PATCH v2] schedutil: Allow cpufreq requests to be made even when
 kthread kicked

On Wed, May 23, 2018 at 12:09 AM, Joel Fernandes <joel@...lfernandes.org> wrote:
> On Tue, May 22, 2018 at 04:04:15PM +0530, Viresh Kumar wrote:
>> Okay, Rafael and I were discussing this patch, and the locking and races around it.
>>
>> On 18-05-18, 11:55, Joel Fernandes (Google.) wrote:
>> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
>> > index e13df951aca7..5c482ec38610 100644
>> > --- a/kernel/sched/cpufreq_schedutil.c
>> > +++ b/kernel/sched/cpufreq_schedutil.c
>> > @@ -92,9 +92,6 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
>> >         !cpufreq_can_do_remote_dvfs(sg_policy->policy))
>> >             return false;
>> >
>> > -   if (sg_policy->work_in_progress)
>> > -           return false;
>> > -
>> >     if (unlikely(sg_policy->need_freq_update)) {
>> >             sg_policy->need_freq_update = false;
>> >             /*
>> > @@ -128,7 +125,7 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
>> >
>> >             policy->cur = next_freq;
>> >             trace_cpu_frequency(next_freq, smp_processor_id());
>> > -   } else {
>> > +   } else if (!sg_policy->work_in_progress) {
>> >             sg_policy->work_in_progress = true;
>> >             irq_work_queue(&sg_policy->irq_work);
>> >     }
>> > @@ -291,6 +288,13 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
>> >
>> >     ignore_dl_rate_limit(sg_cpu, sg_policy);
>> >
>> > +   /*
>> > +    * For slow-switch systems, single policy requests can't run at the
>> > +    * moment if update is in progress, unless we acquire update_lock.
>> > +    */
>> > +   if (sg_policy->work_in_progress)
>> > +           return;
>> > +
>> >     if (!sugov_should_update_freq(sg_policy, time))
>> >             return;
>> >
>> > @@ -382,13 +386,27 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags)
>> >  static void sugov_work(struct kthread_work *work)
>> >  {
>> >     struct sugov_policy *sg_policy = container_of(work, struct sugov_policy, work);
>> > +   unsigned int freq;
>> > +   unsigned long flags;
>> > +
>> > +   /*
>> > +    * Hold sg_policy->update_lock briefly to handle the case where
>> > +    * sg_policy->next_freq is read here and then updated by
>> > +    * sugov_update_shared just before work_in_progress is set to false
>> > +    * here; without the lock we may miss queueing the new update.
>> > +    *
>> > +    * Note: If a work was queued after the update_lock is released,
>> > +    * sugov_work will just be called again by kthread_work code; and the
>> > +    * request will be processed before the sugov thread sleeps.
>> > +    */
>> > +   raw_spin_lock_irqsave(&sg_policy->update_lock, flags);
>> > +   freq = sg_policy->next_freq;
>> > +   sg_policy->work_in_progress = false;
>> > +   raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags);
>> >
>> >     mutex_lock(&sg_policy->work_lock);
>> > -   __cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq,
>> > -                           CPUFREQ_RELATION_L);
>> > +   __cpufreq_driver_target(sg_policy->policy, freq, CPUFREQ_RELATION_L);
>> >     mutex_unlock(&sg_policy->work_lock);
>> > -
>> > -   sg_policy->work_in_progress = false;
>> >  }
>>
>> And I do see a race here for single policy systems doing slow switching.
>>
>> Kthread                                                 Sched update
>>
>> sugov_work()                                            sugov_update_single()
>>
>>         lock();
>>         // The CPU is free to rearrange below
>>         // two in any order, so it may clear
>>         // the flag first and then read next
>>         // freq. Lets assume it does.
>>         work_in_progress = false
>>
>>                                                         if (work_in_progress)
>>                                                                 return;
>>
>>                                                         sg_policy->next_freq = 0;
>>         freq = sg_policy->next_freq;
>>                                                         sg_policy->next_freq = real-next-freq;
>>         unlock();
>>
>
> I agree with the race you describe for single policy slow-switch. Good find :)
>
> I think mainline's sugov_work could see such reordering too. Even with the
> mutex_unlock in mainline's sugov_work, the work_in_progress write could
> be reordered by the CPU to happen before the read of next_freq. AIUI,
> mutex_unlock is expected to be only a release barrier.
>
> To be safe, though, I could just put an smp_mb() there. I believe that with
> it, no locking would be needed for that case.

Yes, but leaving the work_in_progress check in sugov_update_single()
means that the original problem is still there in the one-CPU policy
case.  Namely, utilization updates coming in between setting
work_in_progress in sugov_update_commit() and clearing it in
sugov_work() will be discarded in the one-CPU policy case, but not in
the shared policy case.

> I'll send out a v3 with Acks for the original patch,

OK

> and then send out the smp_mb() as a separate patch if that's OK.

I would prefer to use a spinlock in the one-CPU policy non-fast-switch
case and remove the work_in_progress check from sugov_update_single().
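
To make the difference concrete, here is a minimal user-space sketch of the
two behaviours. It is not the actual patch: struct model_policy and all the
model_*() helpers are hypothetical stand-ins, a C11 atomic_flag spin loop
models sg_policy->update_lock, and setting work_in_progress models
irq_work_queue(). It only assumes the scheme described above, where the
update path records next_freq under the lock instead of returning early.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for struct sugov_policy. */
struct model_policy {
	atomic_flag update_lock;	/* models sg_policy->update_lock */
	unsigned int next_freq;		/* latest requested frequency */
	unsigned int applied_freq;	/* what the "driver" was last told */
	bool work_in_progress;		/* work queued but not yet picked up */
};

static void model_lock(struct model_policy *p)
{
	while (atomic_flag_test_and_set_explicit(&p->update_lock,
						 memory_order_acquire))
		;	/* spin until the flag is released */
}

static void model_unlock(struct model_policy *p)
{
	atomic_flag_clear_explicit(&p->update_lock, memory_order_release);
}

static void model_init(struct model_policy *p)
{
	atomic_flag_clear(&p->update_lock);
	p->next_freq = 0;
	p->applied_freq = 0;
	p->work_in_progress = false;
}

/* v2-style update: bails out while work is pending, dropping the request. */
static void model_update_checked(struct model_policy *p, unsigned int freq)
{
	if (p->work_in_progress)
		return;
	p->next_freq = freq;
	p->work_in_progress = true;	/* models irq_work_queue() */
}

/* Lock-based update: always records the request, queues work at most once. */
static void model_update_locked(struct model_policy *p, unsigned int freq)
{
	model_lock(p);
	p->next_freq = freq;
	if (!p->work_in_progress)
		p->work_in_progress = true;	/* models irq_work_queue() */
	model_unlock(p);
}

/* Models sugov_work(): snapshot and clear under the lock, then apply. */
static void model_work(struct model_policy *p)
{
	unsigned int freq;

	model_lock(p);
	freq = p->next_freq;
	p->work_in_progress = false;
	model_unlock(p);

	p->applied_freq = freq;	/* models __cpufreq_driver_target() */
}

/* Two requests arrive before the work runs; returns the frequency applied. */
static unsigned int model_demo(bool use_lock)
{
	struct model_policy p;

	model_init(&p);
	if (use_lock) {
		model_update_locked(&p, 1000);
		model_update_locked(&p, 1500);
	} else {
		model_update_checked(&p, 1000);
		model_update_checked(&p, 1500);	/* discarded */
	}
	assert(p.work_in_progress);
	model_work(&p);
	return p.applied_freq;
}
```

With the early return, the second request is lost and the work applies the
stale value; with the lock, the work picks up whatever was requested last,
and because the read of next_freq and the clearing of work_in_progress happen
under the same lock, the reordering described earlier cannot be observed by
the update path.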

I can do a patch on top of yours for that.  In fact, I've done that already. :-)

Thanks,
Rafael