Message-ID: <20180522220953.GB40506@joelaf.mtv.corp.google.com>
Date:   Tue, 22 May 2018 15:09:53 -0700
From:   Joel Fernandes <joel@...lfernandes.org>
To:     Viresh Kumar <viresh.kumar@...aro.org>
Cc:     "Joel Fernandes (Google.)" <joelaf@...gle.com>,
        linux-kernel@...r.kernel.org,
        "Rafael J . Wysocki" <rafael.j.wysocki@...el.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Ingo Molnar <mingo@...hat.com>,
        Patrick Bellasi <patrick.bellasi@....com>,
        Juri Lelli <juri.lelli@...hat.com>,
        Luca Abeni <luca.abeni@...tannapisa.it>,
        Todd Kjos <tkjos@...gle.com>, claudio@...dence.eu.com,
        kernel-team@...roid.com, linux-pm@...r.kernel.org
Subject: Re: [PATCH v2] schedutil: Allow cpufreq requests to be made even
 when kthread kicked

On Tue, May 22, 2018 at 04:04:15PM +0530, Viresh Kumar wrote:
> Okay, me and Rafael were discussing this patch, locking and races around this.
> 
> On 18-05-18, 11:55, Joel Fernandes (Google.) wrote:
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index e13df951aca7..5c482ec38610 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -92,9 +92,6 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
> >  	    !cpufreq_can_do_remote_dvfs(sg_policy->policy))
> >  		return false;
> >  
> > -	if (sg_policy->work_in_progress)
> > -		return false;
> > -
> >  	if (unlikely(sg_policy->need_freq_update)) {
> >  		sg_policy->need_freq_update = false;
> >  		/*
> > @@ -128,7 +125,7 @@ static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
> >  
> >  		policy->cur = next_freq;
> >  		trace_cpu_frequency(next_freq, smp_processor_id());
> > -	} else {
> > +	} else if (!sg_policy->work_in_progress) {
> >  		sg_policy->work_in_progress = true;
> >  		irq_work_queue(&sg_policy->irq_work);
> >  	}
> > @@ -291,6 +288,13 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
> >  
> >  	ignore_dl_rate_limit(sg_cpu, sg_policy);
> >  
> > +	/*
> > +	 * For slow-switch systems, single policy requests can't run at the
> > +	 * moment if update is in progress, unless we acquire update_lock.
> > +	 */
> > +	if (sg_policy->work_in_progress)
> > +		return;
> > +
> >  	if (!sugov_should_update_freq(sg_policy, time))
> >  		return;
> >  
> > @@ -382,13 +386,27 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags)
> >  static void sugov_work(struct kthread_work *work)
> >  {
> >  	struct sugov_policy *sg_policy = container_of(work, struct sugov_policy, work);
> > +	unsigned int freq;
> > +	unsigned long flags;
> > +
> > +	/*
> > +	 * Hold sg_policy->update_lock shortly to handle the case where:
> > +	 * incase sg_policy->next_freq is read here, and then updated by
> > +	 * sugov_update_shared just before work_in_progress is set to false
> > +	 * here, we may miss queueing the new update.
> > +	 *
> > +	 * Note: If a work was queued after the update_lock is released,
> > +	 * sugov_work will just be called again by kthread_work code; and the
> > +	 * request will be proceed before the sugov thread sleeps.
> > +	 */
> > +	raw_spin_lock_irqsave(&sg_policy->update_lock, flags);
> > +	freq = sg_policy->next_freq;
> > +	sg_policy->work_in_progress = false;
> > +	raw_spin_unlock_irqrestore(&sg_policy->update_lock, flags);
> >  
> >  	mutex_lock(&sg_policy->work_lock);
> > -	__cpufreq_driver_target(sg_policy->policy, sg_policy->next_freq,
> > -				CPUFREQ_RELATION_L);
> > +	__cpufreq_driver_target(sg_policy->policy, freq, CPUFREQ_RELATION_L);
> >  	mutex_unlock(&sg_policy->work_lock);
> > -
> > -	sg_policy->work_in_progress = false;
> >  }
> 
> And I do see a race here for single policy systems doing slow switching.
> 
> Kthread                                                 Sched update
> 
> sugov_work()                                            sugov_update_single()
> 
>         lock();
>         // The CPU is free to rearrange below           
>         // two in any order, so it may clear
>         // the flag first and then read next
>         // freq. Lets assume it does.
>         work_in_progress = false
> 
>                                                         if (work_in_progress)
>                                                                 return;
> 
>                                                         sg_policy->next_freq = 0;
>         freq = sg_policy->next_freq;
>                                                         sg_policy->next_freq = real-next-freq;
>         unlock();
> 

I agree with the race you describe for single policy slow-switch. Good find :)

I think mainline's sugov_work could be subject to the same reordering. Even
with the mutex_unlock there, the CPU could reorder the work_in_progress write
to happen before the read of next_freq. AIUI, mutex_unlock is only expected to
be a release barrier.

Although to be safe, I could just put an smp_mb() there. I believe that with
it, no locking would be needed for that case.
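
As a rough illustration of the smp_mb() idea above, here is a user-space C11
sketch, not the actual kernel change. The struct policy_model type and the
model_take_request() helper are hypothetical stand-ins for the two
sugov_policy fields and for the start of sugov_work(), and
atomic_thread_fence(memory_order_seq_cst) is assumed to play the role that
smp_mb() would play in the kernel. It only models the kthread side, where the
read of next_freq must not be reordered after the clearing of
work_in_progress.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space model of the two sugov_policy fields involved. */
struct policy_model {
	_Atomic unsigned int next_freq;
	atomic_bool work_in_progress;
};

/*
 * Models the start of sugov_work(): read next_freq first, then clear
 * work_in_progress. The full fence between the two accesses keeps the
 * load from being reordered after the store.
 */
static unsigned int model_take_request(struct policy_model *p)
{
	unsigned int freq;

	freq = atomic_load_explicit(&p->next_freq, memory_order_relaxed);

	/* Assumed user-space analogue of smp_mb(). */
	atomic_thread_fence(memory_order_seq_cst);

	atomic_store_explicit(&p->work_in_progress, false,
			      memory_order_relaxed);

	return freq;
}
```

The update side (the check of work_in_progress followed by the write to
next_freq) would need matching ordering of its own; that is left out of this
sketch.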

I'll send out a v3 with Acks for the original patch, and then send out the
smp_mb() as a separate patch if that's OK.

thanks,

 - Joel