linux-kernel - Re: sched-freq locking

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87egdc2o8k.fsf@e105922-lin.cambridge.arm.com>
Date:	Wed, 20 Jan 2016 16:46:03 +0000
From:	Punit Agrawal <punit.agrawal@....com>
To:	Juri Lelli <Juri.Lelli@....com>
Cc:	Steve Muckle <steve.muckle@...aro.org>,
	Michael Turquette <mturquette@...libre.com>,
	Vincent Guittot <vincent.guittot@...aro.org>,
	Patrick Bellasi <patrick.bellasi@....com>,
	Morten Rasmussen <morten.rasmussen@....com>,
	Dietmar Eggemann <dietmar.eggemann@....com>,
	Viresh Kumar <viresh.kumar@...aro.org>,
	"linux-kernel\@vger.kernel.org" <linux-kernel@...r.kernel.org>,
	"linux-pm\@vger.kernel.org" <linux-pm@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	"Rafael J. Wysocki" <rjw@...ysocki.net>,
	Javi Merino <Javi.Merino@....com>
Subject: Re: sched-freq locking

Juri Lelli <Juri.Lelli@....com> writes:

> On 20/01/16 06:50, Steve Muckle wrote:
>> On 01/20/2016 04:18 AM, Juri Lelli wrote:
>> > I fear that caching could break thermal. If everybody was already using
>> > sched-freq interface to control frequency this won't probably be a
>> > problem, but we are not yet there :(. So, IIUC by caching policy->max,
>> > for example, we might affect what thermal expects from cpufreq.
>> 
>> I think we could probably just access policy->max without rwsem, as long
>> as we ensure policy is valid. Cpufreq will take care to cap a frequency
>> request to an upper limit put in by thermal anyway so I don't think it's
>> a problematic race. But I haven't spent much time thinking about it.
>> 
>
> Mmm, can't the fact that the scheduler might think it can still request
> a frequency above max be problematic?
>
> Anyway, thermal seems to use cpufreq_set_policy() for changing max (if I
> got it right Punit :)), so we might be able to intercept such sites and
> do proper update of cached values.


That's correct. The cpu cooling device calls cpufreq_update_policy which
subsequently calls cpufreq_set_policy.

The max itself is changed via the policy notifier callback in the
cooling device.

>
>> >
>> ...
>> >> Done (added linux-pm, PeterZ and Rafael as well).
>> >>
>> > 
>> > This discussion is pretty interesting, yes. I'm a bit afraid people
>> > bumped into this might have troubles understanding context, though.
>> > And I'm not sure how to give them that context; maybe start a new thread
>> > summarizing what has been discussed so far?
>> 
>> Yeah, that occurred to me as well. I wasn't sure whether to restart the
>> thread or put in the locking I had in mind and just send it with the
>> next schedfreq RFC series, which should be in the next few weeks. I was
>> going to do the latter but here is a recap of the topic from my side:
>> 
>
> Thanks a lot for writing this down!
>
>> Currently schedfreq has to go through two stages of aggregation. The
>> first is aggregating the capacity request votes from the different sched
>> classes within a CPU. The second is aggregating the capacity request
>> votes from the CPUs within a frequency domain. I'm looking to solve the
>> locking issues with both of these stages as well as those with calling
>> into cpufreq in the fast path.
>> 
>> For the first stage, currently we assume that the rq lock is held. This
>> is the case for the existing calls into schedfreq (the load balancer
>> required a minor change). There are a few more hooks that need to go
>> into migration paths in core.c which will be slightly more painful, but
>> they are IMO doable.
>> 
>> For the second stage I believe an internal schedfreq spinlock is
>> required, for one thing to protect against competing updates within the
>> same frequency domain from different CPUs, but also so that we can
>> guarantee the cpufreq policy is valid, which can be done if the
>> governor_start/stop callbacks take the spinlock as well.
>> 
>
> Does this need to nest within the rq lock?
>
>> As for accessing various things in cpufreq->policy and trying to take
>> rwsem in the fast path, we should be able to either cache some of the
>> items in the governor_start() path, eliminate the references, or access
>> them without locking rwsem (as long as we know the policy is valid,
>
> If we only need to guarantee existence of the policy, without any direct
> updates, RCU might be a good fit.
>
>> which we do by taking the spinlock in governor_start()). Here are the
>> things we currently reference in the schedfreq set capacity hook and my
>> thoughts on each of them:
>> 
>> policy->governor: this is currently tested to see if schedfreq is
>> enabled, but this can be tracked internally to schedfreq and set in the
>> governor_start/stop callbacks
>> 
>> policy->governor_data: used to access schedfreq's data for the policy,
>> could be changed to an internal schedfreq percpu pointer
>> 
>> policy->cpus, policy->freq_table: these can be cached internally to
>> schedfreq on governor_start/stop
>> 
>> policy->max: as mentioned above I *think* we could get away with
>> accessing this without locking rwsem as discussed above
>> 
>> policy->transition_ongoing: this isn't strictly required as schedfreq
>> could track internally whether it has a transition outstanding, but we
>> should also be okay to look at this without policy->rwsem since
>> schedfreq is the only one queuing transitions, again assuming we take
>> care to ensure policy is valid as above
>> 
>> Finally when actually requesting a frequency change in the fast path, we
>> can trylock policy->rwsem and fall back to the slow path (kthread) if
>> that fails.
>> 
>
> OTOH, all the needed aggregation could make the fast path not so fast in
> the end. So, maybe having a fast and a slow path is always good?
>
> Thanks,
>
> - Juri
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pm" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html