[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJZ5v0h_uoja87NUZb-tWmG_6Fb+1bATB03VRO0Foi5nFadiVQ@mail.gmail.com>
Date: Mon, 5 Dec 2022 13:39:52 +0100
From: "Rafael J. Wysocki" <rafael@...nel.org>
To: Qais Yousef <qyousef@...alina.io>
Cc: "Rafael J. Wysocki" <rafael@...nel.org>,
Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Viresh Kumar <viresh.kumar@...aro.org>,
linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org,
Lukasz Luba <lukasz.luba@....com>, Wei Wang <wvw@...gle.com>,
Xuewen Yan <xuewen.yan94@...il.com>,
Hank <han.lin@...iatek.com>,
Jonathan JMChen <Jonathan.JMChen@...iatek.com>
Subject: Re: [RFC PATCH 3/3] sched/fair: Traverse cpufreq policies to detect
capacity inversion
On Sat, Dec 3, 2022 at 3:32 PM Qais Yousef <qyousef@...alina.io> wrote:
>
> On 11/30/22 19:27, Rafael J. Wysocki wrote:
>
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 7c0dd57e562a..4bbbca85134b 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -8856,23 +8856,20 @@ static void update_cpu_capacity(struct sched_domain *sd, int cpu)
> > > * * Thermal pressure will impact all cpus in this perf domain
> > > * equally.
> > > */
> > > - if (sched_energy_enabled()) {
> > > + if (static_branch_unlikely(&sched_asym_cpucapacity)) {
> > > unsigned long inv_cap = capacity_orig - thermal_load_avg(rq);
> > > - struct perf_domain *pd = rcu_dereference(rq->rd->pd);
> > > + struct cpufreq_policy *policy, __maybe_unused *policy_n;
> > >
> > > rq->cpu_capacity_inverted = 0;
> > >
> > > - SCHED_WARN_ON(!rcu_read_lock_held());
> > > -
> > > - for (; pd; pd = pd->next) {
> > > - struct cpumask *pd_span = perf_domain_span(pd);
> > > + for_each_active_policy_safe(policy, policy_n) {
> >
> > 1. Is the "safe" part sufficient for protection against concurrent
> > deletion and freeing of list entries? cpufreq driver removal can do
> > that AFAICS.
>
> The freeing part is not safe probably.
Well, I don't even think that the traversal part is safe against
concurrent removal of list entries (it is safe against removal of list
entries in the loop itself).
list_for_each_entry_safe() assumes that n will always point to a valid
list entry, but what if the entry pointed to by it is deleted by a
concurrent thread?
> I need to research this more. Do you
> have issues against the exportation of this traversal in principle?
>
> Switching them to be RCU protected could be the best safe option, anything
> against that too?
Not really, it just occurred to me that you may end up dealing with a
corrupted list here.
> I might not end up needing that. I need to dig more.
>
> > 2. For a casual reader of this code it may not be clear why cpufreq
> > policies matter here.
>
> I'm looking for a way to traverse the list of capacities of the system and
> know their related CPUs.
So why don't you mention this in a comment, for instance?
> AFAICT this information already exists in the performance domains and
> cpufreq_policy. Performance domains are conditional to energy model and
> schedutil. So trying to switch to cpufreq_policy.
>
> Assuming this question wasn't a request to add a comment :-)
Yes, it was. :-)
That said though, this change makes the scheduler kind of depend on
cpufreq which feels a bit like a corner cut TBH.
I do realize that cpufreq happens to be maintaining a data structure
that turns out to be useful here, but the reason why it is there has
nothing to do with this code AFAICS.
Powered by blists - more mailing lists