[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20221220115130.lhhakj36kn3opqtz@airbuntu>
Date: Tue, 20 Dec 2022 11:51:30 +0000
From: Qais Yousef <qyousef@...alina.io>
To: Lukasz Luba <lukasz.luba@....com>
Cc: Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
"Rafael J. Wysocki" <rafael@...nel.org>,
Viresh Kumar <viresh.kumar@...aro.org>,
linux-pm@...r.kernel.org,
Vincent Guittot <vincent.guittot@...aro.org>,
linux-kernel@...r.kernel.org, Wei Wang <wvw@...gle.com>,
Xuewen Yan <xuewen.yan94@...il.com>,
Hank <han.lin@...iatek.com>,
Jonathan JMChen <Jonathan.JMChen@...iatek.com>
Subject: Re: [RFC PATCH 3/3] sched/fair: Traverse cpufreq policies to detect
capacity inversion
On 12/13/22 17:42, Lukasz Luba wrote:
> Hi Qais,
>
> I thought I could help with this issue.
Thanks Lukasz!
>
> On 12/12/22 18:43, Qais Yousef wrote:
> > On 12/09/22 17:47, Vincent Guittot wrote:
> >
> > [...]
> >
> > > > > > > This patch loops on all cpufreq policy in sched softirq, how can this
> > > > > > > be sane ? and not only in eas mode but also in the default asymmetric
> > > > > >
> > > > > > Hmm I'm still puzzled. Why it's not sane to do it here but it's okay to do it
> > > > > > in the wake up path in feec()?
> > > > >
> > > > > feec() should be considered as an exception not as the default rule.
> > > > > Thing like above which loops for_each on external subsystem should be
> > > > > prevented and the fact that feec loops all PDs doesn't means that we
> > > > > can put that everywhere else
> > > >
> > > > Fair enough. But really understanding the root cause behind this limitation
> > > > will be very helpful. I don't have the same appreciation of why this is
> > > > a problem, and shedding more light will help me to think more about it in the
> > > > future.
> > > >
> > >
> > > Take the example of 1k cores with per cpu policy. Do you really think a
> > > for_each_cpufreq_policy would be reasonable ?
> >
> > Hmm I don't think such an HMP system makes sense to ever exist.
> >
> > That system has to be a multi-socket system and I doubt inversion detection is
> > something of value.
> >
> > Point taken anyway. Let's find another way to do this.
> >
>
> Another way might be to use the 'update' code path, which sets this
> information source, for the thermal pressure. That code path isn't as
> hot as this in the task scheduler. Furthermore, we would also
> have time and handle properly CPU hotplug callbacks there.
>
> So something like this, I have in mind:
>
> ------------------------------8<-----------------------------
> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> index e7d6e6657ffa..7f372a93e21b 100644
> --- a/drivers/base/arch_topology.c
> +++ b/drivers/base/arch_topology.c
> @@ -16,6 +16,7 @@
> #include <linux/sched/topology.h>
> #include <linux/cpuset.h>
> #include <linux/cpumask.h>
> +#include <linux/mutex.h>
> #include <linux/init.h>
> #include <linux/rcupdate.h>
> #include <linux/sched.h>
> @@ -153,6 +154,33 @@ void topology_set_freq_scale(const struct cpumask
> *cpus, unsigned long cur_freq,
> DEFINE_PER_CPU(unsigned long, cpu_scale) = SCHED_CAPACITY_SCALE;
> DEFINE_PER_CPU(unsigned long, cpu_scale) = SCHED_CAPACITY_SCALE;
> EXPORT_PER_CPU_SYMBOL_GPL(cpu_scale);
>
> +static struct cpumask highest_capacity_mask;
>
> +static struct cpumask highest_capacity_mask;
> +static unsigned int max_possible_capacity;
> +static DEFINE_MUTEX(max_capacity_lock);
> +
> +static void max_capacity_update(const struct cpumask *cpus,
> + unsigned long capacity)
> +{
> + mutex_lock(&max_capacity_lock);
> +
> + if (max_possible_capacity < capacity) {
> + max_possible_capacity = capacity;
> +
> + cpumask_clear(&highest_capacity_mask);
> +
> + cpumask_or(&highest_capacity_mask,
> + &highest_capacity_mask, cpus);
> + }
> +
> + mutex_unlock(&max_capacity_lock);
> +}
> +
> +bool topology_test_max_cpu_capacity(unsigned int cpu)
> +{
> + return cpumask_test_cpu(cpu, &highest_capacity_mask);
> +}
> +EXPORT_SYMBOL_GPL(topology_test_max_cpu_capacity);
> +
> void topology_set_cpu_scale(unsigned int cpu, unsigned long capacity)
> {
> per_cpu(cpu_scale, cpu) = capacity;
> @@ -203,6 +231,8 @@ void topology_update_thermal_pressure(const struct
> cpumask *cpus,
>
> for_each_cpu(cpu, cpus)
> WRITE_ONCE(per_cpu(thermal_pressure, cpu), th_pressure);
> +
> + max_capacity_update(cpus, capacity);
> }
> EXPORT_SYMBOL_GPL(topology_update_thermal_pressure);
>
>
> --------------------------->8--------------------------------
>
> We could use the RCU if there is a potential to read racy date
> while the updater modifies the mask in the meantime. Mutex is to
> serialize the thermal writers which might be kicked for two
> policies at the same time.
>
> If you like I can develop and test such code in the arch_topology.c
As we discussed offline, Vincent is keen on decoupling the util_fits_cpu()
logic from HMP - which means I need to reword this differently.
Let's keep this in the back burner in case we need to revisit it again.
Appreciate the proposal!!
Many thanks
--
Qais Yousef
Powered by blists - more mailing lists