Date:   Tue, 13 Dec 2022 17:42:40 +0000
From:   Lukasz Luba <lukasz.luba@....com>
To:     Qais Yousef <qyousef@...alina.io>
Cc:     Ingo Molnar <mingo@...nel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        "Rafael J. Wysocki" <rafael@...nel.org>,
        Viresh Kumar <viresh.kumar@...aro.org>,
        linux-pm@...r.kernel.org,
        Vincent Guittot <vincent.guittot@...aro.org>,
        linux-kernel@...r.kernel.org, Wei Wang <wvw@...gle.com>,
        Xuewen Yan <xuewen.yan94@...il.com>,
        Hank <han.lin@...iatek.com>,
        Jonathan JMChen <Jonathan.JMChen@...iatek.com>
Subject: Re: [RFC PATCH 3/3] sched/fair: Traverse cpufreq policies to detect
 capacity inversion

Hi Qais,

I thought I could help with this issue.

On 12/12/22 18:43, Qais Yousef wrote:
> On 12/09/22 17:47, Vincent Guittot wrote:
> 
> [...]
> 
>>>>>> This patch loops on all cpufreq policy in sched softirq, how can this
>>>>>> be sane ? and not only in eas mode but also in the default asymmetric
>>>>>
>>>>> Hmm I'm still puzzled. Why it's not sane to do it here but it's okay to do it
>>>>> in the wake up path in feec()?
>>>>
>>>> feec() should be considered as an exception not as the default rule.
>>>> Thing like above which loops for_each on external subsystem should be
>>>> prevented and the fact that feec loops all PDs doesn't means that we
>>>> can put that everywhere else
>>>
>>> Fair enough. But really understanding the root cause behind this limitation
>>> will be very helpful. I don't have the same appreciation of why this is
>>> a problem, and shedding more light will help me to think more about it in the
>>> future.
>>>
>>
>> Take the example of 1k cores with per cpu policy. Do you really think a
>> for_each_cpufreq_policy would be reasonable ?
> 
> Hmm I don't think such an HMP system makes sense to ever exist.
> 
> That system has to be a multi-socket system and I doubt inversion detection is
> something of value.
> 
> Point taken anyway. Let's find another way to do this.
> 

Another way might be to use the 'update' code path for thermal pressure,
which sets this information source. That code path isn't as hot as the
task scheduler one. Furthermore, there we would also have the time to
handle CPU hotplug callbacks properly.

So I have something like this in mind:

------------------------------8<-----------------------------
diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index e7d6e6657ffa..7f372a93e21b 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -16,6 +16,7 @@
  #include <linux/sched/topology.h>
  #include <linux/cpuset.h>
  #include <linux/cpumask.h>
+#include <linux/mutex.h>
  #include <linux/init.h>
  #include <linux/rcupdate.h>
  #include <linux/sched.h>
@@ -153,6 +154,33 @@ void topology_set_freq_scale(const struct cpumask *cpus, unsigned long cur_freq,
  DEFINE_PER_CPU(unsigned long, cpu_scale) = SCHED_CAPACITY_SCALE;
  EXPORT_PER_CPU_SYMBOL_GPL(cpu_scale);

+static struct cpumask highest_capacity_mask;
+static unsigned int max_possible_capacity;
+static DEFINE_MUTEX(max_capacity_lock);
+
+static void max_capacity_update(const struct cpumask *cpus,
+                               unsigned long capacity)
+{
+       mutex_lock(&max_capacity_lock);
+
+       if (max_possible_capacity < capacity) {
+               max_possible_capacity = capacity;
+
+               cpumask_clear(&highest_capacity_mask);
+
+               cpumask_or(&highest_capacity_mask,
+                          &highest_capacity_mask, cpus);
+       }
+
+       mutex_unlock(&max_capacity_lock);
+}
+
+bool topology_test_max_cpu_capacity(unsigned int cpu)
+{
+       return cpumask_test_cpu(cpu, &highest_capacity_mask);
+}
+EXPORT_SYMBOL_GPL(topology_test_max_cpu_capacity);
+
  void topology_set_cpu_scale(unsigned int cpu, unsigned long capacity)
  {
         per_cpu(cpu_scale, cpu) = capacity;
@@ -203,6 +231,8 @@ void topology_update_thermal_pressure(const struct cpumask *cpus,

         for_each_cpu(cpu, cpus)
                 WRITE_ONCE(per_cpu(thermal_pressure, cpu), th_pressure);
+
+       max_capacity_update(cpus, capacity);
  }
  EXPORT_SYMBOL_GPL(topology_update_thermal_pressure);


--------------------------->8--------------------------------

We could use RCU if there is a potential to read racy data while the
updater modifies the mask in the meantime. The mutex is there to
serialize the thermal writers, which might be kicked for two
policies at the same time.

If you like, I can develop and test such code in arch_topology.c.

Regards,
Lukasz
