Date:   Fri, 19 Aug 2022 17:33:20 +0200
From:   Pierre Gondois <pierre.gondois@....com>
To:     linux-kernel@...r.kernel.org
Cc:     qperret@...gle.com, Pierre Gondois <pierre.gondois@....com>,
        Ingo Molnar <mingo@...hat.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Dietmar Eggemann <dietmar.eggemann@....com>,
        Steven Rostedt <rostedt@...dmis.org>,
        Ben Segall <bsegall@...gle.com>, Mel Gorman <mgorman@...e.de>,
        Daniel Bristot de Oliveira <bristot@...hat.com>,
        Valentin Schneider <vschneid@...hat.com>
Subject: [PATCH 2/2] sched/fair: Use IRQ scaling for all sched classes

The time spent executing IRQ handlers is not reflected in a CPU's
utilization. IRQ scaling reduces the rq's CFS, RT and DL util to
reflect the CPU capacity reduction due to IRQs.

commit 9033ea11889f ("cpufreq/schedutil: Take time spent in interrupts
into account")
introduced the notion of IRQ scaling for the function now called
effective_cpu_util(), with the following expression for the CPU util:
  IRQ util_avg + ((max_cap - IRQ util_avg) / max_cap) * \Sum rq util_avg
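
For illustration, plugging made-up numbers into this expression
(max_cap = 1024, IRQ util_avg = 256, \Sum rq util_avg = 512) gives:
  256 + ((1024 - 256) / 1024) * 512 = 256 + 384 = 640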

commit 523e979d3164 ("sched/core: Use PELT for scale_rt_capacity()")
introduced IRQ scaling for scale_rt_capacity(), but without scaling
RT and DL rq util.

scale_rt_capacity() excludes the RT and DL rq signals from IRQ scaling:
only the available capacity is scaled. However, the RT and DL rq util
should also be scaled.

Applying IRQ scaling makes it possible to extract the IRQ util avg, so
the IRQ util avg should also be subtracted from the available capacity.
Thermal pressure is not execution time, but a reduction of the maximum
possible capacity of a CPU, so IRQ scaling should not be applied to it.

Thus, in this order:
 - subtract thermal pressure
 - apply IRQ scaling on the remaining capacity (RT + DL + CFS + free)
 - subtract IRQ util
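
For illustration, following these steps with made-up numbers
(max = 1024, thermal = 100, irq = 128, RT + DL rq util_avg = 100), the
patched scale_rt_capacity() would return:
  free = 1024 - 100 = 924
  free = 924 * (1024 - 128) / 1024 = 808
  used = 100 + 128 = 228
  capacity left for CFS = 808 - 228 = 580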

Also, sort variables in reverse tree order.

Signed-off-by: Pierre Gondois <pierre.gondois@....com>
---
 kernel/sched/fair.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index bcae7bdd5582..546e490d6753 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8468,16 +8468,23 @@ static inline void init_sd_lb_stats(struct sd_lb_stats *sds)
 
 static unsigned long scale_rt_capacity(int cpu)
 {
-	struct rq *rq = cpu_rq(cpu);
 	unsigned long max = arch_scale_cpu_capacity(cpu);
+	struct rq *rq = cpu_rq(cpu);
+	unsigned long irq, thermal;
 	unsigned long used, free;
-	unsigned long irq;
 
 	irq = cpu_util_irq(rq);
 
 	if (unlikely(irq >= max))
 		return 1;
 
+	thermal = thermal_load_avg(rq);
+	if (unlikely(thermal >= max))
+		return 1;
+
+	free = max - thermal;
+	free = scale_irq_capacity(free, irq, max);
+
 	/*
 	 * avg_rt.util_avg and avg_dl.util_avg track binary signals
 	 * (running and not running) with weights 0 and 1024 respectively.
@@ -8486,14 +8493,12 @@ static unsigned long scale_rt_capacity(int cpu)
 	 */
 	used = READ_ONCE(rq->avg_rt.util_avg);
 	used += READ_ONCE(rq->avg_dl.util_avg);
-	used += thermal_load_avg(rq);
+	used += irq;
 
-	if (unlikely(used >= max))
+	if (unlikely(used >= free))
 		return 1;
 
-	free = max - used;
-
-	return scale_irq_capacity(free, irq, max);
+	return free - used;
 }
 
 static void update_cpu_capacity(struct sched_domain *sd, int cpu)
-- 
2.25.1
