linux-kernel - Re: [PATCH v7 00/11] track CPU utilization

Message-ID: <20180705123617.GM2458@hirez.programming.kicks-ass.net>
Date:   Thu, 5 Jul 2018 14:36:17 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Vincent Guittot <vincent.guittot@...aro.org>
Cc:     mingo@...nel.org, linux-kernel@...r.kernel.org, rjw@...ysocki.net,
        juri.lelli@...hat.com, dietmar.eggemann@....com,
        Morten.Rasmussen@....com, viresh.kumar@...aro.org,
        valentin.schneider@....com, patrick.bellasi@....com,
        joel@...lfernandes.org, daniel.lezcano@...aro.org,
        quentin.perret@....com, luca.abeni@...tannapisa.it,
        claudio@...dence.eu.com
Subject: Re: [PATCH v7 00/11] track CPU utilization

On Thu, Jun 28, 2018 at 05:45:03PM +0200, Vincent Guittot wrote:
> Vincent Guittot (11):
>   sched/pelt: Move pelt related code in a dedicated file
>   sched/rt: add rt_rq utilization tracking
>   cpufreq/schedutil: use rt utilization tracking
>   sched/dl: add dl_rq utilization tracking
>   cpufreq/schedutil: use dl utilization tracking
>   sched/irq: add irq utilization tracking
>   cpufreq/schedutil: take into account interrupt
>   sched: schedutil: remove sugov_aggregate_util()
>   sched: use pelt for scale_rt_capacity()
>   sched: remove rt_avg code
>   proc/sched: remove unused sched_time_avg_ms
> 
>  include/linux/sched/sysctl.h     |   1 -
>  kernel/sched/Makefile            |   2 +-
>  kernel/sched/core.c              |  38 +---
>  kernel/sched/cpufreq_schedutil.c |  65 ++++---
>  kernel/sched/deadline.c          |   8 +-
>  kernel/sched/fair.c              | 403 +++++----------------------------------
>  kernel/sched/pelt.c              | 399 ++++++++++++++++++++++++++++++++++++++
>  kernel/sched/pelt.h              |  72 +++++++
>  kernel/sched/rt.c                |  15 +-
>  kernel/sched/sched.h             |  68 +++++--
>  kernel/sysctl.c                  |   8 -
>  11 files changed, 632 insertions(+), 447 deletions(-)
>  create mode 100644 kernel/sched/pelt.c
>  create mode 100644 kernel/sched/pelt.h

OK, this looks good I suppose. Rafael, are you OK with me taking these?

I have the below on top because I once again forgot how it all worked;
does this work for you Vincent?

---
Subject: sched/cpufreq: Clarify sugov_get_util()

Add a few comments (hopefully) clarifying some of the magic in
sugov_get_util().

Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
---
 cpufreq_schedutil.c |   69 ++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 51 insertions(+), 18 deletions(-)
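
For readers who want to check the arithmetic outside the kernel, here is a
minimal stand-alone sketch of the computation the patched sugov_get_util()
performs. It is an illustration under stated assumptions, not the kernel
code: effective_util() and min_ul() are hypothetical names, every input is a
plain number on the same scale as max (for example 1024), and the early
"RT runqueue is runnable" return is left out.

```c
#include <assert.h>

/* Hypothetical helper: the kernel uses its own min() macro instead. */
static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical user-space model of the utilization aggregation.
 * cfs, rt, dl and irq stand in for cpu_util_{cfs,rt,dl,irq}() and
 * bw_dl stands in for the DL bandwidth value (sg_cpu->bw_dl).
 */
static unsigned long effective_util(unsigned long cfs, unsigned long rt,
				    unsigned long dl, unsigned long irq,
				    unsigned long bw_dl, unsigned long max)
{
	unsigned long util;

	assert(max > 0);

	/* IRQ/steal time alone already saturates the CPU. */
	if (irq >= max)
		return max;

	/* CFS and RT use the same metric, so they add directly. */
	util = cfs + rt;

	/* No idle time left once DL is included: ask for f_max. */
	if (util + dl >= max)
		return max;

	/* Scale the task numbers: U' = irq + (max - irq) / max * U */
	util *= (max - irq);
	util /= max;
	util += irq;

	/* DL bandwidth is always granted on top, clamped to max. */
	return min_ul(max, util + bw_dl);
}
```

With max = 1024, cfs = 300, rt = 100, dl = 50, irq = 256 and bw_dl = 100,
the task sum 400 is scaled to 400 * 768 / 1024 = 300, irq adds 256 and the
DL bandwidth adds 100, giving 656.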

--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -177,6 +177,26 @@ static unsigned int get_next_freq(struct
 	return cpufreq_driver_resolve_freq(policy, freq);
 }
 
+/*
+ * This function computes an effective utilization for the given CPU, to be
+ * used for frequency selection given the linear relation: f = u * f_max.
+ *
+ * The scheduler tracks the following metrics:
+ *
+ *   cpu_util_{cfs,rt,dl,irq}()
+ *   cpu_bw_dl()
+ *
+ * Where the cfs,rt and dl util numbers are tracked with the same metric and
+ * synchronized windows and are thus directly comparable.
+ *
+ * The cfs, rt and dl utilizations are the running times measured with
+ * rq->clock_task, which excludes things like IRQ and steal-time. The latter
+ * are then accrued in the irq utilization.
+ *
+ * The DL bandwidth number otoh is not a measured metric but a value computed
+ * based on the task model parameters and gives the minimal u required to meet
+ * deadlines.
+ */
 static unsigned long sugov_get_util(struct sugov_cpu *sg_cpu)
 {
 	struct rq *rq = cpu_rq(sg_cpu->cpu);
@@ -188,26 +208,50 @@ static unsigned long sugov_get_util(stru
 	if (rt_rq_is_runnable(&rq->rt))
 		return max;
 
+	/*
+	 * Early check to see if IRQ/steal time saturates the CPU; this can
+	 * happen because of inaccuracies in how we track these -- see
+	 * update_irq_load_avg().
+	 */
 	irq = cpu_util_irq(rq);
-
 	if (unlikely(irq >= max))
 		return max;
 
-	/* Sum rq utilization */
+	/*
+	 * Because the time spent on RT/DL tasks is visible as 'lost' time to
+	 * CFS tasks and we use the same metric to track the effective
+	 * utilization (PELT windows are synchronized) we can directly add them
+	 * to obtain the CPU's actual utilization.
+	 */
 	util = cpu_util_cfs(rq);
 	util += cpu_util_rt(rq);
 
 	/*
-	 * Interrupt time is not seen by rqs utilization nso we can compare
-	 * them with the CPU capacity
+	 * We do not make cpu_util_dl() a permanent part of this sum because we
+	 * want to use cpu_bw_dl() later on, but we need to check if the
+	 * CFS+RT+DL sum is saturated (ie. no idle time) such that we select
+	 * f_max when there is no idle time.
+	 *
+	 * NOTE: numerical errors or stop class might cause us to not quite hit
+	 * saturation when we should -- something for later.
 	 */
 	if ((util + cpu_util_dl(rq)) >= max)
 		return max;
 
 	/*
-	 * As there is still idle time on the CPU, we need to compute the
-	 * utilization level of the CPU.
+	 * There is still idle time; further improve the number by using the
+	 * irq metric. Because IRQ/steal time is hidden from the task clock we
+	 * need to scale the task numbers:
 	 *
+	 *              max - irq
+	 *   U' = irq + --------- * U
+	 *                 max
+	 */
+	util *= (max - irq);
+	util /= max;
+	util += irq;
+
+	/*
 	 * Bandwidth required by DEADLINE must always be granted while, for
 	 * FAIR and RT, we use blocked utilization of IDLE CPUs as a mechanism
 	 * to gracefully reduce the frequency when no tasks show up for longer
@@ -217,18 +261,7 @@ static unsigned long sugov_get_util(stru
 	 * util_cfs + util_dl as requested freq. However, cpufreq is not yet
 	 * ready for such an interface. So, we only do the latter for now.
 	 */
-
-	/* Weight rqs utilization to normal context window */
-	util *= (max - irq);
-	util /= max;
-
-	/* Add interrupt utilization */
-	util += irq;
-
-	/* Add DL bandwidth requirement */
-	util += sg_cpu->bw_dl;
-
-	return min(max, util);
+	return min(max, util + sg_cpu->bw_dl);
 }
 
 /**