linux-kernel - [PATCH] irq/timings: Fix model validity

Message-ID: <20181107094624.GB9828@hirez.programming.kicks-ass.net>
Date:   Wed, 7 Nov 2018 10:46:24 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     "Rafael J. Wysocki" <rafael@...nel.org>
Cc:     "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Linux PM <linux-pm@...r.kernel.org>,
        Giovanni Gherdovich <ggherdovich@...e.cz>,
        Doug Smythies <dsmythies@...us.net>,
        Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Frederic Weisbecker <frederic@...nel.org>,
        Mel Gorman <mgorman@...e.de>,
        Daniel Lezcano <daniel.lezcano@...aro.org>,
        Nicolas Pitre <nicolas.pitre@...aro.org>
Subject: [PATCH] irq/timings: Fix model validity

On Wed, Nov 07, 2018 at 09:59:36AM +0100, Peter Zijlstra wrote:
> On Wed, Nov 07, 2018 at 12:39:31AM +0100, Rafael J. Wysocki wrote:

> > In general, however, I need to be convinced that interrupts that
> > didn't wake up the CPU from idle are relevant for next wakeup
> > prediction.  I see that this may be the case, but to what extent is
> > rather unclear to me and it looks like calling
> > irq_timings_next_event() would add considerable overhead.
> 
> How about we add a (debug) knob so that people can play with it for now?
> If it turns out to be useful, we'll learn.

That said; Daniel, I think there is a problem with how irqs_update()
sets irqs->valid. We seem to set valid even when we're still training.
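
To illustrate the ordering I have in mind, here is a minimal userspace
sketch, not the kernel code. struct model and model_update() are
hypothetical stand-ins for struct irqt_stat and irqs_update();
SAMPLE_SHIFT assumes the divide-by-32 that the existing comments
describe; a plain division replaces the kernel's right shift so that
negative differences stay portable; and the anomaly handling is left
out entirely. The point is only that training happens first and that
we bail out before touching 'valid' while below the threshold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SAMPLE_SHIFT		5	/* assumption: divide by 32 */
#define SAMPLE_THRESHOLD	30	/* same value as in the patch */

/* Hypothetical stand-in for struct irqt_stat. */
struct model {
	int64_t		avg;
	int64_t		variance;
	unsigned int	nr_samples;
	bool		valid;
};

/*
 * Feed one interval into the model. Returns true only when the model
 * has seen enough samples for a prediction to be meaningful.
 */
bool model_update(struct model *m, int64_t interval)
{
	int64_t diff;

	assert(m);

	/* Nothing is valid until proven otherwise. */
	m->valid = false;

	diff = interval - m->avg;

	/* Train first: online average, then online variance. */
	m->avg += diff / (1 << SAMPLE_SHIFT);
	m->variance += diff * (interval - m->avg);
	m->nr_samples++;

	/* Still training: return before any prediction is made. */
	if (m->nr_samples < SAMPLE_THRESHOLD)
		return false;

	/* (Anomaly detection would go here; omitted in this sketch.) */

	m->valid = true;
	return true;
}
```

Feeding a zero-initialised struct model the same interval repeatedly
leaves 'valid' clear for the first 29 calls and sets it on the 30th,
which is the behaviour the patch below is after.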

---
Subject: irq/timings: Fix model validity

The per-IRQ timing predictor will produce a 'valid' prediction even if
the model is still training. This should not happen.

Fix this by moving the actual training (online stddev algorithm) up a
bit and returning early (before predicting) when we've not yet reached
the sample threshold.

A direct consequence is that the predictor will only ever run with at
least that many samples, which means we can remove one branch.

Signed-off-by: Peter Zijlstra (Intel) <peterz@...radead.org>
---
 kernel/irq/timings.c | 66 +++++++++++++++++++++++++++++-----------------------
 1 file changed, 37 insertions(+), 29 deletions(-)

diff --git a/kernel/irq/timings.c b/kernel/irq/timings.c
index 1e4cb63a5c82..5d22fd5facd5 100644
--- a/kernel/irq/timings.c
+++ b/kernel/irq/timings.c
@@ -28,6 +28,13 @@ struct irqt_stat {
 	int	valid;
 };
 
+/*
+ * The rule of thumb in statistics for the normal distribution
+ * is having at least 30 samples in order to have the model to
+ * apply.
+ */
+#define SAMPLE_THRESHOLD	30
+
 static DEFINE_IDR(irqt_stats);
 
 void irq_timings_enable(void)
@@ -101,7 +108,6 @@ void irq_timings_disable(void)
  * distribution appears when the number of samples is 30 (it is the
  * rule of thumb in statistics, cf. "30 samples" on Internet). When
  * there are three consecutive anomalies, the statistics are resetted.
- *
  */
 static void irqs_update(struct irqt_stat *irqs, u64 ts)
 {
@@ -146,11 +152,38 @@ static void irqs_update(struct irqt_stat *irqs, u64 ts)
 	 */
 	diff = interval - irqs->avg;
 
+	/*
+	 * Online average algorithm:
+	 *
+	 *  new_average = average + ((value - average) / count)
+	 *
+	 * The variance computation depends on the new average
+	 * to be computed here first.
+	 *
+	 */
+	irqs->avg = irqs->avg + (diff >> IRQ_TIMINGS_SHIFT);
+
+	/*
+	 * Online variance algorithm:
+	 *
+	 *  new_variance = variance + (value - average) x (value - new_average)
+	 *
+	 * Warning: irqs->avg is updated with the line above, hence
+	 * 'interval - irqs->avg' is no longer equal to 'diff'
+	 */
+	irqs->variance = irqs->variance + (diff * (interval - irqs->avg));
+
 	/*
 	 * Increment the number of samples.
 	 */
 	irqs->nr_samples++;
 
+	/*
+	 * If we're still training the model, we can't make any predictions yet.
+	 */
+	if (irqs->nr_samples < SAMPLE_THRESHOLD)
+		return;
+
 	/*
 	 * Online variance divided by the number of elements if there
 	 * is more than one sample.  Normally the formula is division
@@ -158,16 +191,12 @@ static void irqs_update(struct irqt_stat *irqs, u64 ts)
 	 * more than 32 and dividing by 32 instead of 31 is enough
 	 * precise.
 	 */
-	if (likely(irqs->nr_samples > 1))
-		variance = irqs->variance >> IRQ_TIMINGS_SHIFT;
+	variance = irqs->variance >> IRQ_TIMINGS_SHIFT;
 
 	/*
-	 * The rule of thumb in statistics for the normal distribution
-	 * is having at least 30 samples in order to have the model to
-	 * apply. Values outside the interval are considered as an
-	 * anomaly.
+	 * Values outside the interval are considered as an anomaly.
 	 */
-	if ((irqs->nr_samples >= 30) && ((diff * diff) > (9 * variance))) {
+	if ((diff * diff) > (9 * variance)) {
 		/*
 		 * After three consecutive anomalies, we reset the
 		 * stats as it is no longer stable enough.
@@ -191,27 +220,6 @@ static void irqs_update(struct irqt_stat *irqs, u64 ts)
 	 */
 	irqs->valid = 1;
 
-	/*
-	 * Online average algorithm:
-	 *
-	 *  new_average = average + ((value - average) / count)
-	 *
-	 * The variance computation depends on the new average
-	 * to be computed here first.
-	 *
-	 */
-	irqs->avg = irqs->avg + (diff >> IRQ_TIMINGS_SHIFT);
-
-	/*
-	 * Online variance algorithm:
-	 *
-	 *  new_variance = variance + (value - average) x (value - new_average)
-	 *
-	 * Warning: irqs->avg is updated with the line above, hence
-	 * 'interval - irqs->avg' is no longer equal to 'diff'
-	 */
-	irqs->variance = irqs->variance + (diff * (interval - irqs->avg));
-
 	/*
 	 * Update the next event
 	 */