linux-kernel - Re: [RFC PATCH 2/2] sched: idle: IRQ based next prediction for idle period

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <568E8782.9020401@linaro.org>
Date:	Thu, 7 Jan 2016 16:42:58 +0100
From:	Daniel Lezcano <daniel.lezcano@...aro.org>
To:	Nicolas Pitre <nicolas.pitre@...aro.org>
Cc:	tglx@...utronix.de, peterz@...radead.org, rafael@...nel.org,
	linux-pm@...r.kernel.org, linux-kernel@...r.kernel.org,
	vincent.guittot@...aro.org
Subject: Re: [RFC PATCH 2/2] sched: idle: IRQ based next prediction for idle
 period

On 01/06/2016 06:40 PM, Nicolas Pitre wrote:
> On Wed, 6 Jan 2016, Daniel Lezcano wrote:
>
>> Many IRQs are quiet most of the time, or they tend to come in bursts of
>> fairly equal time intervals within each burst. It is therefore possible
>> to detect those IRQs with stable intervals and guestimate when the next
>> IRQ event is most likely to happen.
>>
>> Examples of such IRQs may include audio related IRQs where the FIFO size
>> and/or DMA descriptor size with the sample rate create stable intervals,
>> block devices during large data transfers, etc.  Even network streaming
>> of multimedia content creates patterns of periodic network interface IRQs
>> in some cases.
>>
>> This patch adds code to track the mean interval and variance for each IRQ
>> over a window of time intervals between IRQ events. Those statistics can
>> be used to assist cpuidle in selecting the most appropriate sleep state
>> by predicting the most likely time for the next interrupt.
>>
>> Because the stats are gathered in interrupt context, the core computation
>> is as light as possible.
>>
>> Signed-off-by: Daniel Lezcano <daniel.lezcano@...aro.org>
>
> There are still a few problems with this patch.
>
> Please see comments below.

[ ... ]

>> +/**
>> + * stats_variance - compute the variance
>> + *
>> + * @s: the statistic structure
>> + *
>> + * Returns an u64 corresponding to the variance, or zero if there is
>> + * no data
>> + */
>> +static u64 stats_variance(struct stats *s, u32 mean)
>> +{
>> +	int i;
>> +	u64 variance = 0;
>> +
>> +	/*
>> +	 * The variance is the sum of the squared difference to the
>> +	 * average divided by the number of elements.
>> +	 */
>> +	for (i = 0; i < STATS_NR_VALUES; i++) {
>> +		s32 diff = s->values[i] - mean;
>> +		variance += (u64)diff * diff;
>
> Strictly speaking, diff should be casted to s64 as it is a signed value
> that may actually be negative.  Because of the strange C type promotion
> rules, the generated code appears correct (at least on ARM), but it
> would be clearer to use s64 anyway.

I don't get the connection in your explanation of why it should be a 
s64. It is already a signed s32, s->values[] are s32 and mean is u32.

What would be the benefit to convert diff to s64 ?

> The product will end up being positive in all cases of course so
> variance may remain as a u64.
>

[ ... ]

>> +static ktime_t next_irq_event(void)
>> +{
>> +	unsigned int irq, cpu = raw_smp_processor_id();
>> +	ktime_t diff, next, min = { .tv64 = KTIME_MAX };
>
> To avoid exposing the ktime_t definition, you should use
> ktime_set(KTIME_SEC_MAX, 0) instead of { .tv64 = KTIME_MAX }.

Ok.

>> +	struct wakeup *w;
>> +	u32 interval, mean;
>> +	u64 variance;
>> +
>> +	/*
>> +	 * Lookup the interrupt array for this cpu and search for the
>> +	 * earlier expected interruption.
>> +	 */
>> +        for (irq = 0; irq < NR_IRQS; irq++) {
>> +
>> +		ktime_t now = ktime_get();
>
> ktime_get() is potentially non-trivial. It should be plenty good enough
> to call it only once outside the loop.

Ok.

>> +		w = per_cpu(wakeups[irq], cpu);
>> +
>> +		/*
>> +		 * The interrupt was not setup as a source of a wakeup
>> +		 * or the wakeup source is not considered at this
>> +		 * moment stable enough to do a prediction.
>> +		 */
>> +		if (!w)
>> +			continue;
>> +
>> +		/*
>> +		 * No statistics available yet.
>> +		 */
>> +		if (ktime_equal(w->timestamp, ktime_set(0, 0)))
>> +			continue;
>> +
>> +		diff = ktime_sub(now, w->timestamp);
>> +
>> +		/*
>> +		 * There is no point attempting predictions on interrupts more
>> +		 * than 1 second apart. This has no benefit for sleep state
>> +		 * selection and increases the risk of overflowing our variance
>> +		 * computation. Reset all stats in that case.
>> +		 */
>> +		if (unlikely(ktime_after(diff, ktime_set(1, 0)))) {
>> +			stats_reset(&w->stats);
>> +			continue;
>> +		}
>
> The above is wrong. It is not computing the interval between successive
> interruts but rather the interval between the last interrupt occurrence
> and the present time (i.e. when we're about to go idle).  This won't
> prevent interrupt intervals greater than one second from being summed
> and potentially overflowing the variance if this code is executed less
> than a second after one such IRQ interval.  This test should rather be
> performed in sched_idle_irq().

Ok, I will move it.

>> +		 * If the mean value is null, just ignore this wakeup
>> +		 * source.
>> +		 */
>> +		mean = stats_mean(&w->stats);
>> +		if (!mean)
>> +			continue;
>> +
>> +		variance = stats_variance(&w->stats, mean);
>> +		/*
>> +		 * We want to check the last interval is:
>> +		 *
>> +		 *  mean - stddev < interval < mean + stddev
>> +		 *
>> +		 * That simplifies to:
>> +		 *
>> +		 * -stddev < interval - mean < stddev
>> +		 *
>> +		 * abs(interval - mean) < stddev
>> +		 *
>> +		 * The standard deviation is the sqrt of the variance:
>> +		 *
>> +		 * abs(interval - mean) < sqrt(variance)
>> +		 *
>> +		 * and we want to prevent to do an sqrt, so we square
>> +		 * the equation:
>> +		 *
>> +		 * (interval - mean)^2 < variance
>> +		 *
>> +		 * So if the latest value of the stats complies with
>> +		 * this condition, then the wakeup source is
>> +		 * considered predictable and can be used to predict
>> +		 * the next event.
>> +		 */
>> +		interval = w->stats.values[(w->stats.w_ptr + 1) % STATS_NR_VALUES];
>
> But here (w->stats.w_ptr + 1) % STATS_NR_VALUES does not point at the
> last interval. It rather points at the second oldest.
> To make things simpler, you should rather pre-increment the pointer
> before updating the array in stats_add(), and here the value you want
> will directly be accessible with w->stats.values[w->stats.w_ptr].

Yeah, there is a glitch here. I will look at it.

>> +		if ((u64)((interval - mean) * (interval - mean)) > variance)
>> +			continue;
>
> Same comment as in stats_variance(): this should be s64.
>
>> +		/*
>> +		 * Let's compute the next event: the wakeup source is
>> +		 * considered predictable, we add the average interval
>> +		 * time added to the latest interruption event time.
>> +		 */
>> +		next = ktime_add_us(w->timestamp, stats_mean(&w->stats));
>> +
>> +		/*
>> +		 * If the interrupt is supposed to happen before the
>> +		 * minimum time, then it becomes the minimum.
>> +		 */
>> +		if (ktime_before(next, min))
>> +			min = next;
>> +	}
>> +
>> +	/*
>> +	 * At this point, we have our prediction but the caller is
>> +	 * expecting the remaining time before the next event, so
>> +	 * compute the expected sleep length.
>> +	 */
>> +	diff = ktime_sub(min, ktime_get());
>
> You should use the variable 'now' rather than asking for the current
> time again.

Yep.

>> +	/*
>> +	 * The result could be negative for different reasons:
>> +	 *  - the prediction is incorrect
>> +	 *  - the prediction was too near now and expired while we were
>> +	 *    in this function
>> +	 *
>> +	 * In both cases, we return KTIME_MAX as a failure to do a
>> +	 * prediction
>> +	 */
>> +	if (ktime_compare(diff, ktime_set(0, 0)) <= 0)
>> +		return (ktime_t){ .tv64 = KTIME_MAX };
>
> See my comment about ktime_t internals at the beginning of this
> function.

Ok.

Thanks for the review.

   -- Daniel

-- 
  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/