linux-kernel - Re: [PATCH] irq/timings: Fix model validity

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20181107130507.GD9761@hirez.programming.kicks-ass.net>
Date:   Wed, 7 Nov 2018 14:05:07 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     Daniel Lezcano <daniel.lezcano@...aro.org>
Cc:     "Rafael J. Wysocki" <rafael@...nel.org>,
        "Rafael J. Wysocki" <rjw@...ysocki.net>,
        Linux PM <linux-pm@...r.kernel.org>,
        Giovanni Gherdovich <ggherdovich@...e.cz>,
        Doug Smythies <dsmythies@...us.net>,
        Srinivas Pandruvada <srinivas.pandruvada@...ux.intel.com>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Frederic Weisbecker <frederic@...nel.org>,
        Mel Gorman <mgorman@...e.de>,
        Nicolas Pitre <nicolas.pitre@...aro.org>
Subject: Re: [PATCH] irq/timings: Fix model validity

On Wed, Nov 07, 2018 at 11:52:31AM +0100, Daniel Lezcano wrote:
> > @@ -146,11 +152,38 @@ static void irqs_update(struct irqt_stat *irqs, u64 ts)
> >  	 */
> >  	diff = interval - irqs->avg;
> >  
> > +	/*
> > +	 * Online average algorithm:
> > +	 *
> > +	 *  new_average = average + ((value - average) / count)
> > +	 *
> > +	 * The variance computation depends on the new average
> > +	 * to be computed here first.
> > +	 *
> > +	 */
> > +	irqs->avg = irqs->avg + (diff >> IRQ_TIMINGS_SHIFT);
> > +
> > +	/*
> > +	 * Online variance algorithm:
> > +	 *
> > +	 *  new_variance = variance + (value - average) x (value - new_average)
> > +	 *
> > +	 * Warning: irqs->avg is updated with the line above, hence
> > +	 * 'interval - irqs->avg' is no longer equal to 'diff'
> > +	 */
> > +	irqs->variance = irqs->variance + (diff * (interval - irqs->avg));
> > +
> >  	/*
> >  	 * Increment the number of samples.
> >  	 */
> >  	irqs->nr_samples++;

FWIW, I'm confused on this. The normal (Welford's) online algorithm
does:

	count++;
	delta = value - mean;
	mean += delta / count;
	M2 += delta * (value - mean);

But the above uses:

	mean += delta / 32;

Which, for count >> 32, over-estimates the mean adjustment. But worse,
it significantly under-estimates the mean during training.

How is the computed variance still correct with this? I can not find any
comments that clarifies this. I'm thinking that since the mean will
slowly wander towards it's actual location (assuming an actual standard
distribution input) the resulting variance will be far too large, since
the (value - mean) term will be much larger than 'expected'.

> > @@ -158,16 +191,12 @@ static void irqs_update(struct irqt_stat *irqs, u64 ts)
> >  	 * more than 32 and dividing by 32 instead of 31 is enough
> >  	 * precise.
> >  	 */
> > +	variance = irqs->variance >> IRQ_TIMINGS_SHIFT;

Worse; variance is actually (as the comment states):

	s^2 = M2 / (count -1)

But instead you compute:

	s^2 = M2 / 32;

Which is again much larger than the actual result; assuming count >> 32.

So you compute a variance that is inflated in two different ways.


I'm not seeing how this thing works reliably.