Message-Id: <1239985426.23397.4757.camel@laptop>
Date: Fri, 17 Apr 2009 18:23:46 +0200
From: Peter Zijlstra <peterz@...radead.org>
To: Christoph Lameter <cl@...ux.com>
Cc: Ingo Molnar <mingo@...e.hu>, linux-kernel@...r.kernel.org
Subject: Re: Scheduler regression: Too frequent timer interrupts(?)
On Fri, 2009-04-17 at 11:55 -0400, Christoph Lameter wrote:
> Further details are included in the document that I pointed him to. The
> "workload" is a synthetic test of a busy loop continually reading the TSC
> register.
>
> http://www.kernel.org/pub/linux/kernel/people/christoph/collab-spring-2009/Latencharts.ods
You're really going to make me open that thing eh..
As we've established, graph 1 is useless.
I'm still not quite sure why you couldn't provide the data for the other
graphs in email. It isn't all that much:
Graph 2: Noise Length

Kernel       Test 1  Test 2  Test 3  Interruption (avg)
2.6.22        2.55    2.61    1.92        2.36
2.6.23        1.33    1.38    1.34        1.35
2.6.24        1.97    1.86    1.87        1.90
2.6.25        2.09    2.29    2.09        2.16
2.6.26        1.49    1.22    1.22        1.31
2.6.27        1.67    1.28    1.18        1.38
2.6.28        1.27    1.21    1.14        1.21
2.6.29        1.44    1.33    1.54        1.44
2.6.30-rc2    2.06    1.49    1.24        1.60
Is pretty useless too, since it only counts >1us events. Hence it will
always be biased.
Much better would have been a graph constructed from a histogram that
plots all lengths (say, in 10ns buckets). There you would have had a few
(at least 2) peaks, one around the time it takes to complete the
userspace loop, and one around the time it takes to do this interrupt
thing -- and maybe some other smaller ones. But now we're clueless.
Combined with graph 1, we can only compare 26+, and there we can see
there is some variance in how long a tick takes between kernels -- but a
std-dev along with this avg would have been even better.
For the past few releases I cannot remember anything that would
immediately affect the tick length. Nothing structural was changed
around that code.
> > Is the overhead 1%? 2%? 0.5%? And how did it change from 2.6.22
> > onwards? Did it go up by 0.1%, from 1% to 1.1%? Or did the average
> > go down by 0.05%, while increasing the spread of events (thus
> > fooling your cutoff)?
>
> As you see in the diagrams provided there is a 4-fold increase in the
> number of interrupts >1usec when going from 2.6.22 to 2.6.23. How would
> you measure the overhead? Time spent in the OS? Disturbance of the caches
> by the OS that causes the application to have to refetch data from RAM?
You could, for example, run an NMI profiler at 10000 Hz and collect
samples. Or use the PMU hardware to collect numbers.