Message-ID: <20160629023235.p3fviotuka5hwuzp@treble>
Date: Tue, 28 Jun 2016 21:32:35 -0500
From: Josh Poimboeuf <jpoimboe@...hat.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...hat.com>, linux-kernel@...r.kernel.org,
Mel Gorman <mgorman@...hsingularity.net>,
Matt Fleming <matt@...eblueprint.co.uk>,
Srikar Dronamraju <srikar@...ux.vnet.ibm.com>
Subject: Re: [PATCH 0/5] sched/debug: decouple sched_stat tracepoints from
CONFIG_SCHEDSTATS
On Tue, Jun 28, 2016 at 02:43:36PM +0200, Peter Zijlstra wrote:
> On Fri, Jun 17, 2016 at 12:43:22PM -0500, Josh Poimboeuf wrote:
> > NOTE: I didn't include any performance numbers because I wasn't able to
> > get consistent results. I tried the following on a Xeon E5-2420 v2 CPU:
> >
> > $ for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do echo -n performance > $i; done
> > $ echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo
> > $ echo 100 > /sys/devices/system/cpu/intel_pstate/min_perf_pct
> > $ echo 0 > /proc/sys/kernel/nmi_watchdog
> > $ taskset 0x10 perf stat -n -r10 perf bench sched pipe -l 1000000
> >
> > I was going to post the numbers from that, both with and without
> > SCHEDSTATS, but then when I tried to repeat the test on a different day,
> > the results were surprisingly different, with different conclusions.
> >
> > So any advice on measuring scheduler performance would be appreciated...
>
> Yeah, it's a bit of a pain in general...
>
> A) perf stat --null --repeat 50 -- perf bench sched messaging -g 50 -l 5000 | grep "seconds time elapsed"
> B) perf stat --null --repeat 50 -- taskset 1 perf bench sched pipe | grep "seconds time elapsed"
>
> 1) tip/master + 1-4
> 2) tip/master + 1-5
> 3) tip/master + 1-5 + below
>
> 1 2 3
>
> A) 4.627767855 4.650429917 4.646208062
> 4.633921933 4.641424424 4.612021058
> 4.649536375 4.663144144 4.636815948
> 4.630165619 4.649053552 4.613022902
>
> B) 1.770732957 1.789534273 1.773334291
> 1.761740716 1.795618428 1.773338681
> 1.763761666 1.822316496 1.774385589
>
>
> From this it looks like patch 5 does hurt a wee bit, but we can get most
> of that back by reordering the structure a bit. The results seem
> 'stable' across rebuilds and reboots (I've popped all patches,
> rebuilt, rebooted and re-benched 1 at the end, and obtained similar
> results).
>
> Although, it's possible that if we reorder first and then do 5, we'll
> just see a bigger regression. I've not bothered.
Thanks a lot for benchmarking this! And also for improving the cache
alignments. Your changes look good to me.
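For anyone following along, the kind of reordering being discussed is
roughly the sketch below. The field names are made up for illustration
and are not the actual sched_entity/sched_statistics layout from the
patch; the point is just to keep the fields touched on every context
switch packed at the front and push the rarely-read stats behind them:

#include <stdint.h>

#define CACHELINE_SIZE 64

/*
 * Illustrative only -- not the real layout.  Hot fields (touched on
 * every enqueue/dequeue/pick) stay in the leading cache line; the
 * statistics fields, which are only written when stats are enabled and
 * read when reported, sit behind them so enabling them doesn't drag
 * extra cache lines into the hot path.
 */
struct example_entity {
	/* hot path */
	unsigned long	weight;
	uint64_t	vruntime;
	uint64_t	exec_start;
	uint64_t	sum_exec_runtime;

	/* cold: schedstats-style accounting */
	uint64_t	wait_start;
	uint64_t	wait_max;
	uint64_t	sleep_start;
	uint64_t	block_start;
} __attribute__((aligned(CACHELINE_SIZE)));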
--
Josh