Message-ID: <529B7B27.4070801@linux.intel.com>
Date:	Sun, 01 Dec 2013 20:08:39 +0200
From:	Eliezer Tamir <eliezer.tamir@...ux.intel.com>
To:	Peter Zijlstra <peterz@...radead.org>
CC:	John Stultz <john.stultz@...aro.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Steven Rostedt <rostedt@...dmis.org>,
	Ingo Molnar <mingo@...e.hu>,
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
	Andy Lutomirski <luto@...capital.net>,
	linux-kernel@...r.kernel.org, Tony Luck <tony.luck@...il.com>,
	hpa@...or.com
Subject: Re: [RFC][PATCH 0/7] sched: Optimize sched_clock bits

On 29/11/2013 19:36, Peter Zijlstra wrote:
> Hi all,
> 
> This series is supposed to optimize the kernel/sched/clock.c and x86
> sched_clock() implementations.
> 
> So far it's only been boot tested. So no clue if it really makes the
> thing faster, but it does remove the need to disable IRQs.
> 
> I'm hoping Eliezer will test this with his benchmark where he could measure a
> performance regression between using sched_clock() and local_clock().
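
For context: before this series, local_clock() had to disable
interrupts around the per-CPU clock read. A simplified sketch of that
era's kernel/sched/clock.c (not the exact code):

	u64 local_clock(void)
	{
		u64 clock;
		unsigned long flags;

		/* IRQs disabled around the per-CPU read; this is the
		 * hot-path overhead the series removes. */
		local_irq_save(flags);
		clock = sched_clock_cpu(smp_processor_id());
		local_irq_restore(flags);

		return clock;
	}

That local_irq_save()/local_irq_restore() pair on the hot path is the
cost the series is meant to eliminate.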

So I tested and retested, but I'm not sure I understand the results.

The numbers I previously reported were with turbo boost enabled.
Since turbo boost changes the CPU frequency depending on how hot it is,
it has a complicated interaction with busy polling.
In general you see better numbers, but it's harder to tell what's
going on.

With turbo boost disabled in the BIOS (to try to get more linear
behavior) I see:

3.13.0-rc2 (no patches):                        82.0 KRR/s
busy poll using local_clock():                  80.2 KRR/s
local_clock() + sched_clock patches:            80.6 KRR/s
sched patches (busy poll using sched_clock()):  80.6 KRR/s

(KRR/s: thousands of netperf request/response round trips per second.)

Note that there is a big variance between cores: on the SMT sibling of
the core the packets are steered to, the local_clock() test gives 81.8
KRR/s. (In the other tests that sibling is slightly slower than the
core that accepts the packets; I'm not sure I can explain this.)
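
The cost being measured is a clock read on the per-transaction
busy-poll path. A minimal userspace sketch of the pattern, with
clock_gettime() standing in for the kernel's local_clock() or
sched_clock(); clock_ns() and busy_loop_timeout() are illustrative
names, not kernel APIs:

	#include <stdbool.h>
	#include <stdint.h>
	#include <time.h>

	/* Stand-in for local_clock()/sched_clock(). */
	static uint64_t clock_ns(void)
	{
		struct timespec ts;
		clock_gettime(CLOCK_MONOTONIC, &ts);
		return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
	}

	/* Checked on every iteration of the poll loop, i.e. per
	 * packet, so the clock read's cost shows up directly in the
	 * KRR/s numbers above. */
	static bool busy_loop_timeout(uint64_t start_ns, uint64_t budget_ns)
	{
		return clock_ns() - start_ns > budget_ns;
	}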

Maybe I'm doing something wrong?

Perf clearly affects the netperf results, but the delta is only a few
percent, so the numbers might still be good.
On the other hand, I'm seeing repeated warnings that the perf NMI
handler took too long to run, and I need to reboot before perf will
run again.

Attached are the perf outputs.

If you can think of any other interesting tests, or anything I'm doing
wrong, I'm open to suggestions.

Thanks,
Eliezer

View attachment "perf.local-clock.txt" of type "text/plain" (21545 bytes)

View attachment "perf.local-clock+patched.txt" of type "text/plain" (22800 bytes)

View attachment "perf.patched.txt" of type "text/plain" (23467 bytes)

View attachment "perf.sched-clock.txt" of type "text/plain" (21534 bytes)
