Message-ID: <529B7B27.4070801@linux.intel.com>
Date:	Sun, 01 Dec 2013 20:08:39 +0200
From:	Eliezer Tamir <eliezer.tamir@...ux.intel.com>
To:	Peter Zijlstra <peterz@...radead.org>
CC:	John Stultz <john.stultz@...aro.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Steven Rostedt <rostedt@...dmis.org>,
	Ingo Molnar <mingo@...e.hu>,
	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
	Andy Lutomirski <luto@...capital.net>,
	linux-kernel@...r.kernel.org, Tony Luck <tony.luck@...il.com>,
	hpa@...or.com
Subject: Re: [RFC][PATCH 0/7] sched: Optimize sched_clock bits

On 29/11/2013 19:36, Peter Zijlstra wrote:
> Hi all,
> 
> This series is supposed to optimize the kernel/sched/clock.c and x86
> sched_clock() implementations.
> 
> So far it's only been boot tested. So no clue if it really makes the
> thing faster, but it does remove the need to disable IRQs.
> 
> I'm hoping Eliezer will test this with his benchmark where he could measure a
> performance regression between using sched_clock() and local_clock().
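
For context: before this series, local_clock() had to disable
interrupts around the per-CPU clock read. A simplified sketch of that
era's kernel/sched/clock.c (not the exact code):

	u64 local_clock(void)
	{
		u64 clock;
		unsigned long flags;

		/* IRQs disabled around the per-CPU read; this is the
		 * hot-path overhead the series removes. */
		local_irq_save(flags);
		clock = sched_clock_cpu(smp_processor_id());
		local_irq_restore(flags);

		return clock;
	}

That local_irq_save()/local_irq_restore() pair on the hot path is the
cost the series is meant to eliminate.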

So I tested and retested, but I'm not sure I understand the results.

The numbers I previously reported were with turbo boost enabled.
Since turbo boost changes the CPU frequency depending on how hot it is,
it has a complicated interaction with busy polling.
In general you see better numbers, but it's harder to tell what's
going on.

With turbo boost disabled in the BIOS (to try to get more linear
behavior) I see:

3.13.0-rc2 (no patches):                        82.0 KRR/s
busy poll using local_clock():                  80.2 KRR/s
local_clock() + sched_clock patches:            80.6 KRR/s
sched patches (busy poll using sched_clock()):  80.6 KRR/s

(KRR/s: thousands of netperf request/response round trips per second.)

Note that there is a big variance between cores: on the SMT sibling of
the core the packets are steered to, the local_clock() test gives 81.8
KRR/s. (In the other tests that sibling is slightly slower than the
core that accepts the packets; I'm not sure I can explain this.)
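
The cost being measured is a clock read on the per-transaction
busy-poll path. A minimal userspace sketch of the pattern, with
clock_gettime() standing in for the kernel's local_clock() or
sched_clock(); clock_ns() and busy_loop_timeout() are illustrative
names, not kernel APIs:

	#include <stdbool.h>
	#include <stdint.h>
	#include <time.h>

	/* Stand-in for local_clock()/sched_clock(). */
	static uint64_t clock_ns(void)
	{
		struct timespec ts;
		clock_gettime(CLOCK_MONOTONIC, &ts);
		return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
	}

	/* Checked on every iteration of the poll loop, i.e. per
	 * packet, so the clock read's cost shows up directly in the
	 * KRR/s numbers above. */
	static bool busy_loop_timeout(uint64_t start_ns, uint64_t budget_ns)
	{
		return clock_ns() - start_ns > budget_ns;
	}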

Maybe I'm doing something wrong?

Perf clearly affects the netperf results, but the delta is only a few
percent, so the numbers might still be good.
On the other hand, I'm seeing repeated warnings that the perf NMI
handler took too long to run, and I need to reboot before perf will
run again.

Attached are the perf outputs.

If you can think of any other interesting tests, or anything I'm doing
wrong, I'm open to suggestions.

Thanks,
Eliezer

View attachment "perf.local-clock.txt" of type "text/plain" (21545 bytes)

View attachment "perf.local-clock+patched.txt" of type "text/plain" (22800 bytes)

View attachment "perf.patched.txt" of type "text/plain" (23467 bytes)

View attachment "perf.sched-clock.txt" of type "text/plain" (21534 bytes)
