Date:	Sat, 01 Mar 2014 14:43:13 +0100
From:	Stefani Seibold <stefani@...bold.net>
To:	Andy Lutomirski <luto@...capital.net>
Cc:	"H. Peter Anvin" <hpa@...or.com>, X86 ML <x86@...nel.org>,
	Greg KH <gregkh@...uxfoundation.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>,
	Andi Kleen <ak@...ux.intel.com>,
	Andrea Arcangeli <aarcange@...hat.com>,
	John Stultz <john.stultz@...aro.org>,
	Pavel Emelyanov <xemul@...allels.com>,
	Cyrill Gorcunov <gorcunov@...nvz.org>,
	andriy.shevchenko@...ux.intel.com, Martin.Runge@...de-schwarz.com,
	Andreas.Brief@...de-schwarz.com
Subject: Re: [PATCH v2 1/4] x86: Use the default ABI for the 32-bit vDSO


On Friday, 28.02.2014 at 12:19 -0800, Andy Lutomirski wrote:
> On Fri, Feb 28, 2014 at 7:06 AM, H. Peter Anvin <hpa@...or.com> wrote:
> > How many internal function calls are there? It seems we should try to avoid those as much as possible by suitable inlining.
> 
> There are no non-static calls at all, except for __x86.get_pc_thunk.
> I imagine that gcc is smart enough to improve the calling convention
> to non-externally-visible functions.
> 
> Amazingly (to me, anyway), the performance of the 32-bit version seems
> to be within 1 ns or so of the 64-bit version on SNB.  I suspect that
> Intel has optimized the crap out of these things.
> 

I ran some benchmarks on my Core2 Q9300 / 2.53GHz, comparing
"-mregparm=3 -freg-struct-return" against "-mregparm=0".

The system was booted with idle=poll, the scaling_governor was set to
performance, sched_rt_runtime_us was set to 1000000, and the
benchmark was executed under realtime priority 99.

For gettimeofday() and time() there is no real difference:
gettimeofday() has an average runtime of 49 ns and time() needs 11 ns.
The default ABI is a little bit faster, but only by sub-nanosecond
amounts.

For clock_gettime(CLOCK_MONOTONIC) the best-case results are 47 ns for
the non-default ABI and 46 ns for the default ABI. On average, the
default ABI was more than 1 ns faster.

So the default ABI is faster in all cases.

One interesting thing is that the HPET code is significantly faster
when the kernel parameter idle=poll is used: it is 953 vs 46 ns, a
factor of more than 20.

- Stefani

