Message-ID: <1393681393.982.10.camel@wall-e.seibold.net>
Date: Sat, 01 Mar 2014 14:43:13 +0100
From: Stefani Seibold <stefani@...bold.net>
To: Andy Lutomirski <luto@...capital.net>
Cc: "H. Peter Anvin" <hpa@...or.com>, X86 ML <x86@...nel.org>,
Greg KH <gregkh@...uxfoundation.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>,
Andi Kleen <ak@...ux.intel.com>,
Andrea Arcangeli <aarcange@...hat.com>,
John Stultz <john.stultz@...aro.org>,
Pavel Emelyanov <xemul@...allels.com>,
Cyrill Gorcunov <gorcunov@...nvz.org>,
andriy.shevchenko@...ux.intel.com, Martin.Runge@...de-schwarz.com,
Andreas.Brief@...de-schwarz.com
Subject: Re: [PATCH v2 1/4] x86: Use the default ABI for the 32-bit vDSO
On Friday, 28.02.2014, at 12:19 -0800, Andy Lutomirski wrote:
> On Fri, Feb 28, 2014 at 7:06 AM, H. Peter Anvin <hpa@...or.com> wrote:
> > How many internal function calls are there? It seems we should try to avoid those as much as possible by suitable inlining.
>
> There are no non-static calls at all, except for __x86.get_pc_thunk.
> I imagine that gcc is smart enough to improve the calling convention
> to non-externally-visible functions.
>
> Amazingly (to me, anyway), the performance of the 32-bit version seems
> to be within 1 ns or so of the 64-bit version on SNB. I suspect that
> Intel has optimized the crap out of these things.
>
I ran some benchmarks on my Core2 Q9300 (2.53 GHz), comparing
"-mregparm=3 -freg-struct-return" (the non-default ABI) against
"-mregparm=0" (the default ABI).
The system was booted with idle=poll, the scaling_governor was set to
performance, sched_rt_runtime_us was set to 1000000, and the
benchmark was executed under realtime priority 99.
For gettimeofday() and time() there is no real difference: gettimeofday()
has an average runtime of 49 ns and time() needs 11 ns. The default ABI
is slightly faster, but only by a fraction of a nanosecond.
For clock_gettime(CLOCK_MONOTONIC) the best-case results are 47 ns for
the non-default ABI and 46 ns for the default ABI. On average, the
default ABI was more than 1 ns faster.
So the default ABI is faster in all cases.
One interesting thing is that the HPET code is significantly faster when
using the kernel parameter idle=poll: 953 ns vs. 46 ns, which is a factor
of more than 20.
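For anyone who wants to reproduce this kind of measurement, here is a
minimal sketch of how an average per-call time could be taken. It is not
the benchmark that produced the numbers above. The helper names
(ts_to_ns, set_rt_priority, avg_clock_gettime_ns) are hypothetical, and
the loop overhead is assumed to be negligible compared to the call itself.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdint.h>
#include <time.h>

/* Convert a timespec to a nanosecond count. */
static int64_t ts_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/*
 * Try to run the calling process as SCHED_FIFO with priority 99.
 * Returns 0 on success and -1 on failure (this usually needs root
 * or CAP_SYS_NICE).
 */
static int set_rt_priority(void)
{
    struct sched_param sp = { .sched_priority = 99 };

    return sched_setscheduler(0, SCHED_FIFO, &sp);
}

/*
 * Average cost in nanoseconds of one clock_gettime(CLOCK_MONOTONIC)
 * call, measured over iters back-to-back calls.
 * Returns -1.0 if iters is not positive.
 */
static double avg_clock_gettime_ns(long iters)
{
    struct timespec start, end, tmp;
    long i;

    if (iters <= 0)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < iters; i++)
        clock_gettime(CLOCK_MONOTONIC, &tmp);
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (double)(ts_to_ns(&end) - ts_to_ns(&start)) / (double)iters;
}
```

A caller would invoke set_rt_priority() once, then print the result of
avg_clock_gettime_ns() for a large iteration count (for example ten
million). Building the program as a 32-bit binary (gcc -m32) exercises
the 32-bit vDSO on a 64-bit kernel.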
- Stefani