Message-ID: <20220705215948.GK5208@pengutronix.de>
Date: Tue, 5 Jul 2022 23:59:48 +0200
From: Sascha Hauer <sha@...gutronix.de>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: linux-kernel@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
Ingo Molnar <mingo@...hat.com>, kernel@...gutronix.de
Subject: Re: Performance impact of CONFIG_FUNCTION_TRACER
On Tue, Jul 05, 2022 at 10:39:01AM -0400, Steven Rostedt wrote:
> On Tue, 5 Jul 2022 12:54:16 +0200
> Sascha Hauer <sha@...gutronix.de> wrote:
>
> > Hi,
> >
> > I ran some lmbench subtests on an ARMv7 machine (NXP i.MX6q) with and
> > without CONFIG_FUNCTION_TRACER enabled (with CONFIG_DYNAMIC_FTRACE
> > enabled and no tracing active), see below. The Kconfig help text of this
> > option reads as:
> >
> > > If it's runtime disabled (the bootup default), then the overhead of
> > > the instructions is very small and not measurable even in
> > > micro-benchmarks.
>
> Well, this is true for x86 ;-)
That was my assumption ;)
>
> >
> > In my tests the overhead is small, but it surely exists and is
> > measurable at least on ARMv7 machines. Is this expected? Should the help
> > text be rephrased to be a little less optimistic?
>
> You mean "(but may vary by architecture)"
Something like that, yes.
>
> I believe that, due to using a link register for function calls, ARM
> requires adding two 4-byte nops to every function, whereas x86 only
> adds a single 5-byte nop.
>
> Although nops are very fast (they should not need to be processed in the
> CPU's pipeline, but I don't know if that's true for every arch), they
> also affect instruction cache misses, as adding 8 bytes to every
> function will cause more cache misses than when those bytes do not exist.
I just dug around a bit and saw that on ARM it's not even a real nop.
The compiler emits:

    push {lr}
    bl 8010e7c0 <__gnu_mcount_nc>

This is then turned into a nop by replacing the second instruction with

    add sp, sp, #4

which brings the stack pointer back to its original value. This must
indeed be processed by the CPU pipeline. I wonder whether it could be
optimized by replacing both instructions with nops, though I have no idea
whether that's feasible at all or whether it would even reduce the
overhead.
>
> Also, there are some configurations that use the old mcount, which adds
> some more code to handle the mcount case.
>
> So if this is just to have us change the kconfig, I'm happy to do that.
Yes, it would be good to make the Kconfig text clear. The overhead itself
is fine as long as people know that's the price to pay for getting the
function tracer.
Sascha
--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |