[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YhgQIGfrkOB7OYWw@hirez.programming.kicks-ass.net>
Date: Fri, 25 Feb 2022 00:09:20 +0100
From: Peter Zijlstra <peterz@...radead.org>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: x86@...nel.org, joao@...rdrivepizza.com, hjl.tools@...il.com,
jpoimboe@...hat.com, andrew.cooper3@...rix.com,
linux-kernel@...r.kernel.org, ndesaulniers@...gle.com,
keescook@...omium.org, samitolvanen@...gle.com,
mark.rutland@....com, alyssa.milburn@...el.com, mbenes@...e.cz,
mhiramat@...nel.org, alexei.starovoitov@...il.com
Subject: Re: [PATCH v2 14/39] x86/ibt,ftrace: Make function-graph play nice
On Thu, Feb 24, 2022 at 10:42:30AM -0500, Steven Rostedt wrote:
> On Thu, 24 Feb 2022 16:36:57 +0100
> Peter Zijlstra <peterz@...radead.org> wrote:
>
> > On Thu, Feb 24, 2022 at 03:51:52PM +0100, Peter Zijlstra wrote:
> > > @@ -316,10 +317,12 @@ SYM_FUNC_START(return_to_handler)
> > >
> > > call ftrace_return_to_handler
> > >
> > > - movq %rax, %rdi
> > > + movq %rax, 16(%rsp)
> > > movq 8(%rsp), %rdx
> > > movq (%rsp), %rax
> > > - addq $24, %rsp
> > > - JMP_NOSPEC rdi
> > > +
> > > + addq $16, %rsp
> > > + UNWIND_HINT_FUNC
> > > + RET
> > > SYM_FUNC_END(return_to_handler)
> > > #endif
> >
> > While talking about this with Mark, an alternative solution is something
> > like this, that would keep the RSB balanced and only mess up the current
> > return.
> >
> > No idea it if makes an appreciatable difference on current hardware,
> > therefore I went with the simpler option above.
> >
> > @@ -307,7 +315,7 @@ EXPORT_SYMBOL(__fentry__)
> >
> > #ifdef CONFIG_FUNCTION_GRAPH_TRACER
> > SYM_FUNC_START(return_to_handler)
> > - subq $24, %rsp
> > + subq $16, %rsp
> >
> > /* Save the return values */
> > movq %rax, (%rsp)
> > @@ -319,7 +327,13 @@ SYM_FUNC_START(return_to_handler)
> > movq %rax, %rdi
> > movq 8(%rsp), %rdx
> > movq (%rsp), %rax
> > - addq $24, %rsp
> > - JMP_NOSPEC rdi
> > +
> > + addq $16, %rsp
> > + ANNOTATE_INTRA_FUNCTION_CALL
> > + call .Ldo_rop
> > +.Ldo_rop:
>
> What's the overhead of an added call (for every function call that is being
> traced)?
Who knows :-) That's all u-arch magic and needs testing (on lots of
hardware). I suspect the dominating cost of all this code is the RSB
miss, not a few regular instructions.
So with this alternative we'll get one guaranteed RSB miss per
construction, but at least the RSB should be mostly good again
afterwards.
With the patch as proposed, the RSB is basically scrap due to
unbalanced.
The original patch changing it to an indirect call cited <2% performance
improvement IIRC, and that all was pre-speculation mess. This
alternative isn't much different from a retpoline (all that's missing is
the speculation trap).
Powered by blists - more mailing lists