Date:	Tue, 13 Oct 2009 17:10:14 -0400
From:	Steven Rostedt <rostedt@...dmis.org>
To:	Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
Cc:	linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...e.hu>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Frederic Weisbecker <fweisbec@...il.com>
Subject: Re: [PATCH 1/5] function-graph/x86: replace unbalanced ret with jmp

On Tue, 2009-10-13 at 16:47 -0400, Mathieu Desnoyers wrote:
> * Steven Rostedt (rostedt@...dmis.org) wrote:
> > From: Steven Rostedt <srostedt@...hat.com>
> > 
> > The function graph tracer replaces the return address with a hook to
> > trace the exit of the function call. This hook will finish by returning
> > to the real location the function should return to.
> > 
> > But the current implementation uses a ret to jump to the real return
> > location. This causes an imbalance between calls and rets. That is,
> > the original function does a call, its ret goes to the handler, and
> > then the handler does a ret without a matching call.
> > 
> > Although the function graph tracer itself already perturbs the branch
> > predictor by replacing the original ret, using a second ret and causing
> > an imbalance breaks the predictor even more.
> > 
> > This patch replaces the ret with a jmp to keep the calls and rets
> > balanced. I tested this on one box and it showed a 1.7% increase in
> > performance. Another box showed only a small 0.3% increase. But no box
> > that I tested showed a decrease in performance from this change.
> 
> This sounds exactly like what I proposed at LPC. I'm glad it shows
> actual improvements.

This is what we discussed at LPC. We were both under the assumption that
a jump would work. The question was how to make that jump without hosing
registers.

We lucked out that this is the back end of the return sequence, where we
can still clobber the callee-clobbered registers (just not the ones holding
the return value).
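
To make that concrete, here is a rough sketch (not the literal patch; 32-bit
only, and the argument setup for ftrace_return_to_handler is omitted) of the
kind of tail we can get away with at that point. The traced function's return
value lives in %eax (and %edx for 64-bit returns on i386), so we save and
restore those, and use %ecx as a scratch register for the jump target:

   return_to_handler:
        pushl %eax                      # save the traced function's return value
        pushl %edx                      # (eax:edx pair for 64-bit returns on i386)
        call ftrace_return_to_handler   # returns the original return address
        movl %eax, %ecx                 # stash it in a scratch register
        popl %edx                       # restore the return value
        popl %eax
        jmp *%ecx                       # jump back, no unmatched ret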

> 
> Just to make sure I understand, the old sequence was:
> 
> call fct
>   call ftrace_entry
>   ret to fct
> ret to ftrace_exit
> ret to caller
> 
> and you now have:
> 
> call fct
>   call ftrace_entry
>   ret to fct
> ret to ftrace_exit
> jmp to caller
> 
> Am I correct ?

Almost.
 
What it was:

call function
  function:
  call mcount
     mcount:
     call ftrace_entry
       ftrace_entry:
       replace the return address so the function returns to ftrace_exit
       ret
     ret

   [function code]

   ret to ftrace_exit
     ftrace_exit:
     get real return
     ret to original

So for the function we have 3 calls and 4 rets

Now we have:

call function
  function:
  call mcount
     mcount:
     call ftrace_entry
       ftrace_entry:
       replace the return address so the function returns to ftrace_exit
       ret
     ret

   [function code]

   ret to ftrace_exit
     ftrace_exit:
     get real return
     jmp to original

Now we have 3 calls and 3 rets

Note that the original call still does not pair with its ret (the function
returns to ftrace_exit instead of the caller), but we no longer do two rets.
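
For comparison, the old tail of ftrace_exit (return_to_handler) looked
roughly like this (again just a sketch, not the literal code that was
removed): the real return address was written into a placeholder slot on the
stack and a second, unmatched ret consumed it:

        pushl $0                        # placeholder for the real return address
        pushl %eax                      # save the traced function's return value
        pushl %edx
        call ftrace_return_to_handler   # %eax = original return address
        movl %eax, 8(%esp)              # fill in the placeholder slot
        popl %edx                       # restore the return value
        popl %eax
        ret                             # second ret with no matching call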

-- Steve



