Message-Id: <1255468214.7113.2396.camel@gandalf.stny.rr.com>
Date:	Tue, 13 Oct 2009 17:10:14 -0400
From:	Steven Rostedt <rostedt@...dmis.org>
To:	Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
Cc:	linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...e.hu>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Frederic Weisbecker <fweisbec@...il.com>
Subject: Re: [PATCH 1/5] [PATCH 1/5] function-graph/x86: replace unbalanced
 ret with jmp

On Tue, 2009-10-13 at 16:47 -0400, Mathieu Desnoyers wrote:
> * Steven Rostedt (rostedt@...dmis.org) wrote:
> > From: Steven Rostedt <srostedt@...hat.com>
> > 
> > The function graph tracer replaces the return address with a hook to
> > trace the exit of the function call. This hook will finish by returning
> > to the real location the function should return to.
> > 
> > But the current implementation uses a ret to jump to the real return
> > location. This causes an imbalance between calls and rets: the
> > original function does a call, its ret goes to the handler, and then
> > the handler does a ret without a matching call.
> > 
> > The function graph tracer itself already disturbs the branch
> > predictor by redirecting the original ret, but adding a second,
> > unmatched ret makes the imbalance worse and hurts the predictor
> > even more.
> > 
> > This patch replaces the ret with a jmp to keep the calls and ret
> > balanced. I tested this on one box and it showed a 1.7% increase in
> > performance. Another box only showed a small 0.3% increase. But no
> > box that I tested this on showed a decrease in performance by making this
> > change.
> 
> This sounds exactly like what I proposed at LPC. I'm glad it shows
> actual improvements.

This is what we discussed at LPC. We were both under the assumption that
a jump would work; the question was how to make that jump without hosing
registers.

We lucked out in that this is the back end of the return sequence, where
we can still clobber the caller-saved registers (just not the ones
holding the return value).
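
Concretely, the tail of return_to_handler on x86_64 ends up looking
something like the sketch below. This is from memory, not the literal
patch; the stack offsets and the choice of %rdi as scratch are my
assumptions, %rdi simply being a convenient caller-saved register under
the SysV ABI that does not carry the return value:

```asm
return_to_handler:
	subq  $24, %rsp

	/* The function's return value lives in %rax/%rdx; save it
	   across the C call that looks up the real return address. */
	movq %rax, (%rsp)
	movq %rdx, 8(%rsp)

	call ftrace_return_to_handler

	/* %rax now holds the original return address.  Stash it in a
	   caller-saved scratch register, restore the return value,
	   and finish with jmp instead of ret so the call/ret pairing
	   stays balanced. */
	movq %rax, %rdi
	movq 8(%rsp), %rdx
	movq (%rsp), %rax
	addq $24, %rsp
	jmp *%rdi
```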

> 
> Just to make sure I understand, the old sequence was:
> 
> call fct
>   call ftrace_entry
>   ret to fct
> ret to ftrace_exit
> ret to caller
> 
> and you now have:
> 
> call fct
>   call ftrace_entry
>   ret to fct
> ret to ftrace_exit
> jmp to caller
> 
> Am I correct ?

Almost.
 
What it was:

call function
  function:
  call mcount
     mcount:
     call ftrace_entry
       ftrace_entry:
       replace the return address of the caller
       ret
     ret

   [function code]

   ret to ftrace_exit
     ftrace_exit:
     get real return
     ret to original

So for the function we have 3 calls and 4 rets

Now we have:

call function
  function:
  call mcount
     mcount:
     call ftrace_entry
       ftrace_entry:
       replace the return address of the caller
       ret
     ret

   [function code]

   ret to ftrace_exit
     ftrace_exit:
     get real return
     jmp to original

Now we have 3 calls and 3 rets

Note the function's ret still does not go back to where its matching
call predicts, but we no longer execute a second, unmatched ret.

-- Steve



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/