[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.1.10.0808151535300.24788@gandalf.stny.rr.com>
Date: Fri, 15 Aug 2008 17:34:37 -0400 (EDT)
From: Steven Rostedt <rostedt@...dmis.org>
To: Andi Kleen <andi@...stfloor.org>
cc: Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Jeremy Fitzhardinge <jeremy@...p.org>,
LKML <linux-kernel@...r.kernel.org>, Ingo Molnar <mingo@...e.hu>,
Thomas Gleixner <tglx@...utronix.de>,
Peter Zijlstra <peterz@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
David Miller <davem@...emloft.net>,
Roland McGrath <roland@...hat.com>,
Ulrich Drepper <drepper@...hat.com>,
Rusty Russell <rusty@...tcorp.com.au>,
Gregory Haskins <ghaskins@...ell.com>,
Arnaldo Carvalho de Melo <acme@...hat.com>,
"Luis Claudio R. Goncalves" <lclaudio@...g.org>,
Clark Williams <williams@...hat.com>
Subject: Re: Efficient x86 and x86_64 NOP microbenchmarks
[ Finally got my goodmis email back ]
On Wed, 13 Aug 2008, Andi Kleen wrote:
> > Sorry to ask, I feel I must be missing something, but I'm trying to
> > figure out where you propose to add the "call mcount" ? In the caller or
> > in the callee ?
>
> callee like gcc. caller would be likely more bloated because
> there are more calls than functions. Also if it was at the
> callee more code would be needed because the function currently
> executed couldn't be gotten from stack directly.
>
> > Or is it a different scheme I don't see ? I am trying to figure out how
> > you happen to do all that without dynamic code modification and manage
> > not to hurt performance.
>
> The dynamic code modification is only needed because there is no
> global table of the mcount call sites. So instead it discovers
> them at runtime, but that requires runtime save patching
The new code does not discover the places at runtime. The old code did
that. The "to kill a daemon" removed the runtime discovery and replaced it
with discovery at compile time.
>
> With a custom call scheme one could just build up a table of
> call sites at link time using an ELF section and then when
> tracing is enabled/disabled always patch them all in one go
> in a stop_machine(). Then you wouldn't need parallel execution safe
> patching anymore and it doesn't matter what the nops look like.
The current patch set, pretty much does exactly this. Yes, I patch
at boot up all in one go, before the other CPUS are even active.
This takes all of 6 milliseconds to do. Not much extra time for bootup.
>
> The other advantage is that it would allow getting rid of
> the frame pointer.
This is the only advantage that you have.
-- Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists