Message-ID: <48125635.3060303@zytor.com>
Date: Fri, 25 Apr 2008 15:07:49 -0700
From: "H. Peter Anvin" <hpa@...or.com>
To: Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
CC: Linus Torvalds <torvalds@...ux-foundation.org>,
Andi Kleen <andi@...stfloor.org>, Ingo Molnar <mingo@...e.hu>,
Jiri Slaby <jirislaby@...il.com>,
David Miller <davem@...emloft.net>, zdenek.kabelac@...il.com,
rjw@...k.pl, paulmck@...ux.vnet.ibm.com, akpm@...ux-foundation.org,
linux-ext4@...r.kernel.org, herbert@...dor.apana.org.au,
penberg@...helsinki.fi, clameter@....com,
linux-kernel@...r.kernel.org, pageexec@...email.hu,
Jeremy Fitzhardinge <jeremy@...p.org>
Subject: Re: [PATCH 1/1] x86: fix text_poke
Mathieu Desnoyers wrote:
>
> Yes, this is the case. Using breakpoints for markers quickly becomes
> noticeable for things such as scheduler instrumentation, page fault
> handler instrumentation, etc. And yes, I have developed a kernel tracer,
> LTTng, which takes care of writing the data to trace buffers
> efficiently. The last time I took performance measurements, it was
> performing locking and writing to the memory buffer in about 270ns on a
> 3GHz Pentium 4. It might be a tiny bit slower now that it parses the
> markers format strings dynamically, but nothing very significant.
>
> But there is another point that markers do which the breakpoint won't
> give you: they extract local variables from functions and they identify
> them with field names, which separates the instrumentation from the
> actual kernel implementation details. In order to do that, I rely on gcc
> building a stack frame for a function call, which I don't want to build
> unnecessarily when the marker is disabled. This is why I use a jump to
> skip passing the arguments on the stack and the function call.
>
Well, debuggers do it, and that's ultimately why we have debugging
annotation formats like DWARF2 - to be able to take an arbitrary state
and decode local variables from the combined register-memory state.
This is often done by an interpreter, but that's not necessary; a
compiler can use the debugging information and build appropriate capture
code, which would be able to execute very quickly. Not only is this
capable of extracting arbitrary information, but it also guarantees that
the extraction code is out of line.

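As a rough illustration of what such out-of-line capture code might look
like, here is a minimal sketch. Everything in it is hypothetical: the
var_loc and probe_state structures and capture_var() are made-up names,
and the two-case location record is a drastic simplification of real
DWARF location expressions. It only shows the idea of reading a variable
from a saved register-memory snapshot using a location taken from
debugging information.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical, simplified stand-in for a debug-info location: the
 * variable lives either in a saved register or at a byte offset from a
 * frame base. Real DWARF location expressions are much richer.
 */
enum loc_kind { LOC_REGISTER, LOC_FRAME_OFFSET };

struct var_loc {
	enum loc_kind kind;
	int regno;      /* index into regs[] when kind == LOC_REGISTER */
	long offset;    /* offset from frame_base when kind == LOC_FRAME_OFFSET */
};

/* Hypothetical snapshot of machine state taken when the probe fires. */
struct probe_state {
	uint64_t regs[16];
	const unsigned char *frame_base;
};

/* Out-of-line capture: fetch one 64-bit variable from the snapshot. */
uint64_t capture_var(const struct probe_state *st, const struct var_loc *loc)
{
	uint64_t value = 0;

	if (loc->kind == LOC_REGISTER)
		return st->regs[loc->regno];

	/* memcpy avoids alignment assumptions about the frame contents. */
	memcpy(&value, st->frame_base + loc->offset, sizeof(value));
	return value;
}
```

None of this code sits in the instrumented function itself; the probe
site only has to transfer control to something that can take the
snapshot.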
The act of building a stack frame not only perturbs the generated code
(gcc has to guarantee liveness, which you can see as a pro or a con),
but it also puts a fair amount of code in the icache path of the function.

Now, if a breakpoint is too expensive, one can do exactly the same trick
with a naked call instruction, with a higher icache impact in the unused
case (five bytes instead of one or two). However, the key to low impact
is to use the debugging information to recover state.
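
For reference, the five bytes are the x86 near call with a 32-bit
relative displacement: opcode 0xE8 followed by a little-endian rel32
measured from the end of the instruction. The sketch below only
illustrates that encoding; encode_call_rel32() is a made-up helper, it
uses 32-bit addresses for simplicity, and it says nothing about how the
bytes would be patched into live kernel text.

```c
#include <stdint.h>

#define CALL_REL32_OPCODE 0xE8u
#define CALL_REL32_LEN    5

/*
 * Hypothetical helper: fill out[] with the bytes of a near "call target"
 * instruction as it would appear at address "site".
 */
void encode_call_rel32(uint8_t out[CALL_REL32_LEN], uint32_t site,
		       uint32_t target)
{
	/* The displacement is relative to the next instruction's address. */
	uint32_t rel = target - (site + CALL_REL32_LEN);

	out[0] = CALL_REL32_OPCODE;
	out[1] = (uint8_t)(rel & 0xff);          /* little-endian rel32 */
	out[2] = (uint8_t)((rel >> 8) & 0xff);
	out[3] = (uint8_t)((rel >> 16) & 0xff);
	out[4] = (uint8_t)((rel >> 24) & 0xff);
}
```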

(Liveness at the probe point is still possible to enforce with this
technique: give gcc a "g" read constraint as part of the probe
instruction. That makes gcc ensure the information is *somewhere*. The
debugging information will tell you where to pick it up from.
Obviously, any time liveness is enforced you suffer a potential cost.)

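A minimal sketch of the "g" constraint idea, under some assumptions:
PROBE_KEEP_LIVE() and sum_with_probe() are hypothetical names, not an
existing kernel interface, and the asm template is left empty so the
example builds on any gcc target. In a real probe the template would
hold the breakpoint or call instruction.

```c
/*
 * Hypothetical probe macro. The "g" input constraint tells gcc that the
 * value must be available at this point in a general register, in
 * memory, or as an immediate. No argument passing or call frame is
 * generated; the operand is not even referenced by the template.
 */
#define PROBE_KEEP_LIVE(val) __asm__ __volatile__("" : : "g"(val))

/* Sums 0..n-1 and keeps the running total live at each probe point. */
long sum_with_probe(long n)
{
	long total = 0;
	long i;

	for (i = 0; i < n; i++) {
		total += i;
		PROBE_KEEP_LIVE(total);  /* gcc must have "total" somewhere here */
	}
	return total;
}
```

Where gcc actually chose to keep the value is then a question for the
debugging information, not for the probe site.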
-hpa