Message-Id: <1264726768.4933.50.camel@localhost.localdomain>
Date: Thu, 28 Jan 2010 16:59:28 -0800
From: Jim Keniston <jkenisto@...ibm.com>
To: Ingo Molnar <mingo@...e.hu>
Cc: Peter Zijlstra <peterz@...radead.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Tom Tromey <tromey@...hat.com>,
Kyle Moffett <kyle@...fetthome.net>,
"Frank Ch. Eigler" <fche@...hat.com>,
Oleg Nesterov <oleg@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Stephen Rothwell <sfr@...b.auug.org.au>,
Frédéric Weisbecker <fweisbec@...il.com>,
LKML <linux-kernel@...r.kernel.org>,
Steven Rostedt <rostedt@...dmis.org>,
Arnaldo Carvalho de Melo <acme@...hat.com>,
linux-next@...r.kernel.org, "H. Peter Anvin" <hpa@...or.com>,
utrace-devel@...hat.com, Thomas Gleixner <tglx@...utronix.de>
Subject: Re: linux-next: add utrace tree
On Thu, 2010-01-28 at 09:55 +0100, Ingo Molnar wrote:
> * Jim Keniston <jkenisto@...ibm.com> wrote:
>
> > On Wed, 2010-01-27 at 09:54 +0100, Ingo Molnar wrote:
> > ...
> >
> > Yes, emulating "push %ebp" would buy us a lot of coverage for a lot of apps
> > on x86 (but see below**). [...]
>
...
>
> > [...] Even there, though, we'd have to address the page fault we'd
> > occasionally get when extending the stack vma.
>
> Nope, in the simplest model not even page fault emulation is needed,
> get_user()/put_user() would resolve it automatically. You either get the
> value with the page fault resolved, or you get -EFAULT.
get_user()/put_user() have to be done in a context where you can sleep,
right? Uprobes currently operates in such contexts, but there's some
talk of moving it all to a DIE_INT3 notifier context, where it can't
sleep.
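
For illustration only (this is not code from the uprobes patch set): a
minimal user-space sketch of what emulating "push %ebp" with a
put_user()-style store involves. The names mock_regs, mock_stack,
mock_put_user32() and emulate_push_ebp() are hypothetical, and the mock
store simply bounds-checks an array. A real put_user() may sleep while
the fault is resolved, which is exactly the context constraint raised
above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the probed task's 32-bit register state. */
struct mock_regs {
    uint32_t esp;
    uint32_t ebp;
    uint32_t eip;
};

/* Hypothetical stand-in for the mapped stack: [base, base + size). */
struct mock_stack {
    uint32_t base;
    uint32_t size;
    uint8_t *bytes;
};

/* Plays the role of put_user(): store 4 bytes or report -EFAULT. */
static int mock_put_user32(struct mock_stack *st, uint32_t addr, uint32_t val)
{
    if (st->size < sizeof(val) || addr < st->base ||
        addr - st->base > st->size - sizeof(val))
        return -EFAULT;
    memcpy(st->bytes + (addr - st->base), &val, sizeof(val));
    return 0;
}

/* Emulate the one-byte instruction "push %ebp" (opcode 0x55). */
static int emulate_push_ebp(struct mock_regs *regs, struct mock_stack *st)
{
    uint32_t new_esp;
    int ret;

    assert(regs && st);
    new_esp = regs->esp - 4;
    ret = mock_put_user32(st, new_esp, regs->ebp);
    if (ret)
        return ret;     /* registers untouched; caller can fall back */
    regs->esp = new_esp;
    regs->eip += 1;     /* step past the emulated instruction */
    return 0;
}
```

The store is attempted before any register is modified, so a failed
store leaves the task state as it was and the caller is free to fall
back to another mechanism, such as single-stepping out of line.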
...
>
> > > We could get quite good coverage (and very fast
> > > emulation) for the common case in not too much code - and much of that code
> > > we already have available. No re-trapping,
> >
> > As previously discussed, boosting would also get rid of the single-step trap
> > for most instructions.
>
> Boosting is not in the uprobes patch-set you submitted. Even with it present
> it won't get rid of the initial INT3. So basically _best-case_ (with boosting)
> XOL-uprobes could roughly break even with a pure emulator approach ...
>
> That's a big and fundamental difference.
To be fair, wrt uprobes, emulation and boosting are both in the same
state: pretty well understood, but not yet implemented.
...
> > >
> > > - It's as transparent as it gets - no user-space trampoline or other visible
> > > state that modifies behavior or can be stomped upon by user-space bugs.
> >
> > The XOL vma isn't writable from user space, so I can't think of how it could
> > be clobbered merely by a stray memory reference. [...]
>
> Well there must be some purpose to the instrumentation, there must be some way
> to save data, right? If yes and it's in user-space, that data is clobberable.
One or two others have advocated an approach (which eliminates the
breakpoint trap) where trace data is stored in the uprobe vma, but I
haven't. (In such a case, "XOL vma" would be a misnomer.) I agree that
in such a scenario, the uprobe vma would of necessity be writable by the
app.
>
> If it's in kernel-space then we have to enter the kernel anyway (with similar
> cost patterns to an INT3 entry) - so we just delayed the kernel entry.
This seems to presume that you have to extract trace data from the
kernel every time a probe is hit. In actual practice, you're often just
checking for unusual arg values, incrementing a counter, or some such.
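
For illustration only: a user-space sketch of the kind of handler
described above, one that counts hits and flags unusual argument values
without handing any data to a consumer on each hit. The names
probe_stats and probe_handler(), and the threshold test itself, are
hypothetical and are not part of uprobes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-probe state kept on the handler's side. */
struct probe_stats {
    uint64_t hits;      /* times the probe was hit */
    uint64_t unusual;   /* hits whose argument exceeded the threshold */
    long threshold;     /* what counts as "unusual" (an assumption) */
};

/*
 * Called on each probe hit with the argument of interest.
 * Returns 1 only when the value is unusual enough to report.
 */
static int probe_handler(struct probe_stats *s, long arg)
{
    assert(s);
    s->hits++;
    if (arg > s->threshold) {
        s->unusual++;
        return 1;
    }
    return 0;
}
```

In this sketch the common path only updates two counters; data needs to
be extracted only on the rare hit where the handler returns 1.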
>
...
> > Even if we add emulation, it seems sensible to keep the XOL approach as a
> > backup to handle instructions that aren't yet emulated (and architectures
> > that don't yet have emulators). That way, if you don't probe any unemulated
> > instructions, the XOL vma is never created.
>
> To turn the argument around: an in-kernel emulator is an all-around facility
> to make sure we probe safely and securely, _and_ it is also more portable
> because it's simpler (because more gradual) to implement on a new architecture
> as you don't actually have to copy around instructions (and make sure they work
> in that new place), but have to emulate a limited subset of the instruction
> space, on purely local state.
I understand the desire to start small and simple and grow gradually
from there. We thought we were doing that. Single-stepping out of line
has been in use for close to a decade, maybe more; and boosting (in
kprobes) has been around for a few years as well. To the *probes folks,
it feels pretty solid.
>
...
>
> With an emulator (assuming the emulator is correct) we can execute the precise
> semantics of that instruction in that place - without any side-effects from
> trampolining/replacement.
And of course, our view has been that the best way to achieve the effect
of the instruction, including all desired side-effects, is to execute
the instruction on the CPU.
...
> >
> > **In practice, we've had to probe all sorts of instructions, including FP
> > instructions -- especially where you want to exploit the debug info to get
> > the names, types, and locations of variables and args. For some compilers
> > and architectures, the debug info isn't reliable until the end of the
> > function prologue, at which point you could find any old instruction. Ditto
> > if you want to probe statements within a function.
>
> For those cases, frankly, the right approach is to fix the debug info (or
> introduce a new one) and forget the old crap.
>
> You treat debuginfo as some god-given property, while it's one of the suckiest
> aspects of all of Linux. But we've had that discussion months (and years) ago.
> It has improved in gcc 4.5 so there's some hope.
Yes, there seems to be considerable movement toward better debug info --
which could make statement probing (and not just function-boundary
probing) more and more feasible.
>
...
> Ingo
Thanks.
Jim