Message-ID: <20080425131717.GA8034@Krystal>
Date: Fri, 25 Apr 2008 09:17:17 -0400
From: Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
To: "Frank Ch. Eigler" <fche@...hat.com>
Cc: Alexey Dobriyan <adobriyan@...il.com>, akpm@...ux-foundation.org,
Ingo Molnar <mingo@...e.hu>, linux-kernel@...r.kernel.org
Subject: [RFC] system-wide in-kernel syscall tracing
* Mathieu Desnoyers (mathieu.desnoyers@...ymtl.ca) wrote:
> > >
> > > Those which are close enough to system call boundary are essentially
> > > strace(1).
> >
> > Those may not sound worthwhile to put a marker for, BUT, you're
> > ignoring the huge differences of impact and scope. A system-wide
> > marker-based trace (filtered a la systemtap if desired) can be done
> > with a tiny fraction of system load and none of the disruption caused
> > by an strace of all the processes.
> >
>
> I agree with both ;) Actually we need a low-overhead hook in
> syscall_trace(), so we can perform efficient system-wide tracing of
> system calls. I'll dig into this as soon as I find time.
>
> Basic ideas :
>
> - I already have the TIF_KERNEL_TRACE thread flag added to all
> architectures in another patchset.
> - We add a function called on TIF_KERNEL_TRACE, from do_syscall_trace(),
> which is architecture-specific. It's basically a big switch() over all
> system calls. Syscalls that take similar types could be grouped
> together, but I don't think it would be useful at all. It might be
> better just to add a trace_mark for each so we extract the syscall
> fields in the marker string.
> - We perform the reads that can page-fault (strings and structures) on
> the spot, because we prefer not to do this in atomic context.
> - We put a marker, e.g., for x86_32, pseudo-code like :
>
> syscall_trace_enter()
> {
> ...
> if (test_thread_flag(TIF_KERNEL_TRACE))
> do_marker_syscall_trace();
> ...
> }
>
> do_marker_syscall_trace()
> {
> char *tmpbuf;
>
> switch(regs->orig_ax) {
>
> case SYS_OPEN:
> tmpbuf = vmalloc(4096); /* what size is needed ? */
> strncpy_from_user(tmpbuf, (const char __user *)regs->bx, 4096);
> trace_mark(sys_open, "filename %p flags %d mode %d",
Actually, I meant :
trace_mark(sys_open, "filename %s flags %d mode %d",
and it would be even better to pass the __user pointer directly to the
probe to eliminate the copy. I think this could be done by making sure
the memory is faulted in and locked when we call trace_mark. It might
require a way to specify a special format string type, though, so that
an automated tracer would use strncpy_from_user (in atomic context) and
the like instead of trying to dereference the userspace pointer directly.
Mathieu
> tmpbuf, regs->cx, regs->dx);
> vfree(tmpbuf);
> break;
> }
> }
>
> Modulo some optimization, what do you think of this ? If someone is
> willing to implement this, I can provide the patchset for
> TIF_KERNEL_TRACE.
>
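
For illustration, the dispatch described above can be sketched in plain
user-space C. This is only a sketch under stated assumptions, not kernel
code: struct sketch_regs, sketch_copy_string() and sketch_format_marker()
are hypothetical names introduced here. sketch_copy_string() stands in for
strncpy_from_user(), and snprintf() stands in for trace_mark(). The only
value taken from the real ABI is 5, the open() syscall number on x86_32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_NR_OPEN 5      /* __NR_open on x86_32 */
#define SKETCH_STR_MAX 4096   /* same bound as the vmalloc(4096) above */

/* Hypothetical stand-in for the pt_regs fields used in the pseudo-code. */
struct sketch_regs {
	uintptr_t orig_ax;        /* syscall number */
	uintptr_t bx, cx, dx;     /* first three syscall arguments */
};

/*
 * Stand-in for strncpy_from_user(): copy at most dstlen - 1 bytes and
 * always NUL-terminate. Returns the string length, or -1 if the source
 * did not fit and was truncated.
 */
static long sketch_copy_string(char *dst, const char *src, size_t dstlen)
{
	size_t i;

	assert(dst != NULL && src != NULL && dstlen > 0);
	for (i = 0; i + 1 < dstlen && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';
	return src[i] == '\0' ? (long)i : -1;
}

/*
 * Format one marker line for the syscall described by regs.
 * Returns the snprintf() result for a known syscall, 0 when no marker
 * exists for this syscall number, and -1 if the filename was truncated.
 */
static int sketch_format_marker(const struct sketch_regs *regs,
				char *out, size_t outlen)
{
	static char tmpbuf[SKETCH_STR_MAX];

	if (outlen > 0)
		out[0] = '\0';

	switch (regs->orig_ax) {
	case SKETCH_NR_OPEN:
		/* Copy the string up front, before the marker is emitted. */
		if (sketch_copy_string(tmpbuf, (const char *)regs->bx,
				       sizeof(tmpbuf)) < 0)
			return -1;
		return snprintf(out, outlen,
				"sys_open: filename %s flags %d mode %d",
				tmpbuf, (int)regs->cx, (int)regs->dx);
	default:
		return 0;
	}
}
```

Calling sketch_format_marker() with orig_ax set to SKETCH_NR_OPEN and bx
pointing at a path fills the output buffer with one formatted line; any
other syscall number leaves it empty. Copying the string before formatting
mirrors the point above about doing the faulting reads on the spot rather
than inside the probe.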
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68