Date:	Fri, 30 Mar 2012 13:57:17 +0200
From:	Frederic Weisbecker <fweisbec@...il.com>
To:	"H. Peter Anvin" <hpa@...or.com>
Cc:	Vaibhav Nagarnaik <vnagarnaik@...gle.com>,
	Ingo Molnar <mingo@...nel.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>,
	David Sharp <dhsharp@...gle.com>,
	Justin Teravest <teravest@...gle.com>,
	Laurent Chavey <chavey@...gle.com>,
	Michael Davidson <md@...gle.com>, x86@...nel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 4/6] trace: trace syscall in its handler not from ptrace
 handler

On Thu, Mar 29, 2012 at 01:06:10PM -0700, H. Peter Anvin wrote:
> On 03/29/2012 12:43 PM, Vaibhav Nagarnaik wrote:
> > 
> > However, we agree that the syscall tracing as implemented currently is
> > a bit unwieldy. We would want to be a part of the re-designing effort
> > if there is a momentum in the community towards that goal. We would be
> > happy to contribute towards this effort.
> > 
> 
> I had a long discussion with Frederic over IRC earlier today.  We came
> up with the following strawman:
> 
> 1. A system call thunk (which could be enabled/disabled by patching the
> syscall table.)  This provides an entry and exit hook, and also sets a
> per-thread flag to capture userspace traffic.
> 
> 2. Instrumenting get_user/put_user/copy_from_user/copy_to_user to
> capture traffic to userspace.  This captures the *full* set of system
> call arguments, including things addressed via pointers.  Furthermore,
> it captures the exact versions fed to or returned from the kernel, and
> deals with data-dependent collection like ioctl().
> 
> This has to be done with extreme care to avoid introducing overhead in
> the no-tracing case, however, as these functions are extraordinarily
> performance sensitive.  This probably will require careful patching in
> the first enable/last disable case.
> 
> 3. There will need to be userspace tools written to decode the resulting
> trace buffer.  This is pretty much needed anyway, but once you throw in
> complex data structures it becomes even more so.  A trace will basically
> consist of:
> 
> SYSCALL_ENTRY <syscall number> <arg1..6>
> COPY_FROM_USER <address> <data>
>   ...
> COPY_TO_USER <address> <data>
>   ...
> SYSCALL_EXIT <return value>
> 
> Outputting this in human-readable format requires some reasonably
> sophisticated logic, but the *HUGE* advantage is that not only is all
> the information there, it is *correct by construction*.
> 
> 	-hpa


Note that the relevant tracepoints are already in place in the
"raw_syscalls" events subsystem. It is generic, with only two tracepoints,
sys_enter and sys_exit, which blindly dump the syscall number, arguments
and return value:

$ cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/format 
name: sys_enter
ID: 53
format:
        field:unsigned short common_type;       offset:0;       size:2; signed:0;
        field:unsigned char common_flags;       offset:2;       size:1; signed:0;
        field:unsigned char common_preempt_count;       offset:3;       size:1; signed:0;
        field:int common_pid;   offset:4;       size:4; signed:1;
        field:int common_padding;       offset:8;       size:4; signed:1;

        field:long id;  offset:16;      size:8; signed:1;
        field:unsigned long args[6];    offset:24;      size:48;        signed:0;

print fmt: "NR %ld (%lx, %lx, %lx, %lx, %lx, %lx)", REC->id, REC->args[0], REC->args[1], REC->args[2], REC->args[3], REC->args[4], REC->args[5]

$ cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_exit/format 
name: sys_exit
ID: 52
format:
        field:unsigned short common_type;       offset:0;       size:2; signed:0;
        field:unsigned char common_flags;       offset:2;       size:1; signed:0;
        field:unsigned char common_preempt_count;       offset:3;       size:1; signed:0;
        field:int common_pid;   offset:4;       size:4; signed:1;
        field:int common_padding;       offset:8;       size:4; signed:1;

        field:long id;  offset:16;      size:8; signed:1;
        field:long ret; offset:24;      size:8; signed:1;

print fmt: "NR %ld = %ld", REC->id, REC->ret
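
To illustrate what a userspace consumer of these two events might look
like, here is a minimal sketch that unpacks the record layouts listed in
the format files above. It assumes a little-endian x86-64 machine and that
the raw event payload has already been extracted from the ring buffer (the
ring buffer's own per-event header is not handled here). The function
names are hypothetical, and the event IDs (53, 52) are whatever this
particular boot assigned, so nothing below depends on them.

```python
import struct

# Layouts transcribed from the format files above.
# "4x" skips the unnamed 4-byte gap between common_padding (ends at
# offset 12) and id (offset 16).
SYS_ENTER = struct.Struct("<HBBii4xq6Q")  # 72 bytes: common fields, id, args[6]
SYS_EXIT = struct.Struct("<HBBii4xqq")    # 32 bytes: common fields, id, ret


def decode_sys_enter(record: bytes) -> dict:
    """Decode one raw_syscalls:sys_enter payload (hypothetical helper)."""
    ctype, _flags, _preempt, pid, _pad, nr, *args = SYS_ENTER.unpack_from(record)
    return {"type": ctype, "pid": pid, "id": nr, "args": args}


def decode_sys_exit(record: bytes) -> dict:
    """Decode one raw_syscalls:sys_exit payload (hypothetical helper)."""
    ctype, _flags, _preempt, pid, _pad, nr, ret = SYS_EXIT.unpack_from(record)
    return {"type": ctype, "pid": pid, "id": nr, "ret": ret}


def format_sys_enter(ev: dict) -> str:
    # Mirrors the kernel's print fmt: "NR %ld (%lx, %lx, %lx, %lx, %lx, %lx)"
    return "NR %d (%s)" % (ev["id"], ", ".join("%x" % a for a in ev["args"]))


def format_sys_exit(ev: dict) -> str:
    # Mirrors the kernel's print fmt: "NR %ld = %ld"
    return "NR %d = %d" % (ev["id"], ev["ret"])
```

Note that this only recovers the raw numbers: pointer arguments remain
opaque addresses, which is exactly the gap the copy_*_user() capture
discussed in the quoted proposal is meant to fill.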

We have yet to do the syscall table patching and the copy_*_user() tracepoints.
Other than those pieces, the bulk of the remaining work is in userspace.
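
As a rough idea of the userspace side, the sketch below groups a stream
of events shaped like the strawman in the quoted proposal (SYSCALL_ENTRY,
COPY_FROM_USER, COPY_TO_USER, SYSCALL_EXIT) into one record per system
call. The textual serialization is purely an assumption made for
illustration: one event per line, hexadecimal arguments and addresses,
copied data as a hex string, and a decimal return value. No such format
exists yet, and the function name is hypothetical.

```python
def group_syscalls(lines):
    """Group strawman trace events into one dict per completed syscall."""
    calls = []
    current = None
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        kind = parts[0]
        if kind == "SYSCALL_ENTRY":
            # SYSCALL_ENTRY <syscall number> <arg1..6>
            current = {
                "nr": int(parts[1]),
                "args": [int(a, 16) for a in parts[2:]],
                "copies": [],
                "ret": None,
            }
        elif kind in ("COPY_FROM_USER", "COPY_TO_USER") and current is not None:
            # COPY_* <address> <data>: attach the captured bytes to the
            # syscall currently in flight.
            current["copies"].append(
                (kind, int(parts[1], 16), bytes.fromhex(parts[2]))
            )
        elif kind == "SYSCALL_EXIT" and current is not None:
            # SYSCALL_EXIT <return value> closes the current syscall.
            current["ret"] = int(parts[1])
            calls.append(current)
            current = None
    return calls


# Example: a write(1, buf, 6) whose buffer contents were captured.
trace = [
    "SYSCALL_ENTRY 1 1 7fff0000 6 0 0 0",
    "COPY_FROM_USER 7fff0000 68656c6c6f0a",
    "SYSCALL_EXIT 6",
]
calls = group_syscalls(trace)
```

A real decoder would additionally need per-syscall knowledge to map each
captured copy back to the argument it belongs to; the sketch only shows
the grouping step, for a single thread's event stream.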