linux-kernel - [RFC] LTTng Userspace Tracing, vDSO

Message-ID: <20080122145919.GA17620@Krystal>
Date:	Tue, 22 Jan 2008 09:59:19 -0500
From:	Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
To:	LKML <linux-kernel@...r.kernel.org>
Cc:	Ingo Molnar <mingo@...e.hu>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Christoph Hellwig <hch@...radead.org>,
	"Frank Ch. Eigler" <fche@...hat.com>,
	Jan Kiszka <jan.kiszka@...mens.com>,
	John Stultz <johnstul@...ibm.com>,
	Steven Rostedt <srostedt@...hat.com>, ltt-dev@...fik.org,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Gregory Haskins <ghaskins@...ell.com>,
	Arnaldo Carvalho de Melo <acme@...stprotocols.net>,
	Thomas Gleixner <tglx@...utronix.de>,
	Sam Ravnborg <sam@...nborg.org>,
	Robert Wisniewski <bobww@...ibm.com>
Subject: [RFC] LTTng Userspace Tracing, vDSO

Hi,

I am poking at the userspace tracing problem again. I had two different
implementations in LTTng, but I left them out so I could come back with
a more satisfying solution.

Here are the ideas I would like to use in my forthcoming implementation.
I submit them for comments, especially about any vDSO limitations I
might not be aware of (code size? maximum data size?).

The idea is to fast-path tracing so that no system call is required. I
still have to determine whether the tracing code should go in a normal
shared object or whether it belongs in a vDSO.

Basic ideas:

* Buffers
Per-CPU buffers mapped by both userspace and the kernel, with menuconfig
options for the buffers:
- per process (safe: a process cannot overwrite other processes' data)
  - 4k default size, menuconfig configurable
    - Total of about 3.1MB for 200 traced processes * 4 CPUs
      (200 * 4 * 4k)... not bad.
- system-wide (unsafe, but low memory consumption)
  - 1MB default size, menuconfig configurable
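As a sanity check on the footprint, here is a minimal C sketch. It assumes exactly one buffer per traced process per CPU in per-process mode, and one buffer per CPU in system-wide mode; both helper names are hypothetical. With the 4k default, 200 * 4 * 4096 bytes is 3,276,800 bytes (3.125 MiB), while system-wide mode at 1MB per CPU uses 4 MiB on 4 CPUs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: per-process mode, one buffer per traced process per CPU. */
static size_t per_process_footprint(size_t nr_procs, size_t nr_cpus,
                                    size_t buf_size)
{
    return nr_procs * nr_cpus * buf_size;
}

/* Hypothetical: system-wide mode, one buffer per CPU shared by everyone. */
static size_t system_wide_footprint(size_t nr_cpus, size_t buf_size)
{
    return nr_cpus * buf_size;
}
```

The per-process cost grows with the number of traced processes, which is the price paid for isolation.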

Depending on the system type, one may favor efficiency (single-user
server) or protection (multi-user system).

Since the OS can stop a thread when its buffers are full, we can afford
smaller buffers than the kernel buffers (4k vs 1MB).

Space is reserved atomically to protect against traced signal handlers.
Note that a full cmpxchg (synchronized with respect to other CPUs) is
required because vgetcpu only returns a statistically correct CPU ID:
the thread might be migrated to another CPU right after reading it.
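A minimal sketch of such a reservation, using C11 `<stdatomic.h>` compare-and-swap in place of the kernel's cmpxchg(). The `struct trace_buf` layout and `reserve_slot()` are hypothetical. Because another thread (after a migration) or a signal handler may race on the same per-CPU buffer, the write offset is only advanced by a successful CAS, and a failed CAS retries with the freshly observed offset.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical per-CPU buffer header shared by userspace and the kernel. */
struct trace_buf {
    _Atomic size_t write_offset; /* next free byte in data[] */
    size_t size;                 /* capacity of data[] in bytes */
    char *data;
};

/*
 * Reserve len bytes. On CAS failure, 'old' is reloaded with the current
 * offset and the loop retries. Returns the reserved offset, or -1 if the
 * buffer is full (the thread would then wait for a buffer switch).
 */
static long reserve_slot(struct trace_buf *buf, size_t len)
{
    size_t old = atomic_load_explicit(&buf->write_offset,
                                      memory_order_relaxed);
    size_t new_off;

    do {
        if (len > buf->size - old)
            return -1; /* not enough room left */
        new_off = old + len;
    } while (!atomic_compare_exchange_weak_explicit(&buf->write_offset,
                                                    &old, new_off,
                                                    memory_order_acq_rel,
                                                    memory_order_relaxed));
    return (long)old;
}
```

A plain (non-atomic) increment would be enough if the CPU ID were exact and signals were blocked, but neither holds here.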

* Synchronization
 - Use a seqlock to synchronize the vsyscall with the kernel, if required.
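A sketch of the seqlock pattern, written with C11 atomics rather than the kernel's seqlock_t API. The `struct vdso_trace_data` fields are hypothetical examples of data the kernel could publish to the vDSO page. The writer makes the sequence odd while updating; the reader retries until it sees the same even sequence before and after reading.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical data published by the kernel in the vDSO data page. */
struct vdso_trace_data {
    atomic_uint seq;              /* odd while an update is in progress */
    _Atomic uint64_t cycles_base; /* example fields guarded by seq */
    _Atomic uint64_t nsec_base;
};

/* Writer (kernel side): seq odd, update fields, seq even again. */
static void seq_write(struct vdso_trace_data *d, uint64_t cyc, uint64_t ns)
{
    unsigned s = atomic_load_explicit(&d->seq, memory_order_relaxed);
    atomic_store_explicit(&d->seq, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&d->cycles_base, cyc, memory_order_relaxed);
    atomic_store_explicit(&d->nsec_base, ns, memory_order_relaxed);
    atomic_store_explicit(&d->seq, s + 2, memory_order_release);
}

/* Reader (vsyscall side): retry until a consistent snapshot is read. */
static void seq_read(struct vdso_trace_data *d, uint64_t *cyc, uint64_t *ns)
{
    unsigned s1, s2;

    do {
        s1 = atomic_load_explicit(&d->seq, memory_order_acquire);
        *cyc = atomic_load_explicit(&d->cycles_base, memory_order_relaxed);
        *ns = atomic_load_explicit(&d->nsec_base, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        s2 = atomic_load_explicit(&d->seq, memory_order_relaxed);
    } while ((s1 & 1) || s1 != s2);
}
```

The reader never writes to shared memory, which is what makes this suitable for a read-only vDSO data page.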

* time source
  - vsyscall to get the raw cycle count
  - fall back to a kernel syscall
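A sketch of the fast path with a fallback, assuming x86 and GCC/Clang for the `__rdtsc()` intrinsic from `<x86intrin.h>`. The `tsc_usable` flag is hypothetical (a real implementation would get it from the kernel, e.g. when the TSC is not synchronized across CPUs), and the fallback calls clock_gettime through syscall() so it really enters the kernel. Note that the two paths return different units (cycles vs nanoseconds), so a trace would also have to record which clock was used.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
#endif

/* Hypothetical: set at init if the cycle counter is usable from userspace. */
static int tsc_usable = 1;

/* Fallback: an explicit system call, returns nanoseconds (0 on error). */
static uint64_t trace_clock_syscall(void)
{
    struct timespec ts;

    if (syscall(SYS_clock_gettime, CLOCK_MONOTONIC, &ts) != 0)
        return 0;
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Fast path: raw cycle count without entering the kernel, if possible. */
static uint64_t trace_clock_read(void)
{
#if defined(__x86_64__) || defined(__i386__)
    if (tsc_usable)
        return __rdtsc();
#endif
    return trace_clock_syscall();
}
```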

One could ask: why not stay in the kernel, since a syscall is needed to
get the timestamps anyway? Well, because passing a format string and
varargs from userspace to the kernel would not be "polite". ;) If
userspace passes bad format strings to the tracing code, I would much
rather have the segfault happen in userspace, without involving the
kernel at all.
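To illustrate the point, a minimal sketch of a hypothetical userspace `trace_printf()`: the format string and varargs are expanded entirely inside the traced process with vsnprintf(), so a bad format string can only crash that process, and only the resulting bytes reach the shared buffer. This is a simplification; a real tracer would more likely serialize the arguments in binary form rather than as text.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Expand fmt into a previously reserved slot of the shared buffer.
 * Returns the number of bytes written, or -1 if the record did not fit.
 */
static int trace_printf(char *slot, size_t slot_len, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(slot, slot_len, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= slot_len)
        return -1; /* error or truncated record */
    return n;
}
```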

* marker registration

A system call registers the markers at library init. Markers are placed
in a special section, which requires modifying the linker script. The
kernel keeps a per-process data structure to track the registered
markers. When the last process using a marker set exits, those markers
are removed (refcount).
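A sketch of the userspace side, with hypothetical names throughout (`struct ust_marker`, `UST_MARKER`, `register_marker()`). Instead of editing the linker script, it relies on the `__start_SECNAME`/`__stop_SECNAME` symbols that GNU ld generates for sections whose names are valid C identifiers; a library constructor walks the section at init. `register_marker()` only counts here and stands in for the proposed registration system call; the kernel-side refcounting is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical marker descriptor, one per tracing call site. */
struct ust_marker {
    const char *name;
    const char *format;
};

/* Place a descriptor in the "__ust_markers" section. */
#define UST_MARKER(id, name_str, fmt_str)                           \
    static const struct ust_marker marker_##id                      \
        __attribute__((section("__ust_markers"), used,              \
                       aligned(sizeof(void *)))) = { name_str, fmt_str }

/* Defined by GNU ld because the section name is a C identifier. */
extern const struct ust_marker __start___ust_markers[];
extern const struct ust_marker __stop___ust_markers[];

static int nr_registered;

/* Stand-in for the proposed registration system call. */
static void register_marker(const struct ust_marker *m)
{
    (void)m;
    nr_registered++;
}

/* Runs at library (or program) init, before main(). */
__attribute__((constructor))
static void register_all_markers(void)
{
    for (const struct ust_marker *m = __start___ust_markers;
         m < __stop___ust_markers; m++)
        register_marker(m);
}

UST_MARKER(open, "fs_open", "fd %d name %s");
UST_MARKER(close, "fs_close", "fd %d");
```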

* buffer switch

The buffer switch is performed by a system call. lttd (the trace reading
daemon) is notified through inotify when new debugfs files are created,
which happens upon the first buffer switch of a userspace process.
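A sketch of the lttd side using the Linux inotify API: watch a directory for IN_CREATE and read the name of the new file. `wait_for_new_buffer_file()` and `open_buffer_dir_watch()` are hypothetical helpers, and the directory would be the debugfs trace directory in practice (assuming the files are created through paths that emit inotify events).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Hypothetical: start watching dir for newly created buffer files. */
static int open_buffer_dir_watch(const char *dir)
{
    int ifd = inotify_init1(0);

    if (ifd < 0)
        return -1;
    if (inotify_add_watch(ifd, dir, IN_CREATE) < 0) {
        close(ifd);
        return -1;
    }
    return ifd;
}

/*
 * Block until a file is created, then copy its name (name_len > 0).
 * Returns 0 on success, -1 on error.
 */
static int wait_for_new_buffer_file(int ifd, char *name, size_t name_len)
{
    char buf[sizeof(struct inotify_event) + NAME_MAX + 1]
        __attribute__((aligned(__alignof__(struct inotify_event))));
    ssize_t len = read(ifd, buf, sizeof(buf));

    if (len < (ssize_t)sizeof(struct inotify_event))
        return -1;
    const struct inotify_event *ev = (const struct inotify_event *)buf;
    if (!(ev->mask & IN_CREATE) || ev->len == 0)
        return -1;
    snprintf(name, name_len, "%s", ev->name);
    return 0;
}
```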

* Filtering

Eventually, export the filter data and code to userspace.

* Multiple traces

LTTng supports writing to multiple traces at the same time. Each of them
can have different filtering expressions or run in a different mode:
either the data is extracted to disk while the trace is running, or, in
flight recorder mode, the data is only kept in circular buffers and is
constantly overwritten. This can also be useful on multi-user systems.
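A sketch of how one event could go to several traces, each respecting its own mode. Everything here is hypothetical (`struct trace_ring`, `RING_SLOTS`, fixed-size integer events). When a normal-mode ring is full, this sketch simply counts the event as lost; the proposal also mentions stopping the thread until the buffer is drained instead. A flight-recorder ring overwrites its oldest event.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum trace_mode { TRACE_NORMAL, TRACE_FLIGHT_RECORDER };

#define RING_SLOTS 4 /* hypothetical, tiny for illustration */

struct trace_ring {
    enum trace_mode mode;
    int events[RING_SLOTS];
    size_t head; /* total events written */
    size_t tail; /* index of the oldest readable event */
    size_t lost; /* events dropped in normal mode */
};

/* Record one event; returns false if it was dropped. */
static bool ring_write(struct trace_ring *r, int ev)
{
    if (r->head - r->tail == RING_SLOTS) {
        if (r->mode == TRACE_NORMAL) {
            r->lost++; /* the reader has not drained the buffer yet */
            return false;
        }
        r->tail++; /* flight recorder: overwrite the oldest event */
    }
    r->events[r->head % RING_SLOTS] = ev;
    r->head++;
    return true;
}

/* Oldest readable event; the caller ensures the ring is not empty. */
static int ring_oldest(const struct trace_ring *r)
{
    return r->events[r->tail % RING_SLOTS];
}

/* Write the same event to every active trace. */
static void trace_event_all(struct trace_ring *traces, size_t n, int ev)
{
    for (size_t i = 0; i < n; i++)
        ring_write(&traces[i], ev);
}
```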

Therefore, we would have to reserve enough virtual address space in the
vDSO region to hold roughly 16 per-CPU buffers (?). Then, upon buffer
allocation for a new trace, we would iterate over each process in the
system and, if it is being traced, allocate extra buffers within its
tracing vDSO data region.
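A userspace analogy of "reserve address space now, back it later", using mmap() with PROT_NONE and then mprotect() on one slot when a trace starts. In the proposal the kernel would set up this mapping in the vDSO region; here the helper names and `MAX_TRACES` (taken from the "roughly 16" estimate) are assumptions, and `buf_size` is assumed to be a multiple of the page size and to cover one trace's buffers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAX_TRACES 16 /* assumption, from the "roughly 16" estimate */

/* Reserve address space for MAX_TRACES slots without committing memory. */
static void *reserve_trace_region(size_t buf_size)
{
    void *p = mmap(NULL, MAX_TRACES * buf_size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Make one slot usable when a new trace is allocated. */
static void *enable_trace_slot(void *region, size_t buf_size, int trace_id)
{
    if (trace_id < 0 || trace_id >= MAX_TRACES)
        return NULL;
    char *slot = (char *)region + (size_t)trace_id * buf_size;
    if (mprotect(slot, buf_size, PROT_READ | PROT_WRITE) != 0)
        return NULL;
    return slot;
}
```

Reserving the range up front keeps each trace's buffer at a fixed offset, so the fast path can locate it without a lookup.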

Any thoughts?

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68