[<prev] [next>] [day] [month] [year] [list]
Message-ID: <20080122145919.GA17620@Krystal>
Date: Tue, 22 Jan 2008 09:59:19 -0500
From: Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
To: LKML <linux-kernel@...r.kernel.org>
Cc: Ingo Molnar <mingo@...e.hu>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Christoph Hellwig <hch@...radead.org>,
"Frank Ch. Eigler" <fche@...hat.com>,
Jan Kiszka <jan.kiszka@...mens.com>,
John Stultz <johnstul@...ibm.com>,
Steven Rostedt <srostedt@...hat.com>, ltt-dev@...fik.org,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Gregory Haskins <ghaskins@...ell.com>,
Arnaldo Carvalho de Melo <acme@...stprotocols.net>,
Thomas Gleixner <tglx@...utronix.de>,
Sam Ravnborg <sam@...nborg.org>,
Robert Wisniewski <bobww@...ibm.com>
Subject: [RFC] LTTng Userspace Tracing, vDSO
Hi,
I am poking in the userspace tracing problem again. I had two different
implementations in LTTng, but I left them out so I could come back with
a more satisfying solution.
Here are the ideas I would like to use in my forthcoming implementation.
I submit them for comments, especially about limitations of vDSO I
might not be aware of (code size... ?, max data size ?).
The idea is to fast-path tracing so a system call is not required. I
have to determine if I put the tracing code in a normal shared object or
if it belongs to a vDSO.
Basic ideas :
* Buffers
Per CPU buffers mapped by both userspace and the kernel
menuconfig options for buffers
- per process (safe, a process cannot overwrite other processes'data)
- 4k default size, menuconfig configurable
- Total of 12.5MB for 200 traced processes * 4 CPUs.. not bad.
- system-wide (unsafe, low memory consumption)
- 1MB default size, menuconfig configurable
Depending on the system type, one could wish for efficiency (single-user
server) or protection (multi-user system)
Since threads can be stopped by the OS when the buffers are filled, we
can afford to have smaller buffers than the kernel buffers (4k vs 1MB).
Atomic space reservation to protect against traced signal handlers. Note
that a full cmpxchg (synchronized wrt other CPUs) is required because
the vgetcpu only returns a statistically correct cpu ID (might be
migrated).
* Synchronization
- Use seqlock to synchronize vsyscall with kernel if required.
* time source
- vsyscall to get the raw cycle count
- fallback on kernel syscall
One could ask : why don't you stay in the kernel when you have to use a
syscall to get the timestamps anyway ? Well, because passing a format
string and var args from userspace to the kernel would not be "polite".
;) If userspace passes wrong format strings to the tracing code, I would
rather prefer the segfault to happen in userspace and not involve the
kernel at all.
* marker registration
System call to register markers in library init.
Markers in special section, modify the linker script.
Keep a per-process data structure (in kernel) to keep track of the
registered markers. When the last process using a marker set exits,
remove those markers. (refcount)
* buffer switch
buffer switch performed by a system call.
Use inotify to tell lttd (trace reading daemon) that new debugfs files
are created upon first buffer switch of a userspace process.
* Filtering
Export filter data and code to user-space. (eventually)
* Multiple traces
LTTng supports writing to multiple traces at the same time. Each of them
can have different filtering expressions or be in different mode
(extract the data to disk when the trace is running or flight recorder
mode where the data is only kept in circular buffers and always
overwritten). Can be useful for multiuser systems too.
Therefore, we would have to reserve enough virtual address space in the
vDSO region to put roughly 16 per-cpu buffers (?). Then, upon buffer
allocation for a new trace, we would have to iterate over each process
in the system; if it is being traced, we would allocate extra buffers
within the tracing vDSO data region.
Any thoughts ?
Mathieu
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists