Message-ID: <20080122145919.GA17620@Krystal>
Date:	Tue, 22 Jan 2008 09:59:19 -0500
From:	Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
To:	LKML <linux-kernel@...r.kernel.org>
Cc:	Ingo Molnar <mingo@...e.hu>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Christoph Hellwig <hch@...radead.org>,
	"Frank Ch. Eigler" <fche@...hat.com>,
	Jan Kiszka <jan.kiszka@...mens.com>,
	John Stultz <johnstul@...ibm.com>,
	Steven Rostedt <srostedt@...hat.com>, ltt-dev@...fik.org,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Gregory Haskins <ghaskins@...ell.com>,
	Arnaldo Carvalho de Melo <acme@...stprotocols.net>,
	Thomas Gleixner <tglx@...utronix.de>,
	Sam Ravnborg <sam@...nborg.org>,
	Robert Wisniewski <bobww@...ibm.com>
Subject: [RFC] LTTng Userspace Tracing, vDSO

Hi,

I am poking at the userspace tracing problem again. I had two different
implementations in LTTng, but I left them out so I could come back with
a more satisfying solution.

Here are the ideas I would like to use in my forthcoming implementation.
I submit them for comments, especially about limitations of the vDSO I
might not be aware of (code size? maximum data size?).

The idea is to fast-path tracing so that a system call is not required. I
still have to decide whether the tracing code belongs in a normal shared
object or in a vDSO.

Basic ideas:

* Buffers

Per-CPU buffers mapped by both userspace and the kernel.
menuconfig options for buffers:
- per process (safe: a process cannot overwrite other processes' data)
  - 4k default size, menuconfig configurable
    - Total of 12.5MB for 200 traced processes * 4 CPUs. Not bad.
- system-wide (unsafe, low memory consumption)
  - 1MB default size, menuconfig configurable

Depending on the system type, one could prefer efficiency (single-user
server) or protection (multi-user system).

Since threads can be stopped by the OS when the buffers are filled, we
can afford to have smaller buffers than the kernel buffers (4k vs 1MB).

Space reservation is atomic to protect against traced signal handlers.
Note that a full cmpxchg (synchronized with respect to other CPUs) is
required because vgetcpu only returns a statistically correct CPU ID:
the thread might be migrated right after the call.
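
To make the reservation idea concrete, here is a minimal userspace
sketch using C11 atomics. It is not LTTng code: struct trace_buf,
trace_reserve() and trace_write() are hypothetical names, and the 4k
size simply mirrors the per-process default mentioned above. The
compare-and-exchange loop is what lets a signal handler, or a thread
migrated to another CPU, interleave with the reservation without two
writers ending up with the same slot.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define TRACE_BUF_SIZE 4096u	/* assumption: the 4k per-process default */

/* Hypothetical buffer layout, for illustration only. */
struct trace_buf {
	_Atomic uint32_t write_offset;	/* offset of the next free byte */
	unsigned char data[TRACE_BUF_SIZE];
};

/*
 * Reserve len bytes in the buffer. Returns the start offset of the
 * reserved slot, or -1 if the buffer cannot hold the event.
 */
static long trace_reserve(struct trace_buf *buf, uint32_t len)
{
	uint32_t old, next;

	assert(len > 0);
	old = atomic_load_explicit(&buf->write_offset, memory_order_relaxed);
	do {
		if (len > TRACE_BUF_SIZE - old)
			return -1;	/* full: a buffer switch would be needed */
		next = old + len;
		/* On failure, old is reloaded with the current offset. */
	} while (!atomic_compare_exchange_weak_explicit(&buf->write_offset,
			&old, next,
			memory_order_acq_rel, memory_order_relaxed));
	return (long)old;
}

/* Reserve a slot, then copy the event payload into it. */
static int trace_write(struct trace_buf *buf, const void *event, uint32_t len)
{
	long offset = trace_reserve(buf, len);

	if (offset < 0)
		return -1;
	memcpy(&buf->data[offset], event, len);
	return 0;
}
```

A signal handler that calls trace_write() between the load and the
compare-and-exchange of the interrupted code simply makes the outer
compare-and-exchange fail and retry with the updated offset.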

* Synchronization
 - Use seqlock to synchronize vsyscall with kernel if required.
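
For readers unfamiliar with the pattern, a seqlock can be sketched in
portable C11 as follows. This is only an illustration under stated
assumptions: struct trace_shared and its two fields are hypothetical
stand-ins for whatever data the kernel would publish to the vsyscall
code, and a single writer is assumed. The reader never blocks the
writer; it retries if the sequence counter was odd or changed while it
was reading.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical data published by one writer to lock-free readers. */
struct trace_shared {
	atomic_uint seq;			/* odd while an update is in progress */
	_Atomic uint64_t cycles_base;		/* illustrative field */
	_Atomic uint64_t buf_switch_count;	/* illustrative field */
};

/* Writer side (single writer assumed). */
static void shared_write(struct trace_shared *s, uint64_t base, uint64_t count)
{
	unsigned int seq = atomic_load_explicit(&s->seq, memory_order_relaxed);

	assert((seq & 1u) == 0);	/* no update may already be in flight */
	atomic_store_explicit(&s->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&s->cycles_base, base, memory_order_relaxed);
	atomic_store_explicit(&s->buf_switch_count, count, memory_order_relaxed);
	atomic_store_explicit(&s->seq, seq + 2, memory_order_release);
}

/* Reader side: retry until a consistent snapshot is observed. */
static void shared_read(struct trace_shared *s, uint64_t *base, uint64_t *count)
{
	unsigned int begin, end;

	do {
		begin = atomic_load_explicit(&s->seq, memory_order_acquire);
		*base = atomic_load_explicit(&s->cycles_base,
					     memory_order_relaxed);
		*count = atomic_load_explicit(&s->buf_switch_count,
					      memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		end = atomic_load_explicit(&s->seq, memory_order_relaxed);
	} while ((begin & 1u) || begin != end);
}
```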

* Time source
  - vsyscall to get the raw cycle count
  - fall back to a kernel syscall

One could ask: why not stay in the kernel when you have to use a
syscall to get the timestamps anyway? Well, because passing a format
string and var args from userspace to the kernel would not be "polite".
;) If userspace passes wrong format strings to the tracing code, I would
rather have the segfault happen in userspace and not involve the
kernel at all.

* Marker registration

System call to register markers at library init.
Markers live in a special section; modify the linker script.
Keep a per-process data structure (in the kernel) to keep track of the
registered markers. When the last process using a marker set exits,
remove those markers (refcount).
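
As a rough illustration of the "markers in a special section" idea,
the sketch below collects pointers to marker descriptors in a named ELF
section and walks them at init time. It swaps in a different technique
from the one proposed above: instead of a modified linker script, it
relies on the __start_<section> / __stop_<section> symbols that GNU ld
generates for sections whose names are valid C identifiers. It assumes
GCC or Clang on an ELF target; struct marker, DEFINE_MARKER and the
section name trace_markers are hypothetical, and the registration
system call itself is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical marker descriptor. */
struct marker {
	const char *name;
	const char *format;
	int enabled;
};

/*
 * Define a marker and place a pointer to it in the "trace_markers"
 * section. Storing pointers avoids padding between section entries.
 */
#define DEFINE_MARKER(id, fmt)						\
	static struct marker marker_##id = { #id, fmt, 0 };		\
	static struct marker *marker_ptr_##id				\
		__attribute__((section("trace_markers"), used)) =	\
		&marker_##id

/* Section bounds provided by the linker (GNU ld behaviour). */
extern struct marker *__start_trace_markers[];
extern struct marker *__stop_trace_markers[];

DEFINE_MARKER(request_start, "%d %s");
DEFINE_MARKER(request_end, "%d");

/* Number of markers a library init routine would have to register. */
static size_t marker_count(void)
{
	return (size_t)(__stop_trace_markers - __start_trace_markers);
}

/* Look up a marker by name, or return NULL if it does not exist. */
static struct marker *marker_find(const char *name)
{
	struct marker **iter;

	assert(name != NULL);
	for (iter = __start_trace_markers; iter < __stop_trace_markers; iter++)
		if (strcmp((*iter)->name, name) == 0)
			return *iter;
	return NULL;
}
```

A library constructor could iterate over the same range and hand each
descriptor to the registration system call.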

* Buffer switch

Buffer switch performed by a system call.
Use inotify to tell lttd (the trace reading daemon) that new debugfs
files are created upon the first buffer switch of a userspace process.

* Filtering

Export filter data and code to user-space. (eventually)

* Multiple traces

LTTng supports writing to multiple traces at the same time. Each of them
can have different filtering expressions or be in a different mode:
either extracting the data to disk while the trace is running, or flight
recorder mode, where the data is only kept in circular buffers and is
always overwritten. This can be useful for multi-user systems too.

Therefore, we would have to reserve enough virtual address space in the
vDSO region to put roughly 16 per-cpu buffers (?). Then, upon buffer
allocation for a new trace, we would have to iterate over each process
in the system; if it is being traced, we would allocate extra buffers
within the tracing vDSO data region.

Any thoughts?

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68