linux-kernel - Unified tracing buffer

Date:	Fri, 19 Sep 2008 14:33:42 -0700
From:	"Martin Bligh" <mbligh@...gle.com>
To:	"Linux Kernel Mailing List" <linux-kernel@...r.kernel.org>
Cc:	"Linus Torvalds" <torvalds@...ux-foundation.org>,
	"Thomas Gleixner" <tglx@...utronix.de>,
	"Mathieu Desnoyers" <compudj@...stal.dyndns.org>,
	"Steven Rostedt" <rostedt@...dmis.org>, od@...ell.com,
	"Frank Ch. Eigler" <fche@...hat.com>
Subject: Unified tracing buffer

During the kernel summit and Plumbers conference, Linus and others
expressed a desire for a unified tracing buffer system for multiple
tracing applications (e.g. ftrace, lttng, systemtap, blktrace) to use.
This provides several advantages, including the ability to interleave
data from multiple sources, not having to learn 200 different tools,
and avoiding duplicated code and effort.

Several of us got together last night and tried to cut this down to
the simplest usable system we could agree on (and nobody got hurt!).
This will form version 1. I've sketched out a few enhancements we know
that we want, but have agreed to leave these until version 2. The
answer to most questions about the below is "yes we know, we'll fix
that in version 2" (or 3). Simplicity was the rule ...

Sketch of design.  Enjoy flaming me. Code will follow shortly.

STORAGE
-------

We will support multiple buffers for different tracing systems, each
with its own name and event id space.
Event ids are 16 bit, dynamically allocated.
Each event can provide a "one line of text" print function, or fall
back to the default (probably hex printf).
Will provide a "flight data recorder" mode, and a "spool to disk" mode.

Circular buffer per cpu, protected by a per-cpu spinlock_irq.
Word aligned records.
Variable record length; the header will start with a length field.
Timestamps in a fixed timebase, monotonically increasing (across all CPUs).
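
As a rough illustration of the storage rules above (this is not the code
that will follow), a record header and the word-alignment arithmetic
might look like the sketch below. The struct name, the field order after
the length, and the 64-bit timestamp width are assumptions made for
illustration; the design only fixes a leading length and 16-bit event ids.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical v1 record header. Only "starts with a length" and
 * "16-bit event id" come from the design; the rest is assumed. */
struct trace_record_header {
    uint16_t length;     /* total record size in bytes, header included */
    uint16_t event_id;   /* 16-bit, dynamically allocated event id */
    uint64_t timestamp;  /* fixed timebase, monotonic across CPUs */
};

/* Size a record occupies in the buffer: header plus payload, rounded
 * up to the next word boundary so the following record stays aligned. */
size_t trace_record_size(size_t payload_len)
{
    size_t word = sizeof(unsigned long);
    size_t raw = sizeof(struct trace_record_header) + payload_len;

    return (raw + word - 1) & ~(word - 1);
}
```

With a 16-bit length field a single record tops out at 65535 bytes, which
seems plenty for v1 but is a consequence of the assumed layout, not a
stated limit.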

INPUT_FUNCTIONS
---------------

allocate_buffer (name, size)
        return buffer_handle

register_event (buffer_handle, event_id, print_function)
        You can pass in a requested event_id from a fixed set, and
        will be given it, or an error
        0 means allocate me one dynamically
        returns event_id     (or -E_ERROR)

record_event (buffer_handle, event_id, length, *buf)
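
To make the calling convention concrete, here is a small userspace model
of the three calls. It is a sketch only: the table size, the split
between fixed and dynamic ids, the specific errno values, and the
append-only (non-circular, unlocked) storage are all assumptions, not
part of the agreed design.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_EVENTS       64  /* assumption: tiny id table for the sketch */
#define FIRST_DYNAMIC_ID 32  /* assumption: ids 1..31 are the fixed set */

typedef int (*print_fn)(char *out, size_t out_len,
                        const void *buf, size_t len);

struct trace_buffer {
    char name[32];
    size_t size, used;
    uint8_t *data;
    print_fn printers[MAX_EVENTS];
    uint8_t registered[MAX_EVENTS];
};

struct trace_buffer *allocate_buffer(const char *name, size_t size)
{
    struct trace_buffer *b = calloc(1, sizeof(*b));

    if (!b)
        return NULL;
    b->data = malloc(size);
    if (!b->data) {
        free(b);
        return NULL;
    }
    strncpy(b->name, name, sizeof(b->name) - 1);
    b->size = size;
    return b;
}

/* event_id 0 asks for a dynamic id; 1..31 requests that fixed id. */
int register_event(struct trace_buffer *b, int event_id, print_fn print)
{
    if (event_id < 0 || event_id >= FIRST_DYNAMIC_ID)
        return -EINVAL;
    if (event_id == 0) {
        for (event_id = FIRST_DYNAMIC_ID; event_id < MAX_EVENTS; event_id++)
            if (!b->registered[event_id])
                break;
        if (event_id == MAX_EVENTS)
            return -ENOSPC;
    } else if (b->registered[event_id]) {
        return -EBUSY;
    }
    b->registered[event_id] = 1;
    b->printers[event_id] = print;   /* NULL means "use the default" */
    return event_id;
}

/* Appends length, event id and payload; a real buffer would wrap. */
int record_event(struct trace_buffer *b, int event_id,
                 uint16_t length, const void *buf)
{
    uint16_t id = (uint16_t)event_id;
    size_t need = sizeof(length) + sizeof(id) + length;

    if (event_id <= 0 || event_id >= MAX_EVENTS || !b->registered[event_id])
        return -EINVAL;
    if (b->size - b->used < need)
        return -ENOSPC;
    memcpy(b->data + b->used, &length, sizeof(length));
    memcpy(b->data + b->used + sizeof(length), &id, sizeof(id));
    memcpy(b->data + b->used + sizeof(length) + sizeof(id), buf, length);
    b->used += need;
    return 0;
}
```

A tracer would call allocate_buffer() once, register_event() once per
event type, and then record_event() from its probe points.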

OUTPUT
------

Data will be output via debugfs, and provide the following output streams:

/debugfs/tracing/<name>/buffers/text
    clear text stream (will merge the per-cpu streams via insertion
    sort, and use the print functions)

/debugfs/tracing/<name>/buffers/binary[cpu_number]
    per-cpu binary data
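
The merge step for the text stream can be pictured with the sketch
below. Note that it swaps in a plain linear scan for the smallest head
timestamp rather than the insertion sort mentioned above; it gives the
same ordering and is only meant to show the idea. The struct and
function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct event {
    uint64_t timestamp;  /* global, monotonic timebase */
    int cpu;             /* cpu that recorded the event */
};

struct cpu_stream {
    const struct event *events;  /* already in timestamp order per cpu */
    size_t count;                /* number of events in the stream */
    size_t next;                 /* index of the next unread event */
};

/* Merge per-cpu streams into 'out' in timestamp order.
 * Returns the number of events written (at most out_cap). */
size_t merge_streams(struct cpu_stream *s, size_t ncpus,
                     struct event *out, size_t out_cap)
{
    size_t n = 0;

    while (n < out_cap) {
        struct cpu_stream *best = NULL;

        /* Find the stream whose next event is the oldest. */
        for (size_t i = 0; i < ncpus; i++) {
            if (s[i].next == s[i].count)
                continue;
            if (!best || s[i].events[s[i].next].timestamp <
                         best->events[best->next].timestamp)
                best = &s[i];
        }
        if (!best)
            break;  /* every stream is drained */
        out[n++] = best->events[best->next++];
    }
    return n;
}
```

This only works because timestamps are in one fixed timebase across all
CPUs, which is why the storage section insists on that.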

CONTROL
-------

Sysfs style tree under debugfs

/debugfs/tracing/<name>/buffers/enabled        <--- binary value

/debugfs/tracing/<name>/<event1>
/debugfs/tracing/<name>/<event2>
    etc ...
    provides a way to enable/disable events, see what's available, and
    what's enabled.
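
From userspace, flipping one of these control files would presumably be
an ordinary file write. The helper below is a sketch under that
assumption; the function name is hypothetical, and the "0"/"1" text
encoding of the binary value is a guess, since the format of the
control files is not pinned down yet.

```c
#include <stdio.h>

/* Write "1" or "0" to a control file such as the proposed
 * /debugfs/tracing/<name>/buffers/enabled.
 * Returns 0 on success, -1 on any I/O error. */
int write_control_flag(const char *path, int on)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (!f)
        return -1;
    ok = fputs(on ? "1\n" : "0\n", f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```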

KNOWN ISSUES / PLANS
--------------------

No way to unregister buffers and events.
    Will provide an unregister_buffer and unregister_event call

Generating systemwide time is hard on some platforms
    Yes. Time-based output provides a lot of simplicity for the user though
    We won't support these platforms at first, we'll add functionality
    to make it work for them later.
    (plan based on tick-based ms timing, plus counter offset from that
    if needed).

Spinlock_irq is inefficient, and doesn't support tracing in NMIs
    True. We'll implement a lockless scheme later (see lttng)

Putting a length record in every event is inefficient
    True. Fixed record length with optional extensions is better, but
    more complex. v2.

Putting a full timestamp rather than an offset in every event is inefficient
    See above. True, but v2.

Relayfs already exists! use that!
    People were universally not keen on that idea. Complexity, interface, etc.
    We're also providing some higher level shared functions for time &
    event ids.

There's no way to decode the binary data stream
    Code will be shared from the kernel to decode it, so that we can
    get the compact binary format and decode it later. That code will
    be kept in the kernel tree (it's a trivial piece of C).
    Version 1.1 ;-)
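
To give a feel for how small such a decoder could be, here is a sketch
that walks a stream of length-prefixed records. The header layout
(16-bit total length followed by a 16-bit event id, with any alignment
padding already counted in the length) is an assumption for
illustration, as are all the names; the real format will come with the
code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk header: total record length, then event id. */
struct rec_hdr {
    uint16_t length;    /* bytes in this record, header included */
    uint16_t event_id;
};

typedef void (*record_cb)(uint16_t event_id, const uint8_t *payload,
                          size_t payload_len, void *ctx);

/* Walk 'len' bytes of binary trace data, calling 'cb' (if non-NULL)
 * once per record. Returns the number of records decoded, or -1 if
 * the stream is truncated or a length field is implausible. */
int decode_stream(const uint8_t *buf, size_t len, record_cb cb, void *ctx)
{
    size_t off = 0;
    int n = 0;

    while (len - off >= sizeof(struct rec_hdr)) {
        struct rec_hdr h;

        memcpy(&h, buf + off, sizeof(h));  /* avoid unaligned reads */
        if (h.length < sizeof(h) || h.length > len - off)
            return -1;
        if (cb)
            cb(h.event_id, buf + off + sizeof(h),
               h.length - sizeof(h), ctx);
        off += h.length;
        n++;
    }
    return off == len ? n : -1;
}
```

The callback is where a per-event print function would be looked up by
event id to produce the one-line text form.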