Date:	Tue, 2 Dec 2014 00:04:06 -0500
From:	Theodore Ts'o <tytso@....edu>
To:	Alexei Starovoitov <ast@...mgrid.com>
Cc:	Steven Rostedt <rostedt@...dmis.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: Checking to see if a bit is _not_ set in a ftrace event filter

On Mon, Dec 01, 2014 at 07:52:11PM -0800, Alexei Starovoitov wrote:
> Ted, I don't see 'writeback_mark_inode_dirty' event
> in the tree. Some new stuff?

Yep, see:

http://thread.gmane.org/gmane.comp.file-systems.ext4/47092

Except instead of the mini-script which I gave in the above URL, I
wanted to do additional filtering.  The current hack which I am using
instead of:

echo "flags == 2048" > events/writeback/writeback_mark_inode_dirty/filter

is:

echo "(flags == 2048) && (state < 2048)" > events/writeback/writeback_mark_inode_dirty/filter

... but this relies on the fact that all of the i_state bits that I
care about are at positions 1 << 10 and below.  i.e., it's a terrible
hack.
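
To illustrate why the comparison only works as a stand-in for a real
bit test, here is a minimal sketch. It is not code from the patch or
from ftrace; the names are made up for illustration, and the only value
taken from the filter above is 2048 (1 << 11).

```python
# Hypothetical names; 2048 (1 << 11) is the value used in the filter above.
BIT = 1 << 11

def clear_by_comparison(state):
    """The hack: treat 'state < 2048' as 'bit 11 is not set'."""
    return state < BIT

def clear_by_mask(state):
    """The intended test: bit 11 is not set, whatever else is set."""
    return (state & BIT) == 0

# The two agree as long as no bit above 1 << 11 is ever set in state.
agree = all(clear_by_comparison(s) == clear_by_mask(s) for s in range(1 << 12))

# They disagree once a higher bit appears, e.g. 1 << 12 with bit 11 clear:
# the comparison says "set" while the mask test says "clear".
high = 1 << 12
mismatch = clear_by_comparison(high) != clear_by_mask(high)
```

So the comparison silently drops events as soon as any state bit above
1 << 11 is in use, which is why a proper "bit not set" operator in the
filter language would be preferable.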

> What kind of post-filtering are you doing with this event?
> Just visually checking that trace is sane or the trace output
> is fed into other tools? Are you trying to aggregate or
> correlate multiple events (may be based on 'ino') ?

I plan to write some tools that aggregate based on 'ino', but I haven't
yet.
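
As a rough idea of what such a tool might look like, the sketch below
counts events per inode number from lines of trace output. The line
format (an "ino=<number>" field) and the sample lines are assumptions
made for illustration, not the actual output of this tracepoint.

```python
import re
from collections import Counter

# Assumed field format: "ino=<decimal>" or "ino <decimal>" somewhere in the line.
INO_RE = re.compile(r"\bino[=\s]+(\d+)")

def count_by_ino(lines):
    """Return a Counter mapping inode number -> number of matching events."""
    counts = Counter()
    for line in lines:
        m = INO_RE.search(line)
        if m:
            counts[int(m.group(1))] += 1
    return counts

# Made-up sample lines, only to show the shape of the input.
sample = [
    "writeback_mark_inode_dirty: ino=12 state=1 flags=2048",
    "writeback_mark_inode_dirty: ino=99 state=0 flags=2048",
    "writeback_mark_inode_dirty: ino=12 state=7 flags=2048",
    "some unrelated line without an inode field",
]
counts = count_by_ino(sample)
```

In practice the lines would come from the trace file or trace_pipe
rather than a hard-coded list, and counts.most_common() would give the
busiest inodes first.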

> It will change the workflow for folks who use 'echo expr > filter'
> directly. trace-cmd -e -f can be made to work transparently
> with new features.

This will break a bunch of **really** useful scripts found at:

	https://github.com/brendangregg/perf-tools.git

OTOH, Brendan will probably be able to rewrite them to take
advantage of the new interfaces, and I'm sure he'll appreciate the
power of being able to use eBPF.  :-)

> One of the goals for eBPF+tracing is to minimize
> additions of new tracepoints. Right now we already
> have a ton of them. events/ext4.h is ~2500 lines.
> Some of them look like hooks for in-production
> debugging of a function at a time. Sort of like poor's man
> kprobe/kretprobe.

Well, except that kprobe and kretprobe can't give me the arguments
passed into the function (unless you compile with full -g debugging
info enabled and bloat the object files and compilation time by a
factor of 10 --- which I can't stand and why I use ftrace instead of
systemtap :-)

> With eBPF we should be able to avoid adding
> trace_func_enter(), trace_func_exit() to so many func.

If eBPF can solve the problem of getting at the critical
function variables without making the compiled kernel take 10x the
disk space and time to compile (mostly due to the time to write out
the !@#!@?! bloated object files), that would be great.  My
understanding though is that this fundamentally requires improved
DWARF compression and structure information deduping, which the
systemtap folks promised would be coming in improved compiler
toolchains many years ago, but as far as I know has never
materialized.  :-(

But that's why I have the trace_func_enter() and trace_func_exit()
calls; I need to be able to do various run-time debugging without
needing to recompile the kernel and without forcing all of my
development builds to have full debug info.

					- Ted