
Message-ID: <4AC3F411.5040506@redhat.com>
Date:	Wed, 30 Sep 2009 17:13:05 -0700
From:	Josh Stone <jistone@...hat.com>
To:	Theodore Tso <tytso@....edu>, LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] ext4: Add a stub for mpage_da_data in the trace header

On 09/30/2009 02:23 PM, Theodore Tso wrote:
> On Wed, Sep 30, 2009 at 12:45:23PM -0700, Josh Stone wrote:
>> If you just want the data in the trace buffer, then SystemTap is not the
>> tool for you.  By all means, just write yourself a perl script or
>> something that parses the trace buffer however you like.
>>
>> On the other hand, stap is useful to do some processing/inspection
>> *live*, at the moment the event happens.  For that, we register our own
>> tracepoint handler that can do something different than ftrace.
> 
> So there are two things I would point out here.  First of all, now
> that ftrace has the ability to do basic filtering, just about the only
> thing SystemTap can do which is unique is either complex filtering,
> summary statistics, or some kind of correlation between multiple
> events (within SystemTap's restricted memory allocation limits).

This "only thing" seems like quite a lot to me, but I suppose the
significance could be a matter of opinion.  I would also add that
SystemTap can better support concurrent users who want to monitor
different things.

> So I'm not sure it's such a great idea to cede a large bit of
> functionality as something that SystemTap will never
> accomplish --- especially when it's far more convenient and stable
> to depend on fixed trace points than to set arbitrary dynamic trace
> points in the middle of source files, which will break all the time
> when distros release new kernels, etc.

I don't understand your point about ceding here.  But yes, I agree that
fixed trace points are more convenient and stable, which is why we've
long supported static instrumentation in the kernel.

> Secondly, while I'm not so sure it's that big of a restriction to have
> Systemtap pull events out of the trace buffer, if you must capture the
> event right as it happens, it should be possible to set a kprobe in the
> ftrace subsystem, and then pull out the data of the event from the
> trace buffer.

This is possible, but it's a step backward for a few reasons:

- A kprobe will be inherently slower than a tracepoint handler.
- It requires debuginfo (maybe not to place the probe, but surely to dig
into ftrace's internal data structures).
- It requires knowledge about the ftrace internals, which is fragile and
unmaintainable.
- It assumes that every bit of data that the user wants is captured in
the trace buffer.

I think that last point is particularly significant.  Kernel devs are
not prescient, so the trace event might not be capturing all of the data
that's relevant to a particular troubleshooting effort.  With stap you
can gather whatever data you want.

(By the way, I seem to recall that we once discussed adding a proper
hook for stap to grab ftrace data as it comes, but I don't think that
went anywhere.)

> Keep in mind that one of the advantages DTrace has over SystemTap is
> that it can use pre-defined events in the kernel, and not have to
> keep userspace macro files in sync with a changing kernel source
> base.  It seems counterproductive to throw away the opportunity of
> being able to read the tracepoint event data, since it would give 
> SystemTap a lot more power.

Aren't "pre-defined events" == tracepoints?  That's exactly what we're
trying to use in SystemTap!  But then, DTrace doesn't dictate what data
is captured at those events, so I don't understand why you think we
should be more restrictive.

>> However, SystemTap does *not* require the kernel debuginfo for using
>> tracepoints, even when reading parameters.  It should work in the
>> complete absence of CONFIG_DEBUG_INFO, so if you find otherwise, please
>> let me know and I will fix it.
> 
> Well, how is it going to do that if you don't have access to the
> structure definition?  This is why fetching the information from the
> ring buffer is much more powerful.

True, when neither a header nor debuginfo for a private type is
available, then it will be opaque to us, so the ring buffer can offer
pre-defined insight into those structures.  But in sched_switch, for
example, the ring buffer only knows prev/next->comm/pid/prio/state,
whereas stap has the entire rq and task_structs at your disposal.  Each
has power in its own place...
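
To make the ring-buffer side of that comparison concrete, here is a
minimal sketch of the "parse the trace buffer yourself" approach, in
Python rather than perl.  It assumes a text line layout for sched_switch
modelled on what recent kernels print (prev_comm=... ==> next_comm=...);
the layout in any given kernel may differ.  The function names and the
sample lines are hypothetical, made up for illustration.  Note that the
script can only see the comm/pid/prio/state fields that the event chose
to record.

```python
import re
from collections import Counter

# ASSUMPTION: this line layout is modelled on the sched_switch text format
# printed by recent kernels; it is not taken from the thread and may not
# match the kernel under discussion.
SCHED_SWITCH = re.compile(
    r"sched_switch:\s+"
    r"prev_comm=(?P<prev_comm>.+?) prev_pid=(?P<prev_pid>\d+) "
    r"prev_prio=(?P<prev_prio>\d+) prev_state=(?P<prev_state>\S+) ==> "
    r"next_comm=(?P<next_comm>.+?) next_pid=(?P<next_pid>\d+) "
    r"next_prio=(?P<next_prio>\d+)"
)

INT_FIELDS = ("prev_pid", "prev_prio", "next_pid", "next_prio")


def parse_sched_switch(line):
    """Return the recorded fields of one sched_switch line, or None."""
    match = SCHED_SWITCH.search(line)
    if match is None:
        return None  # some other event, or a layout we do not recognise
    event = match.groupdict()
    for key in INT_FIELDS:
        event[key] = int(event[key])
    return event


def count_switches_in(lines):
    """Count how often each (comm, pid) pair was switched in."""
    counts = Counter()
    for line in lines:
        event = parse_sched_switch(line)
        if event is not None:
            counts[(event["next_comm"], event["next_pid"])] += 1
    return counts


# Hypothetical sample lines, invented for this sketch.
SAMPLE = [
    "          <idle>-0     [001]  1234.567890: sched_switch: "
    "prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> "
    "next_comm=bash next_pid=4242 next_prio=120",
    "            bash-4242  [001]  1234.567990: sched_switch: "
    "prev_comm=bash prev_pid=4242 prev_prio=120 prev_state=S ==> "
    "next_comm=swapper next_pid=0 next_prio=120",
    "            bash-4242  [001]  1234.568100: sched_wakeup: "
    "comm=kworker pid=17 prio=120",
]

if __name__ == "__main__":
    for (comm, pid), n in sorted(count_switches_in(SAMPLE).items()):
        print(f"{comm}[{pid}] switched in {n} time(s)")
```

Anything outside those recorded fields (the rq, or the rest of each
task_struct) is simply not available to a post-processing script like
this, which is the gap a live tracepoint handler fills.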

Josh