Message-ID: <4AC3F411.5040506@redhat.com>
Date: Wed, 30 Sep 2009 17:13:05 -0700
From: Josh Stone <jistone@...hat.com>
To: Theodore Tso <tytso@....edu>, LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] ext4: Add a stub for mpage_da_data in the trace header
On 09/30/2009 02:23 PM, Theodore Tso wrote:
> On Wed, Sep 30, 2009 at 12:45:23PM -0700, Josh Stone wrote:
>> If you just want the data in the trace buffer, then SystemTap is not the
>> tool for you. By all means, just write yourself a perl script or
>> something that parses the trace buffer however you like.
>>
>> On the other hand, stap is useful to do some processing/inspection
>> *live*, at the moment the event happens. For that, we register our own
>> tracepoint handler that can do something different than ftrace.
>
> So there are two things I would point out here. First of all, now
> that ftrace has the ability to do basic filtering, just about the only
> thing SystemTap can do which is unique is either complex filtering,
> summary statistics, or some kind of correlation between multiple
> events (within SystemTap's restricted memory allocation limits).
This "only thing" seems like quite a lot to me, but I suppose the
significance could be a matter of opinion. I would also add that
SystemTap can better support concurrent users who want to monitor
different things.
> So I'm not sure it's such a great idea to cede a large bit of
> functionality as something that SystemTap will never
> accomplish --- especially when it's far more convenient and stable
> to depend on fixed trace points than setting arbitrary dynamic trace
> points in the middle of source files which will break all the time
> when distros release new kernels, etc.
I don't understand your point about ceding here. But yes, I agree that
fixed trace points are more convenient and stable, which is why we've
long supported static instrumentation in the kernel.
> Secondly, while I'm not so sure it's that big of a restriction to have
> Systemtap pull events out of the trace buffer, if you must capture the
> event right as it happens, it should be possible to set a kprobe in the
> ftrace subsystem, and then pull out the data of the event from the
> trace buffer.
This is possible, but it's a step backward for a few reasons:
 - A kprobe will be inherently slower than a tracepoint handler.
 - It requires debuginfo (maybe not to place the probe, but surely to dig
   into ftrace's internal data structures).
 - It requires knowledge about the ftrace internals, which is fragile and
   unmaintainable.
 - It assumes that every bit of data that the user wants is captured in
   the trace buffer.
I think that last point is particularly significant. Kernel devs are
not prescient, so the trace event might not be capturing all of the data
that's relevant to a particular troubleshooting effort. With stap you
can gather whatever data you want.
(By the way, I seem to recall that we once discussed adding a proper
hook for stap to grab ftrace data as it comes, but I don't think that
went anywhere.)
> Keep in mind that one of the advantages DTrace has over SystemTap is
> that it can use pre-defined events in the kernel, and not have to
> keep userspace macro files in sync with a changing kernel source
> base. It seems counterproductive to throw away the opportunity of
> being able to read the tracepoint event data, since it would give
> SystemTap a lot more power.
Aren't "pre-defined events" == tracepoints? That's exactly what we're
trying to use in SystemTap! But then, DTrace doesn't dictate what data
is captured at those events, so I don't understand why you think we
should be more restrictive.
>> However, SystemTap does *not* require the kernel debuginfo for using
>> tracepoints, even when reading parameters. It should work in the
>> complete absence of CONFIG_DEBUG_INFO, so if you find otherwise, please
>> let me know and I will fix it.
>
> Well, how is it going to do that if you don't have access to the
> structure definition? This is why fetching the information from the
> ring buffer is much more powerful.
True, when neither a header nor debuginfo for a private type is
available, then it will be opaque to us, so the ring buffer can offer
pre-defined insight into those structures. But in sched_switch, for
example, the ring buffer only knows prev/next->comm/pid/prio/state,
whereas stap has the entire rq and task_structs at your disposal. Each
approach has its own strengths...
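
To make the sched_switch comparison concrete, here is a toy model in Python. It is not kernel or SystemTap code, and every name in it (Task, nvcsw, ring_buffer, live_handler, and so on) is a hypothetical stand-in chosen for illustration. It only sketches the idea that a fixed-format trace record keeps the fields its author picked, while a handler running at the moment of the event can read anything reachable from its arguments.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Toy stand-in for a task structure; the fields are illustrative only."""
    comm: str
    pid: int
    prio: int
    state: int
    nvcsw: int = 0  # extra detail that the fixed trace record does not keep


ring_buffer = []   # toy trace buffer holding fixed-format records
live_results = []  # whatever the live handler decided to collect


def trace_event_sched_switch(prev, nxt):
    """Fixed-format record: only the fields chosen when the event was defined."""
    record = {}
    for label, task in (("prev", prev), ("next", nxt)):
        for name in ("comm", "pid", "prio", "state"):
            record[f"{label}_{name}"] = getattr(task, name)
    ring_buffer.append(record)


def live_handler(prev, nxt):
    """Live handler: sees the whole objects while the event is happening."""
    live_results.append((prev.comm, prev.nvcsw))


# Both consumers are attached to the same (toy) tracepoint.
handlers = [trace_event_sched_switch, live_handler]


def sched_switch(prev, nxt):
    """Toy tracepoint: call every registered handler with the live arguments."""
    for handler in handlers:
        handler(prev, nxt)


if __name__ == "__main__":
    sched_switch(Task("bash", 100, 120, 1, nvcsw=7), Task("stap", 200, 120, 0))
    print(ring_buffer[0])
    print(live_results[0])
```

In this sketch the nvcsw value never reaches the ring buffer, so a consumer that reads only the buffer cannot recover it afterwards, whereas the live handler can pick it up directly.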
Josh