[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20101209145541.GC1712@nowhere>
Date: Thu, 9 Dec 2010 15:55:44 +0100
From: Frederic Weisbecker <fweisbec@...il.com>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: David Sharp <dhsharp@...gle.com>, linux-kernel@...r.kernel.org,
mrubin@...gle.com
Subject: Re: [PATCH 13/15] small_traces: Add config option to shrink trace
events.
On Fri, Dec 03, 2010 at 09:54:12PM -0500, Steven Rostedt wrote:
> On Fri, 2010-12-03 at 18:33 -0800, David Sharp wrote:
> > I considered that, and I generally thing it's a good idea. However, I
> > also want to use this switch to shrink individual tracepoint event
> > structures.
> >
> > eg: sched switch is a high frequency event and it is 68 bytes (60
> > after these patches)
> >
> > Can you suggest a syntax for TRACE_EVENT, DECLARE_EVENT_CLASS, etc,
> > that could express the two versions and produce the right code?
> >
> > I'm worried about adding even further complexity to the TRACE_EVENT
> > macros. I could add TRACE_EVENT_SMALL that takes two versions of
> > TP_STRUCT__entry, TP_fast_assign, and TP_printk each, but then this
> > will need to be permuted with your TP_CONDITIONAL patches as well.
>
> I would not touch the TRACE_EVENT() structures. They are there as is and
> I would not think about changing them. Something like that would never
> make it into mainline.
>
> Now what you can do, is to make your own events based off of the same
> tracepoints. For example, the TRACE_EVENT(sched_switch...) has in
> sched.c:
>
> trace_sched_switch(prev, next);
>
>
> You could even write a module that does something like this:
>
> register_trace_sched_switch(probe_sched_switch, mydata);
>
>
>
> void probe_sched_switch(void *mydata,
> struct task_struct *prev,
> struct task_struct *next)
> {
> struct ring_buffer *buffer;
> struct ring_buffer_event *event;
> struct myentry *entry;
>
> event = trace_current_buffer_lock_reserve(buffer,
> mytype, sizeof(*entry),
> 0, 0);
>
> if (!event)
> return;
>
> entry = ring_buffer_event_data(event);
>
> entry->myfield = prev->x;
> ...
>
> trace_nowake_buffer_unlock_commit(buffer, event,
> 0, 0);
> }
>
> You will need to do a register_ftrace_event() to register that 'mytype'
> and how to output it. Otherwise it would just be ignored in the "trace"
> file.
>
> All of the above would work fine as a loadable module that you could
> easily maintain out of tree, and still uses the internals of the system.
>
> -- Steve
>
>
But this would improve only google's tracing while this is a general
mainline tracing problem.
The first thing is that we need to get rid of the lock_depth field, the bkl
is dying.
For the rest what about having a bitmap of the fields we want to ignore,
which can be setup from a trace file for ftrace and as an ioctl for perf.
So this bitmap is easy to implement on the common fields.
For the rest, one could choose between using TP_fast_assign()
and TP_cond_assign().
TP_fast_assign() stays as is and doesn't implement bitmap field
ignoring. Those who want conditional record will need
TP_cond_assign().
Well, unfortunately this probably requires us to play
the same trickery than SYSCALL_DEFINE() in that we'll probably
need TP_cond_assign1(), TP_cond_assign2(), TP_cond_assign3(), etc...
#define TP_cond_assignx(nr, assign) \
if (call->bitmask & nr) { \
assign
}
#define TP_cond_assign2(nr, assign, ...) \
TP_cond_assignx(nr, assign) \
TP_cond_assign1(nr + 1, __VA_ARGS__)
#define TP_cond_assign3(nr, assign, ...) \
TP_cond_assignx(nr, assign) \
TP_cond_assign2(nr + 1, __VA_ARGS__)
That will also require a bit more trickery to dynamically
pre-compute the size of the trace entry.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists