[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c62985530903020422m6d3a6da6o5f7b35338142e349@mail.gmail.com>
Date: Mon, 2 Mar 2009 13:22:54 +0100
From: Frédéric Weisbecker <fweisbec@...il.com>
To: Tom Zanussi <tzanussi@...il.com>
Cc: Steven Rostedt <rostedt@...dmis.org>, linux-kernel@...r.kernel.org,
Ingo Molnar <mingo@...e.hu>,
Andrew Morton <akpm@...ux-foundation.org>,
Peter Zijlstra <peterz@...radead.org>,
Mathieu Desnoyers <compudj@...stal.dyndns.org>,
Masami Hiramatsu <mhiramat@...hat.com>,
KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
Jason Baron <jbaron@...hat.com>,
"Frank Ch. Eigler" <fche@...hat.com>, acme@...stprotocols.net,
Steven Rostedt <srostedt@...hat.com>
Subject: Re: [PATCH 07/10] tracing: add raw trace point recording
infrastructure
2009/3/2 Tom Zanussi <tzanussi@...il.com>:
> Hi,
>
> On Sat, 2009-02-28 at 04:06 -0500, Steven Rostedt wrote:
>> plain text document attachment
>> (0007-tracing-add-raw-trace-point-recording-infrastructur.patch)
>> From: Steven Rostedt <srostedt@...hat.com>
>>
>> Impact: lower overhead tracing
>>
>> The current event tracer can automatically pick up trace points
>> that are registered with the TRACE_FORMAT macro. But it required
>> a printf format string and parsing. Although, this adds the ability
>> to get guaranteed information like task names and such, it took
>> a hit in overhead processing. This processing can add about 500-1000
>> nanoseconds overhead, but in some cases that too is considered
>> too much and we want to shave off as much from this overhead as
>> possible.
>>
>> Tom Zanussi recently posted tracing patches to lkml that are based
>> on a nice idea about capturing the data via C structs using
>> STRUCT_ENTER, STRUCT_EXIT type of macros.
>>
>> I liked that method very much, but did not like the implementation
>> that required a developer to add data/code in several disjoint
>> locations.
>>
>> This patch extends the event_tracer macros to do a similar "raw C"
>> approach that Tom Zanussi did. But instead of having the developers
>> needing to tweak a bunch of code all over the place, they can do it
>> all in one macro - preferably placed near the code that it is
>> tracing. That makes it much more likely that tracepoints will be
>> maintained on an ongoing basis by the code they modify.
>>
>> The new macro TRACE_EVENT_FORMAT is created for this approach. (Note,
>> a developer may still utilize the more low level DECLARE_TRACE macros
>> if they don't care about getting their traces automatically in the event
>> tracer.)
>>
>> They can also use the existing TRACE_FORMAT if they don't need to code
>> the tracepoint in C, but just want to use the convenience of printf.
>>
>> So if the developer wants to "hardwire" a tracepoint in the fastest
>> possible way, and wants to acquire their data via a user space utility
>> in a raw binary format, or wants to see it in the trace output but not
>> sacrifice any performance, then they can implement the faster but
>> more complex TRACE_EVENT_FORMAT macro.
>>
>> Here's what usage looks like:
>>
>> TRACE_EVENT_FORMAT(name,
>> TPPROTO(proto),
>> TPARGS(args),
>> TPFMT(fmt, fmt_args),
>> TRACE_STUCT(
>> TRACE_FIELD(type1, item1, assign1)
>> TRACE_FIELD(type2, item2, assign2)
>> [...]
>> ),
>> TPRAWFMT(raw_fmt)
>> );
>>
>> Note name, proto, args, and fmt, are all identical to what TRACE_FORMAT
>> uses.
>>
>> name: is the unique identifier of the trace point
>> proto: The proto type that the trace point uses
>> args: the args in the proto type
>> fmt: printf format to use with the event printf tracer
>> fmt_args: the printf argments to match fmt
>>
>> TRACE_STRUCT starts the ability to create a structure.
>> Each item in the structure is defined with a TRACE_FIELD
>>
>> TRACE_FIELD(type, item, assign)
>>
>> type: the C type of item.
>> item: the name of the item in the stucture
>> assign: what to assign the item in the trace point callback
>>
>> raw_fmt is a way to pretty print the struct. It must match
>> the order of the items are added in TRACE_STUCT
>>
>> An example of this would be:
>>
>> TRACE_EVENT_FORMAT(sched_wakeup,
>> TPPROTO(struct rq *rq, struct task_struct *p, int success),
>> TPARGS(rq, p, success),
>> TPFMT("task %s:%d %s",
>> p->comm, p->pid, success?"succeeded":"failed"),
>> TRACE_STRUCT(
>> TRACE_FIELD(pid_t, pid, p->pid)
>> TRACE_FIELD(int, success, success)
>> ),
>> TPRAWFMT("task %d success=%d")
>> );
>>
>> This creates us a unique struct of:
>>
>> struct {
>> pid_t pid;
>> int success;
>> };
>>
>> And the way the call back would assign these values would be:
>>
>> entry->pid = p->pid;
>> entry->success = success;
>>
>> The nice part about this is that the creation of the assignent is done
>> via macro magic in the event tracer. Once the TRACE_EVENT_FORMAT is
>> created, the developer will then have a faster method to record
>> into the ring buffer. They do not need to worry about the tracer itself.
>>
>
> Nice improvements - I definitely was unhappy about having things spread
> around in different files unnecessarily. And I like the fact that your
> macros generate assignments too but am curious about what to do if you
> need to do something more complicated than an assignment e.g. in the
> block tracepoints I had to assign fields differently based on the value
> of blk_pc_request():
>
> if (blk_pc_request(rq)) {
> zed_event->sector = 0;
> zed_event->bytes = rq->data_len;
> zed_event->pdu_len = pdu_len;
> memcpy(zed_event->pdu, rq->cmd, pdu_len);
> } else {
> zed_event->sector = rq->hard_sector;
> zed_event->bytes = rq->hard_nr_sectors << 9;
> zed_event->pdu_len = 0;
> }
>
> Is there a way to define some fields but without the assignments, and do
> them manually somewhere else? I guess it would be nice to be able to
> define all events using TRACE_EVENT_FORMAT but have a way to special
> case certain events/fields.
Note that on such case you can do a conditional assignment:
TRACE_FIELD(int, bytes, blk_pc_request(rq) ? rq->data_len :
rq->hard_nr_sectors << 9);
The drawback here is that you'll have to repeat this conditional for
each field, which is annoying
and a loss of performance.
Perhaps it would be interesting to allow something more low level if
the user wishes.
> Anyway, sorry if it's already handled in the code - haven't had a chance
> to really peruse it.
>
> Tom
>
>
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists