lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Tue, 10 Mar 2009 09:45:56 -0400 (EDT)
From:	Steven Rostedt <rostedt@...dmis.org>
To:	Ingo Molnar <mingo@...e.hu>
cc:	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	linux-kernel@...r.kernel.org,
	Andrew Morton <akpm@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <peterz@...radead.org>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Theodore Tso <tytso@....edu>,
	Arnaldo Carvalho de Melo <acme@...hat.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	Mathieu Desnoyers <compudj@...stal.dyndns.org>,
	Lai Jiangshan <laijs@...fujitsu.com>,
	"Martin J. Bligh" <mbligh@...igh.org>,
	"Frank Ch. Eigler" <fche@...hat.com>,
	Larry Woodman <lwoodman@...hat.com>,
	Jason Baron <jbaron@...hat.com>,
	Tom Zanussi <tzanussi@...il.com>,
	Masami Hiramatsu <mhiramat@...hat.com>,
	Christoph Hellwig <hch@...radead.org>,
	Jiaying Zhang <jiayingz@...gle.com>,
	Steven Rostedt <srostedt@...hat.com>
Subject: Re: [PATCH 4/7] tracing: new format for specialized trace points


On Tue, 10 Mar 2009, Ingo Molnar wrote:

> 
> * KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com> wrote:
> 
> > Hi Steven,
> > 
> > >  TRACE_EVENT(sched_switch,
> > > 
> > > 	TP_PROTO(struct rq *rq, struct task_struct *prev,
> > > 		 struct task_struct *next),
> > > 
> > > 	TP_ARGS(rq, prev, next),
> > > 
> > > 	TP_STRUCT__entry(
> > > 		__array(	char,	prev_comm,	TASK_COMM_LEN	)
> > > 		__field(	pid_t,	prev_pid			)
> > > 		__field(	int,	prev_prio			)
> > > 		__array(	char,	next_comm,	TASK_COMM_LEN	)
> > > 		__field(	pid_t,	next_pid			)
> > > 		__field(	int,	next_prio			)
> > > 	),
> > > 
> > > 	TP_printk("task %s:%d [%d] ==> %s:%d [%d]",
> > > 		__entry->prev_comm, __entry->prev_pid, __entry->prev_prio,
> > > 		__entry->next_comm, __entry->next_pid, __entry->next_prio),
> > > 
> > > 	TP_fast_assign(
> > > 		memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN);
> > > 		__entry->prev_pid	= prev->pid;
> > > 		__entry->prev_prio	= prev->prio;
> > > 		memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN);
> > > 		__entry->next_pid	= next->pid;
> > > 		__entry->next_prio	= next->prio;
> > > 	)
> > >  );
> > 
> > Could you please write the documentation how to make new 
> > tracepont and use various TP_ macro? Some developer (include 
> > me) plan to add new tracepoint. but these macro usage is a bit 
> > difficult.
> 
> Sure, here's a commented/annotated variant:
> 
>   /*
>    * We define a tracepoint, its arguments, its printk format
>    * and its 'fast binay record' layout.
>    *
>    * Firstly, name your tracepoint via TRACE_EVENT(name : the 
>    * 'subsystem_event' notation is fine.
>    *
>    * Think about this whole construct as the 
>    * 'trace_sched_switch() function' from now on.
>    */
>  
>   TRACE_EVENT(sched_switch,
> 
>         /*
>          * A function has a regular function arguments 
>          * prototype, declare it via TP_PROTO():
>          */
>  	TP_PROTO(struct rq *rq, struct task_struct *prev,
>  		 struct task_struct *next),
>  
>         /*
>          * Define the call signature of the 'function'. 
> 	 * (Design sidenote: we use this instead of a
> 	 *  TP_PROTO1/TP_PROTO2/TP_PROTO3 ugliness.)
>          */
>  	TP_ARGS(rq, prev, next),
>  
>         /*
>          * Fast binary tracing: define the trace record via 
>          * TP_STRUCT__entry(). You can think about it like a 
>          * regular C structure local variable definition.
>          *
>          * This is how the trace record is structured and will
>          * be saved into the ring buffer. These are the fields
>          * that will be exposed to user-space in 
>          * /debug/tracing/events/*/format.
>          *
>          * The declared 'local variable' is called '__entry'
>          */
>  	TP_STRUCT__entry(
>  		__array(	char,	prev_comm,	TASK_COMM_LEN	)
>  		__field(	pid_t,	prev_pid			)
>  		__field(	int,	prev_prio			)
>  		__array(	char,	next_comm,	TASK_COMM_LEN	)
>  		__field(	pid_t,	next_pid			)
>  		__field(	int,	next_prio			)
>  	),
> 
>         /*
>          * Assign the entry into the trace record, by embedding 
>          * a full C statement block into TP_fast_assign(). You 
>          * can refer to the trace record as '__entry' - 
>          * otherwise you can put arbitrary C code in here.
>          *
>          * Note: this C code will execute every time a trace event 
>          * happens, on an active tracepoint.
>          */
>  	TP_fast_assign(
>  		memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN);
>  		__entry->prev_pid	= prev->pid;
>  		__entry->prev_prio	= prev->prio;
>  		memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN);
>  		__entry->next_pid	= next->pid;
>  		__entry->next_prio	= next->prio;
>  	)
> 
>         /*
>          * Formatted output of a trace record via TP_printk(). 
>          * This is how the tracepoint will appear under ftrace 
>          * plugins that make use of this tracepoint.
>          *
>          * (raw-binary tracing wont actually perform this step.)
>          */
>  	TP_printk("task %s:%d [%d] ==> %s:%d [%d]",
>  		__entry->prev_comm, __entry->prev_pid, __entry->prev_prio,
>  		__entry->next_comm, __entry->next_pid, __entry->next_prio),
> 
>   );
> 
> This macro construct is thus used for the regular printk format 
> tracing setup, it is used to construct a function pointer based 
> tracepoint callback (this is used by programmatic plugins and 
> can also by used by generic instrumentation like SystemTap), and 
> it is also used to expose a structured trace record in 
> /debug/tracing/events/.

Thanks, I'll add the above to include/linux/tracepoint.h as a documented 
example.

> 
> Steve: i flipped around TP_fast_assign() and TP_printk() in the 
> example above because it makes more sense that way. (we first 
> assign the record, then we print it out) I suspect we should 
> flip it around in the code too.

OK, I'll make that change too.

-- Steve

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ