lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Thu, 9 Dec 2010 11:16:39 -0500
From:	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:	Frederic Weisbecker <fweisbec@...il.com>
Cc:	Steven Rostedt <rostedt@...dmis.org>,
	David Sharp <dhsharp@...gle.com>, linux-kernel@...r.kernel.org,
	mrubin@...gle.com
Subject: Re: [PATCH 13/15] small_traces: Add config option to shrink trace
	events.

* Frederic Weisbecker (fweisbec@...il.com) wrote:
> On Thu, Dec 09, 2010 at 10:08:17AM -0500, Steven Rostedt wrote:
> > On Thu, 2010-12-09 at 15:55 +0100, Frederic Weisbecker wrote:
> > > On Fri, Dec 03, 2010 at 09:54:12PM -0500, Steven Rostedt wrote:
> > > > On Fri, 2010-12-03 at 18:33 -0800, David Sharp wrote:
> > > > > I considered that, and I generally thing it's a good idea. However, I
> > > > > also want to use this switch to shrink individual tracepoint event
> > > > > structures.
> > > > > 
> > > > > eg: sched switch is a high frequency event and it is 68 bytes (60
> > > > > after these patches)
> > > > > 
> > > > > Can you suggest a syntax for TRACE_EVENT, DECLARE_EVENT_CLASS, etc,
> > > > > that could express the two versions and produce the right code?
> > > > > 
> > > > > I'm worried about adding even further complexity to the TRACE_EVENT
> > > > > macros. I could add TRACE_EVENT_SMALL that takes two versions of
> > > > > TP_STRUCT__entry, TP_fast_assign, and TP_printk each, but then this
> > > > > will need to be permuted with your TP_CONDITIONAL patches as well.
> > > > 
> > > > I would not touch the TRACE_EVENT() structures. They are there as is and
> > > > I would not think about changing them. Something like that would never
> > > > make it into mainline.
> > > > 
> > > > Now what you can do, is to make your own events based off of the same
> > > > tracepoints. For example, the TRACE_EVENT(sched_switch...) has in
> > > > sched.c:
> > > > 
> > > > 	trace_sched_switch(prev, next);
> > > > 
> > > > 
> > > > You could even write a module that does something like this:
> > > > 
> > > > 	register_trace_sched_switch(probe_sched_switch, mydata);
> > > > 
> > > > 
> > > > 
> > > > void probe_sched_switch(void *mydata,
> > > > 		struct task_struct *prev,
> > > > 		struct task_struct *next)
> > > > {
> > > > 	struct ring_buffer *buffer;
> > > > 	struct ring_buffer_event *event;
> > > > 	struct myentry *entry;
> > > > 
> > > > 	event = trace_current_buffer_lock_reserve(buffer,
> > > > 			mytype, sizeof(*entry),
> > > > 			0, 0);
> > > > 
> > > > 	if (!event)
> > > > 		return;
> > > > 
> > > > 	entry = ring_buffer_event_data(event);
> > > > 
> > > > 	entry->myfield = prev->x;
> > > > 	...
> > > > 
> > > > 	trace_nowake_buffer_unlock_commit(buffer, event,
> > > > 					0, 0);
> > > > }
> > > > 
> > > > You will need to do a register_ftrace_event() to register that 'mytype'
> > > > and how to output it. Otherwise it would just be ignored in the "trace"
> > > > file.
> > > > 
> > > > All of the above would work fine as a loadable module that you could
> > > > easily maintain out of tree, and still uses the internals of the system.
> > > > 
> > > > -- Steve
> > > > 
> > > > 
> > > 
> > > 
> > > But this would improve only google's tracing while this is a general
> > > mainline tracing problem.
> > > 
> > > The first thing is that we need to get rid of the lock_depth field, the bkl
> > > is dying.
> > 
> > Yeah that needs to go :-)
> > 
> > > 
> > > For the rest what about having a bitmap of the fields we want to ignore,
> > > which can be setup from a trace file for ftrace and as an ioctl for perf.
> > > 
> > > So this bitmap is easy to implement on the common fields.
> > > 
> > > For the rest, one could choose between using TP_fast_assign()
> > > and TP_cond_assign().
> > > 
> > > TP_fast_assign() stays as is and doesn't implement bitmap field
> > > ignoring. Those who want conditional record will need
> > > TP_cond_assign().
> > > Well, unfortunately this probably requires us to play
> > > the same trickery than SYSCALL_DEFINE() in that we'll probably
> > > need TP_cond_assign1(), TP_cond_assign2(), TP_cond_assign3(), etc...
> > > 
> > > #define TP_cond_assignx(nr, assign)	\
> > > 	if (call->bitmask & nr)	{	\
> > > 		assign
> > > 	}
> > > 
> > > #define TP_cond_assign2(nr, assign, ...)	\
> > > 	TP_cond_assignx(nr, assign)		\
> > > 	TP_cond_assign1(nr + 1, __VA_ARGS__)
> > > 
> > > #define TP_cond_assign3(nr, assign, ...)        \
> > >         TP_cond_assignx(nr, assign)             \
> > >         TP_cond_assign2(nr + 1, __VA_ARGS__)
> > > 
> > > That will also require a bit more trickery to dynamically
> > > pre-compute the size of the trace entry.
> > 
> > Mathieu is working on encapsulating the assignments in their own macros.
> > 
> > Instead of doing:
> > 
> > 	__entry->foo = bar;
> > 
> > We will have:
> > 
> > 	tp_assign(foo, bar);
> > 
> > This way we could probably use this to dynamically figure out what to
> > assign.
> > 
> > -- Steve
> > 
> > 
> 
> Yep, it should also work that way.

In an incredible exercise of good timing, I'm planning to post them just now :)

Comments will be very welcome,

Thanks!

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ