linux-kernel - Re: Kernel marker has no performance impact on ia64.

Message-Id: <1213280823.31518.114.camel@twins>
Date:	Thu, 12 Jun 2008 16:27:03 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
Cc:	Hideo AOKI <haoki@...hat.com>, mingo@...e.hu,
	Masami Hiramatsu <mhiramat@...hat.com>,
	linux-kernel@...r.kernel.org, Steven Rostedt <rostedt@...dmis.org>,
	"Frank Ch. Eigler" <fche@...hat.com>
Subject: Re: Kernel marker has no performance impact on ia64.

On Thu, 2008-06-12 at 09:53 -0400, Mathieu Desnoyers wrote:
> Hi Peter,
> 
> * Peter Zijlstra (peterz@...radead.org) wrote:
> > On Wed, 2008-06-04 at 19:22 -0400, Mathieu Desnoyers wrote:
> > > * Peter Zijlstra (peterz@...radead.org) wrote:
> > 
> > > > So are you proposing something like:
> > > > 
> > > > static inline void 
> > > > trace_sched_switch(struct task_struct *prev, struct task_struct *next)
> > > > {
> > > > 	trace_mark(sched_switch, prev, next);
> > > > }
> > > > 
> > > 
> > > Not exactly. Something more along the lines of
> > > 
> > > static inline void 
> > > trace_sched_switch(struct task_struct *prev, struct task_struct *next)
> > > {
> > > 	/* Internal tracers. */
> > > 	ftrace_sched_switch(prev, next);
> > > 	othertracer_sched_switch(prev, next);
> > > 	/*
> > > 	 * System-wide tracing. Useful information is exported here.
> > > 	 * Probes connecting to these markers are expected to only use the
> > > 	 * information provided to them for data collection purpose. Type
> > > 	 * casting pointers is discouraged.
> > > 	 */
> > > 	trace_mark(kernel_sched_switch, "prev_pid %d next_pid %d prev_state %ld",
> > > 		prev->pid, next->pid, prev->state);
> > > }
> > 
> > Advantage of my method would be that ftrace (and othertracer) can use
> > the same marker and don't need yet another hook.
> > 
> 
> Am I correct in saying that the method you propose completely removes
> type checking between the instrumentation site and what the probes
> expect ? If yes, this seems to be too fragile. Every time a marker would
> change, one would have to audit _every_ probe, both in-kernel and in
> modules. Adding type checking to the marker infrastructure makes
> automatic detection of these changes possible.

That audit would be as simple as:

 git grep sched_switch

every time someone changes trace_sched_switch() arguments. Doesn't seem
too hard, you could even make checkpatch remind you to do that if it
sees a change to a trace_* function.

The down-side of runtime type checking (of which Masami's proposal is
the best so far), is that you'll still not find the breakage until
someone actually tries to use a tracer - so you'll still need the above.

> > > > dropping the silly fmt string but using the multiplex of trace_mark, and
> > > > then doing the stringify bit:
> > > > 
> > > >        "prev_pid %d next_pid %d prev_state %ld\n"
> > > > 
> > > > in the actual tracer?
> > > > 
> > > 
> > > It would make much more sense to put this formatting information along
> > > with the trace point (e.g. in a kernel/sched-trace.h header) rather
> > > than to hide it in a tracer (loadable module) because this information
> > > is an interface to the trace point.
> > 
> > I'm not sure - it seems to me it should be part of the tracer because
> > it's a detail/subset of the actual data - rendering it useless for others
> > who'd like a different set.
> > 
> 
> If it ends up elsewhere, then we have to ensure type correctness in some
> way.

Sure, ideally we'd want compile time type safety. We'd want
trace_sched_switch()'s arguments to match trace_sched_switch_handler()'s
arguments, and a compile time error if this is not the case.

However - we cannot seem to get that. Runtime type safety just doesn't
help this case.

But the point I was making here is that:

  trace_sched_switch(prev->pid, next->pid, prev->state)

could be useless for some other tracer who'd want:

  trace_sched_switch(prev->vruntime, next->vruntime)

Also, the ->pid stuff isn't even live on the normal code path, so
adding that to the marker also bloats the code generated there.

So by using the marker:

  trace_sched_switch(prev, next)

We can have various tracers that display different information and avoid
liveness issues.
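
As a rough illustration of the point above, here is a minimal userspace
sketch, not kernel code. It assumes a hypothetical struct task standing in
for task_struct, and a hypothetical handler table and
register_sched_switch_handler() helper that do not appear in this thread.
The hook passes only the raw prev/next pointers, and each tracer pulls out
the fields it cares about:

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct task_struct. */
struct task {
	int pid;
	long state;
	unsigned long long vruntime;
};

/* Every handler has this one signature, so the compiler can check it. */
typedef void (*sched_switch_handler)(const struct task *prev,
				     const struct task *next);

#define MAX_HANDLERS 4
static sched_switch_handler handlers[MAX_HANDLERS];
static size_t nr_handlers;

/* Hypothetical registration helper: 0 on success, -1 if the table is full. */
static int register_sched_switch_handler(sched_switch_handler fn)
{
	if (nr_handlers == MAX_HANDLERS)
		return -1;
	handlers[nr_handlers++] = fn;
	return 0;
}

/* The instrumentation site only hands over the two raw pointers. */
static void trace_sched_switch(const struct task *prev,
			       const struct task *next)
{
	size_t i;

	for (i = 0; i < nr_handlers; i++)
		handlers[i](prev, next);
}

/* Tracer A: interested in pids and the previous task's state. */
static int seen_prev_pid, seen_next_pid;
static long seen_prev_state;

static void pid_tracer(const struct task *prev, const struct task *next)
{
	seen_prev_pid = prev->pid;
	seen_next_pid = next->pid;
	seen_prev_state = prev->state;
}

/* Tracer B: interested in vruntime only. */
static unsigned long long seen_prev_vruntime, seen_next_vruntime;

static void vruntime_tracer(const struct task *prev, const struct task *next)
{
	seen_prev_vruntime = prev->vruntime;
	seen_next_vruntime = next->vruntime;
}
```

With this shape, field accesses such as ->pid happen inside the tracer
rather than at the call site, and a handler with a different signature
passed to the registration helper should be diagnosed by the compiler
(as a warning or an error, depending on compiler and flags).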

> > > > IMHO the 'type safety' of the fmt string is over-rated, since it cannot
> > > > distinguish between a task_struct * or a bio *, both are pointers -
> > > > and half-arsed type safety is worse than no type safety.
> > > > 
> > > 
> > > I totally agree with you that not having the capacity to inspect pointer
> > > types is a problem for tracers which want to receive the "raw" pointer
> > > and deal with the data they need like big boys. On the other hand, it
> > > requires them to be closely tied to the kernel internals and therefore
> > > it makes sense to call them directly from the tracing site, thus
> > > bypassing the marker format string.
> > > 
> > > However, letting the marker specify the data format so a tracer could
> > > format it into a memory buffer (in a binary or text format, depending on
> > > the implementation) or so that a tool like systemtap can use this
> > > identified information without having to be closely tied to the kernel
> > > makes sense to me.
> > 
> > So s-tap is meant to parse this string and interpret the varargs without
> > being closely tied to the kernel? - Somehow that doesn't make me feel
> > warm and fuzzy. That not only ties userspace to the information present
> > in the marker, but to the actual string as well.
> > 
> > The stronger you make this bind the less I like it.
> > 
> 
> Well, the string contains each field name and type. Therefore, SystemTap
> can hook on a marker and parse the string looking for some elements by
> passing a NULL format string upon probe registration. Alternatively, it
> can provide the exact format string expected when it registers its probe
> to the marker and a check will be done to verify that the format string
> passed along with the registered probe matches the marker format string.
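
The registration-time check described in the quoted paragraph could look
roughly like the userspace sketch below. It is not the kernel marker API:
struct marker, probe_fn, noop_probe() and sketch_register_probe() are
hypothetical names chosen for illustration. A NULL format means the probe
will parse the marker's string itself, and any other format has to match
the marker's string exactly:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical probe type: the probe is handed the marker's format string. */
typedef void (*probe_fn)(const char *format);

/* Hypothetical marker descriptor, reduced to what the check needs. */
struct marker {
	const char *name;
	const char *format;	/* e.g. "prev_pid %d next_pid %d prev_state %ld" */
	probe_fn probe;		/* NULL while no probe is connected */
};

/* Trivial probe used only to have something to register. */
static void noop_probe(const char *format)
{
	(void)format;
}

/*
 * Connect a probe to a marker.
 *
 * expected_format == NULL: accept any marker format; the probe is
 * expected to parse the string on its own.
 * Otherwise the expected format must be identical to the marker's.
 *
 * Returns 0 on success, -1 on a format mismatch (probe left unchanged).
 */
static int sketch_register_probe(struct marker *m,
				 const char *expected_format,
				 probe_fn probe)
{
	if (expected_format != NULL &&
	    strcmp(expected_format, m->format) != 0)
		return -1;

	m->probe = probe;
	return 0;
}
```

Note that in this sketch a mismatch only shows up when the registration
call actually runs, not when the code is compiled.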

Yes, I get that, it's one of the ugliest things I've met in this whole
marker story. Why can't stap insert a normal trace handler that
extracts the information from prev/next it wants?

> Also, about what you said earlier in this thread :
> "Regular trace points can be custom made; this has the advantages that
> it raises the implementation barrier and hopefully that encourages some
> thought in the process. It also avoids the code from growing into
> something that looks like someone had a long night of debugging."
> 
> Before it was moved to the markers, LTTng was once designed with
> custom-made code to save the trace information through custom hooks. To
> help maintainers instrument their own subsystem and make the right choice
> without being a tracing expert, we created a code generator which
> generated this custom code for each trace point given a description of
> the trace points. It turned out that keeping this duplicate list of
> trace points was cumbersome and that the generated code did eat a lot of
> instruction cache.

Well, your last proposal of static inline functions basically returns
thereto. So what was cumbersome about it?

The I$ issue is unfortunate indeed - but it seems to be the price to pay
for compile time type safety.

As for that code-generator, that seems a sane idea, esp if the input
file is simply a regular C header file with trace point definitions.

> This is why we turned to markers, so we could re-use
> a common infrastructure to serialize the data into trace buffers. We
> turned to the marker format string to allow the types to serialize to be
> parsed efficiently by the tracer. I strongly recommend not to declare
> the types associated with a kernel trace point in two unrelated
> locations without type checking in-between them (e.g. trace_mark in
> kernel code, string in the tracer module), because it would then become
> harder to track consistency when the code changes.

I see the value of trace_mark() in debugging sessions, but merging these
things is like merging the resulting code file after a printk debugging
session.

> However, I would not be against a hybrid of Masami's proposal and
> current markers, which I will propose in reply to his email.

Ah - I'm looking forward to it.
