linux-kernel - Re: [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20090313161838.GA2925@Krystal>
Date:	Fri, 13 Mar 2009 12:18:38 -0400
From:	Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
To:	Ingo Molnar <mingo@...e.hu>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	linux-kernel@...r.kernel.org,
	Andrew Morton <akpm@...ux-foundation.org>,
	Steven Rostedt <rostedt@...dmis.org>,
	ltt-dev@...ts.casi.polymtl.ca,
	Peter Zijlstra <peterz@...radead.org>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Arjan van de Ven <arjan@...radead.org>,
	Pekka Paalanen <pq@....fi>,
	Arnaldo Carvalho de Melo <acme@...hat.com>,
	"H. Peter Anvin" <hpa@...or.com>, Martin Bligh <mbligh@...gle.com>,
	"Frank Ch. Eigler" <fche@...hat.com>,
	Tom Zanussi <tzanussi@...il.com>,
	Masami Hiramatsu <mhiramat@...hat.com>,
	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Jason Baron <jbaron@...hat.com>,
	Christoph Hellwig <hch@...radead.org>,
	Jiaying Zhang <jiayingz@...gle.com>,
	Eduard - Gabriel Munteanu <eduard.munteanu@...ux360.ro>,
	mrubin@...gle.com, md@...gle.com
Subject: Re: [RFC patch 00/41] LTTng 0.105 core for Linux 2.6.27-rc9

* Ingo Molnar (mingo@...e.hu) wrote:
> 
> * Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca> wrote:
> 
> > > Let me give you a few examples of existing areas of overlap:
> > > 
> > > > The corresponding git tree contains also the trace clock 
> > > > patches and the lttng instrumentation. The trace clock is 
> > > > required to use the tracer, but it can be used without the 
> > > > instrumentation : there is already a kprobes and userspace 
> > > > event support included in this patchset.
> > > 
> > > The latest tracing tree includes 
> > > kernel/tracing/trace_clock.c which offers three trace clock 
> > > variants, with different performance/precision tradeoffs:
> > > 
> > >  trace_clock_local()   [ for pure CPU-local tracers with no idle 
> > >                          events. This is the fastest but least 
> > >                          coherent tracing clock. ]
> > > 
> > >  trace_clock()         [ intermediate, scalable clock with
> > >                          usable but imprecise global coherency. ]
> > > 
> > >  trace_clock_global()  [ globally serialized, coherent clock. 
> > >                          It is the slowest but most accurate variant. ]
> > > 
> > > Tracing plugins can pick their choice. (This is relatively new 
> > > code but you get the idea.)
> > > 
> > 
> > Hehe this reminds me of the trace clock thread I started a few 
> > months ago on LKML. So you guys took over that work ? Nice :) 
> > Is it based on the trace-clock patches I proposed back then ? 
> > Ah, no. Well I guess we'll have to discuss this too. I agree 
> > on the trace_clock_local/trace_clock/trace_clock_global 
> > interface, it looks nice. The underlying implementation will 
> > have to be discussed though.
> 
> Beware: i found the assembly trace_clock() stuff you did back 
> then rather ugly ;-) I dont think there's any easy solutions 
> here, so i went for this palette of clocks.
> 

Hi Ingo,

I agree for the palette of clocks to fit all needs. I wonder what
exactly you found ugly in the approach I took with my trace_clock()
implementation ? Maybe you could refresh my memory, I do not recall
writing any part of it in assembly.. ? But this is a whole different
topic. We can discuss this later.


> > > This approach works for all your other patches as well. A 
> > > direct, constructive comparison and active work on unifying 
> > > them is required.
> > 
> > Yes, let's try to do it. Maybe it's better to start a new 
> > thread with less CCs for this type of work ?
> 
> Yeah. More finegrained steps are really needed.
> 
> The least controversial bits would be the many tracepoints you 
> identified in LTTng as interesting. Mind sending them separately 
> so that we can make some progress?
> 

OK, I'll work on it. Note however that I flipped my patchset around in
the past months : thinking that the tracer acceptance would be easier
than tracepoints. And now we are back at square one. Is it just me or I
have the funny feeling of acting like a dog running in circles after his
tail ? :)


> In the latest tracing code all tracepoints will show up 
> automatically under /debug/tracing/events/ and can be used by 
> user-space tools.
> 

Hrm, the thing is : I strongly disagree with showing tracepoints to
userspace and with the fact that you embed the data serialization
"pseudo-callbacks" into the tracepoint headers. Here is why. Peter
Zijlstra convinced me that putting format strings directly in tracepoint
headers was a bad idea. First off, you end up requiring all tracers
which connect on the tracepoints to choose your event format description
if they ever want to benefit from it. It's a "all included" formula :
either the tracers use them or they cannot output "standard" trace
information.

Second point : the tracepoints are meant to be tied to the kernel
source. Putting those event descriptions in global headers seems like
the people responsible for writing the kernel code surrounding the
tracepoints will end up being responsible for updating those tracepoint
event format descriptions. I think this is an unacceptable maintainance
burden for the whole community. Only tracer-specific modules should
refuse to build whenever it does not match the inner kernel structures
anymore.

Third point : it's plainly ugly. If we look at your tracepoint example :


/*
 * Tracepoint for task switches, performed by the scheduler:
 *
 * (NOTE: the 'rq' argument is not used by generic trace events,
 *        but used by the latency tracer plugin. )
 */
TRACE_EVENT(sched_switch,

        TP_PROTO(struct rq *rq, struct task_struct *prev,
                 struct task_struct *next),

        TP_ARGS(rq, prev, next),

        TP_STRUCT__entry(
                __array(        char,   prev_comm,      TASK_COMM_LEN   )
                __field(        pid_t,  prev_pid                        )
                __field(        int,    prev_prio                       )
                __array(        char,   next_comm,      TASK_COMM_LEN   )
                __field(        pid_t,  next_pid                        )
                __field(        int,    next_prio                       )
        ),

        TP_printk("task %s:%d [%d] ==> %s:%d [%d]",
                __entry->prev_comm, __entry->prev_pid, __entry->prev_prio,
                __entry->next_comm, __entry->next_pid, __entry->next_prio),

        TP_fast_assign(
                memcpy(__entry->next_comm, next->comm, TASK_COMM_LEN);
                __entry->prev_pid       = prev->pid;
                __entry->prev_prio      = prev->prio;
                memcpy(__entry->prev_comm, prev->comm, TASK_COMM_LEN);
                __entry->next_pid       = next->pid;
                __entry->next_prio      = next->prio;
        )
);


I notice that you actually embed the "function" that converts between
the format string into a header macro declaration. Why don't we write
this in plain C ?

in include/trace/sched.h :

DECLARE_TRACE(sched_switch,
        TPPROTO(struct rq *rq, struct task_struct *prev,
                struct task_struct *next),
                TPARGS(rq, prev, next));

in ltt/probes/kernel-trace.c :


void probe_sched_switch(struct rq *rq, struct task_struct *prev,
                struct task_struct *next);

DEFINE_MARKER_TP(kernel, sched_schedule, sched_switch, probe_sched_switch,
        "prev_pid %d next_pid %d prev_state #2d%ld");

notrace void probe_sched_switch(struct rq *rq, struct task_struct *prev,
                struct task_struct *next)
{
        struct marker *marker;
        struct serialize_int_int_short data;

        data.f1 = prev->pid;
        data.f2 = next->pid;
        data.f3 = prev->state;

        marker = &GET_MARKER(kernel, sched_schedule);
        ltt_specialized_trace(marker, marker->single.probe_private,
                &data, serialize_sizeof(data), sizeof(int));
}

This way, if the content of task_struct ever changes, only the tracer
module will break, not code touching a global header.

Mathieu

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/