[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150421184446.7b63cbec@gandalf.local.home>
Date: Tue, 21 Apr 2015 18:44:46 -0400
From: Steven Rostedt <rostedt@...dmis.org>
To: Ron Rechenmacher <ron@...l.gov>
Cc: Christoph Hellwig <hch@...radead.org>,
<linux-kernel@...r.kernel.org>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Arjan van de Ven <arjan@...ux.intel.com>,
Namhyung Kim <namhyung@...nel.org>
Subject: Re: [PATCH] tracing: Export key trace event symbols
[ Added a bunch of people that use perf ;-) ]
On Tue, 21 Apr 2015 17:23:45 -0500
Ron Rechenmacher <ron@...l.gov> wrote:
>
> Steven Rostedt wrote on 04/21/15 10:49:
> [snip]
> >
> > OK, lets see what you are doing in your patch:
>
> Thanks for helping/discussing!
>
> >
> > +/*
> > + * Allow modules to register additional trace routines
> > + */
> > +EXPORT_TRACEPOINT_SYMBOL_GPL(irq_handler_entry);
> > +EXPORT_TRACEPOINT_SYMBOL_GPL(irq_handler_exit);
> > +
> >
> > Instead of just saying "Allow modules to register additional trace
> > routines", please explain what its used for. What are you measuring
> > externally? Just the timings of the interrupts?
>
> How about "Allow for module (internal or external) to register trace function which
> shows (or measures) interactions with user-space executables more directly. Knowing
> when interrupts happen and how they can subtly impact user-space application timing
> is important in some application."
>
> >
> > I'm also curious to know why ftrace isn't good enough.
>
> ftrace (if one considers ltt-ng) is probably capable of doing what my trace does
> except that the timestamp is not TOD (Time Of Day) -- but probably/maybe could
> be made to do so???
>
> It's a matter of complexity or "start up cost" (or entry barrier).
>
> I'm not the best at articulating all there is to TRACE succinctly, but lets
> start with the fact that the user-space component of TRACE is basically just a header
> file that defines the TRACE printf-like macro and handles the initialization which is
> basically mmap-ing a file (as a trace buffer (with control structure)) and is,
> for the most part, Unix OS independent.
Heh, that sounds like the perf interface. Have you thought about using
that instead?
Look at what PowerTop does: https://01.org/powertop
You can use the perf interface to mmap a memory buffer and read the
data in its raw format. To parse the data, you need to read the format
files in /sys/kernel/debug/tracing/events, but there's a library that
does that. Unfortunately, that library isn't in distros yet (we really
need to get that done). But you can just copy the code from the kernel
source tree from tools/lib/traceevents/ and use that as perf does (as
well as powertop does too).
The format files tells you how to get the data from the raw binary
format.
If you use the perf interface, you can have userspace tools do custom
things with the events as they happen. This may make your utility more
robust in the long run.
I'd be happy to help you understand how to use the traceevent library. I
don't use the perf interface, but you can get examples from the
PowerTop code.
-- Steve
>
> The TRACE module, in addition to registering TRACE compatible tracing functions,
> also creates a virtual file that the user-space TRACE-enable program(s) can mmap just
> like another regular file (it doesn't know it's a kernel virtual file).
> [As a side point, it's just for convenience that one or more user-space programs are
> configured (via env. var.) to use the kernel virtual file as the buffer; they could use
> any other regular file or each program could use a separate file.]
>
> The module becomes critical when tracing OS key interactions (scheduling and interrupts)
> with the userspace application(s) is necessary.
>
>
> >
> > Maybe the better solution is to add something to ftrace that does what
> > your tracer currently does?
>
> Although there is significant overlap between my TRACE and ftrace, the key
> design criteria is such that trying to make ftrace do what is desired of TRACE
> would be impractical. Ftrace is more of a developer tool while TRACE is more of
> an end-user utility, where simplicity of use is paramount.
>
> Thanks for your help,
> Ron
>
> PS. Below, I've included a portion of a print out from the user-space TRACE control
> program printing the buffer and showing the exact impact that an interrupt
> has on a program looping. Just a contrived example.
>
> >
> > -- Steve
> >
> >>
> >> Thanks for you consideration and if there is specific help/suggestion you can offer to
> >> get my patch accepted, please let me know.
> >>
> >> Thanks again for the discussion.
> /home/ron/work/tracePrj/trace
> cluck :^) TRACE_SHOW=HxNTiICLR
> --04/21_16:42:24--
> /home/ron/work/tracePrj/trace
> cluck :^) tshow 10000 | tdelta -stats | { for xx in 1 2;do read h; echo "$h";done;tail;}
> idx us_tod delta tid TID cpu lv r msg
> ---- ---------------- --------- ----- --- --- -- - -----------------------------
> 9996 1429652195997224 1 40744 18 17 0 . Hello 9996. "c 2.5 5 5000000000 15" should be repeated here: c 2.5 5
> 5000000000 15
> 9997 1429652195997223 1 40744 18 17 0 . Hello 9997. "c 2.5 5 5000000000 15" should be repeated here: c 2.5 5
> 5000000000 15
> 9998 1429652195997222 1 40744 18 17 0 . Hello 9998. "c 2.5 5 5000000000 15" should be repeated here: c 2.5 5
> 5000000000 15
> 9999 1429652195997221 1 40744 18 17 0 . Hello 9999. "c 2.5 5 5000000000 15" should be repeated here: c 2.5 5
> 5000000000 15
> cpu="0"
> min -13
> max 17
> tot 7872
> ave 0.7872787
> cnt 9999
> --04/21_16:42:27--
> /home/ron/work/tracePrj/trace
> cluck :^) tshow 10000 | tdelta -stats | { for xx in 1 2;do read h; echo "$h";done;grep -C15 '2 17';}
> idx us_tod delta tid TID cpu lv r msg
> ---- ---------------- --------- ----- --- --- -- - -----------------------------
> 9438 1429652195997676 1 40744 18 17 0 . Hello 9470. "c 2.5 5 5000000000 15" should be repeated here: c 2.5 5
> 5000000000 15
> 9439 1429652195997676 0 40744 18 17 0 . Hello 9471. "c 2.5 5 5000000000 15" should be repeated here: c 2.5 5
> 5000000000 15
> 9440 1429652195997675 1 40744 18 17 0 . Hello 9472. "c 2.5 5 5000000000 15" should be repeated here: c 2.5 5
> 5000000000 15
> 9441 1429652195997674 1 40744 18 17 0 . Hello 9473. "c 2.5 5 5000000000 15" should be repeated here: c 2.5 5
> 5000000000 15
> 9442 1429652195997673 1 40744 18 17 0 . Hello 9474. "c 2.5 5 5000000000 15" should be repeated here: c 2.5 5
> 5000000000 15
> 9443 1429652195997672 1 40744 18 17 0 . Hello 9475. "c 2.5 5 5000000000 15" should be repeated here: c 2.5 5
> 5000000000 15
> 9444 1429652195997672 0 40744 18 17 0 . Hello 9476. "c 2.5 5 5000000000 15" should be repeated here: c 2.5 5
> 5000000000 15
> 9445 1429652195997671 1 40744 18 17 0 . Hello 9477. "c 2.5 5 5000000000 15" should be repeated here: c 2.5 5
> 5000000000 15
> 9446 1429652195997670 1 40744 18 17 0 . Hello 9478. "c 2.5 5 5000000000 15" should be repeated here: c 2.5 5
> 5000000000 15
> 9447 1429652195997667 3 40744 0 17 28 . sirqexit: cpu=17 vec_nr=9
> 9448 1429652195997664 3 40744 0 17 27 . sirqenter: cpu=17 vec_nr=9
> 9449 1429652195997662 2 40744 0 17 28 . sirqexit: cpu=17 vec_nr=1
> 9450 1429652195997660 2 40744 0 17 27 . sirqenter: cpu=17 vec_nr=1
> 9451 1429652195997656 4 0 0 13 28 . sirqexit: cpu=13 vec_nr=9
> 9452 1429652195997669 -13 40744 18 17 0 . Hello 9479. "c 2.5 5 5000000000 15" should be repeated here: c 2.5 5
> 5000000000 15
> 9453 1429652195997652 17 0 0 13 27 1 sirqenter: cpu=13 vec_nr=9
> 9454 1429652195997652 0 40744 18 17 0 . Hello 9480. "c 2.5 5 5000000000 15" should be repeated here: c 2.5 5
> 5000000000 15
> 9455 1429652195997651 1 40744 18 17 0 . Hello 9481. "c 2.5 5 5000000000 15" should be repeated here: c 2.5 5
> 5000000000 15
> 9456 1429652195997649 2 0 0 13 28 1 sirqexit: cpu=13 vec_nr=1
> 9457 1429652195997649 0 40744 18 17 0 . Hello 9482. "c 2.5 5 5000000000 15" should be repeated here: c 2.5 5
> 5000000000 15
> 9458 1429652195997648 1 0 0 12 28 . sirqexit: cpu=12 vec_nr=9
> 9459 1429652195997648 0 40744 18 17 0 . Hello 9483. "c 2.5 5 5000000000 15" should be repeated here: c 2.5 5
> 5000000000 15
> 9460 1429652195997647 1 40744 18 17 0 . Hello 9484. "c 2.5 5 5000000000 15" should be repeated here: c 2.5 5
> 5000000000 15
> 9461 1429652195997646 1 40744 18 17 0 . Hello 9485. "c 2.5 5 5000000000 15" should be repeated here: c 2.5 5
> 5000000000 15
> 9462 1429652195997645 1 0 0 13 27 1 sirqenter: cpu=13 vec_nr=1
> 9463 1429652195997645 0 40744 18 17 0 1 Hello 9486. "c 2.5 5 5000000000 15" should be repeated here: c 2.5 5
> 5000000000 15
> 9464 1429652195997644 1 0 0 12 27 . sirqenter: cpu=12 vec_nr=9
> 9465 1429652195997643 1 40744 18 17 0 . Hello 9487. "c 2.5 5 5000000000 15" should be repeated here: c 2.5 5
> 5000000000 15
> 9466 1429652195997642 1 40744 18 17 0 . Hello 9488. "c 2.5 5 5000000000 15" should be repeated here: c 2.5 5
> 5000000000 15
> 9467 1429652195997642 0 0 0 12 28 . sirqexit: cpu=12 vec_nr=1
> 9468 1429652195997641 1 40744 18 17 0 . Hello 9489. "c 2.5 5 5000000000 15" should be repeated here: c 2.5 5
> 5000000000 15
> --04/21_17:15:52--
>
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists