Message-ID: <20090413212159.GA8514@elte.hu>
Date: Mon, 13 Apr 2009 23:21:59 +0200
From: Ingo Molnar <mingo@...e.hu>
To: Frederic Weisbecker <fweisbec@...il.com>,
Oleg Nesterov <oleg@...hat.com>,
Andrew Morton <akpm@...ux-foundation.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
Zhaolei <zhaolei@...fujitsu.com>,
Steven Rostedt <rostedt@...dmis.org>,
Tom Zanussi <tzanussi@...il.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 3/4] ftrace: add max execution time mesurement to
workqueue tracer
(Oleg, Andrew: it's about workqueue tracing design.)
* Frederic Weisbecker <fweisbec@...il.com> wrote:
> > if (tsk) {
> > - seq_printf(s, "%3d %6d %6u %s\n", cws->cpu,
> > + seq_printf(s, "%3d %6d %6u %5lu.%06lu"
> > + " %s\n",
> > + cws->cpu,
> > atomic_read(&cws->inserted), cws->executed,
> > + exec_secs, exec_usec_rem,
>
>
> You are measuring the latency from a workqueue thread point of
> view. While I find the work latency measurement very interesting,
> I think this patch does it in the wrong place. The _work_ latency
> point of view seems to me a much richer source of information.
>
> There are several reasons for that.
>
> Indeed, this patch is useful for workqueues that always receive the
> same work to perform, since you can then very easily find the guilty
> worklet. But this design loses its meaning once we consider the
> workqueue threads that receive random works. Of course the best
> example is events/%d. One will observe the max latency that
> happened on events/0, for example, but will only be left with a
> vague unease, because there is no way to find which work caused
> this max latency.
Expanding the trace view in a per-worklet fashion is also useful for
debugging: sometimes inefficiencies (or hangs) are related to the
mixing of high-speed worklets with blocking worklets. This is not
exposed if we stay at the workqueue level only.
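To make the per-worklet idea concrete, here is a minimal user-space sketch, not kernel code. It assumes each execution can be attributed to a worklet by its work function name and timed in nanoseconds; `worklet_stat`, `record_exec()` and `print_stats()` are hypothetical names. It keeps a max latency per worklet rather than per workqueue thread, so a fast worklet and a blocking one sharing events/0 show up as separate rows, listed worst-first, with the max printed in the same secs.usecs style as the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORKLETS 16

/* Hypothetical per-worklet statistics, keyed by the work function's name. */
struct worklet_stat {
	const char *func;          /* worklet identity (work function name) */
	unsigned long executed;    /* number of executions seen */
	unsigned long long max_ns; /* worst execution time seen, in ns */
};

static struct worklet_stat stats[MAX_WORKLETS];
static int nr_stats;

/* Account one execution of @func lasting @ns; returns -1 if the table is full. */
static int record_exec(const char *func, unsigned long long ns)
{
	int i;

	assert(func != NULL);
	for (i = 0; i < nr_stats; i++)
		if (strcmp(stats[i].func, func) == 0)
			break;
	if (i == nr_stats) {
		if (nr_stats == MAX_WORKLETS)
			return -1;
		stats[i].func = func;
		stats[i].executed = 0;
		stats[i].max_ns = 0;
		nr_stats++;
	}
	stats[i].executed++;
	if (ns > stats[i].max_ns)
		stats[i].max_ns = ns;
	return 0;
}

/* qsort() comparator: the larger max latency sorts first. */
static int cmp_max_desc(const void *a, const void *b)
{
	const struct worklet_stat *x = a, *y = b;

	if (x->max_ns != y->max_ns)
		return x->max_ns < y->max_ns ? 1 : -1;
	return 0;
}

/* Sort worklets worst-first and print max latency as secs.usecs. */
static void print_stats(FILE *out)
{
	int i;

	qsort(stats, nr_stats, sizeof(stats[0]), cmp_max_desc);
	for (i = 0; i < nr_stats; i++)
		fprintf(out, "%-20s %6lu %5llu.%06llu\n", stats[i].func,
			stats[i].executed,
			stats[i].max_ns / 1000000000ULL,
			(stats[i].max_ns / 1000ULL) % 1000000ULL);
}
```

Whether the table lives per CPU workqueue thread or per workqueue group is only a question of which table a sample is recorded into; the per-worklet keying stays the same.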
> Especially the events/%d latency measurement seems to me very
> important because a single work from a random driver can propagate
> its latency all over the system.
>
> A single work that consumes too much CPU time, waits a long time for
> incoming events, sleeps too much, tries too often to take contended
> locks, or whatever... such a single work may delay all pending works
> in the queue, and a single max latency per workqueue does not help
> to find these culprits.
>
> Having this max latency snapshot per work and not per workqueue
> thread would be useful for every kind of workqueue latency
> instrumentation:
>
> - workqueues with single works
> - workqueues with random works
>
> A developer will also be able to measure his own worklet's execution
> and find out whether it takes too much time, even if it isn't the
> worklet causing the worst latencies in the workqueue.
>
> The end result would be a list of works sorted by descending latency
> per CPU workqueue thread (or better: per workqueue group).
>
> What do you think?
Sounds like a good idea to me. It would also allow histograms based
on worklet identity, etc. Often the most active kevents worklet
should be considered for splitting out into a new workqueue.
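A per-worklet histogram could look roughly like the following. It is a hypothetical user-space model (`worklet_hist`, `hist_record()` and the power-of-two microsecond buckets are assumptions, not an existing ftrace interface) that keeps one log2 latency histogram per worklet identity.

```c
#include <assert.h>
#include <string.h>

#define MAX_WORKLETS 16
#define HIST_BUCKETS 32

/*
 * Hypothetical per-worklet latency histogram. Bucket b counts execution
 * times in [2^b, 2^(b+1)) microseconds; bucket 0 also takes 0 us, and the
 * last bucket absorbs everything larger.
 */
struct worklet_hist {
	const char *func;                   /* worklet identity */
	unsigned long bucket[HIST_BUCKETS];
};

static struct worklet_hist hists[MAX_WORKLETS];
static int nr_hists;

/* floor(log2(us)), clamped to the last bucket; 0 for us <= 1. */
static int log2_bucket(unsigned long long us)
{
	int b = 0;

	while (us > 1 && b < HIST_BUCKETS - 1) {
		us >>= 1;
		b++;
	}
	return b;
}

/* Add one sample for @func; returns -1 if there is no room for a new worklet. */
static int hist_record(const char *func, unsigned long long us)
{
	int i;

	assert(func != NULL);
	for (i = 0; i < nr_hists; i++)
		if (strcmp(hists[i].func, func) == 0)
			break;
	if (i == nr_hists) {
		if (nr_hists == MAX_WORKLETS)
			return -1;
		hists[i].func = func;   /* buckets are already zero (static storage) */
		nr_hists++;
	}
	hists[i].bucket[log2_bucket(us)]++;
	return 0;
}
```

Power-of-two buckets keep each histogram small while still separating microsecond-scale worklets from ones that block for milliseconds, which is the kind of outlier that would suggest moving a worklet off kevents.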
And if we have a per-worklet tracepoint, it would also allow a trace
filter that traces only a given type of worklet.
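The filtering side can be modeled the same way. This sketch assumes each per-worklet event carries the work function name and is dropped before being recorded unless it matches the filter; `worklet_event`, `set_filter()` and `trace_worklet()` are hypothetical names, and the real tracing filter mechanism is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TRACE_SIZE 64

/* Hypothetical per-worklet trace event. */
struct worklet_event {
	const char *func;       /* which worklet ran */
	unsigned long long ns;  /* how long it ran */
};

static struct worklet_event trace_buf[TRACE_SIZE];
static size_t trace_len;
static const char *trace_filter;  /* NULL means: trace every worklet */

static void set_filter(const char *func)
{
	trace_filter = func;
}

/* Record the event only if it passes the filter; returns 1 if recorded. */
static int trace_worklet(const char *func, unsigned long long ns)
{
	assert(func != NULL);
	if (trace_filter && strcmp(func, trace_filter) != 0)
		return 0;       /* filtered out: a different worklet type */
	if (trace_len == TRACE_SIZE)
		return 0;       /* buffer full: this sketch simply drops */
	trace_buf[trace_len].func = func;
	trace_buf[trace_len].ns = ns;
	trace_len++;
	return 1;
}
```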
Ingo