linux-kernel - Re: [PATCH v2 3/4] ftrace: add max execution time mesurement to workqueue tracer

Message-ID: <20090413212159.GA8514@elte.hu>
Date:	Mon, 13 Apr 2009 23:21:59 +0200
From:	Ingo Molnar <mingo@...e.hu>
To:	Frederic Weisbecker <fweisbec@...il.com>,
	Oleg Nesterov <oleg@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>
Cc:	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>,
	Zhaolei <zhaolei@...fujitsu.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Tom Zanussi <tzanussi@...il.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 3/4] ftrace: add max execution time mesurement to
	workqueue tracer


(Oleg, Andrew: it's about workqueue tracing design.)

* Frederic Weisbecker <fweisbec@...il.com> wrote:

> >  		if (tsk) {
> > -			seq_printf(s, "%3d %6d     %6u       %s\n", cws->cpu,
> > +			seq_printf(s, "%3d %6d     %6u     %5lu.%06lu"
> > +				   "  %s\n",
> > +				   cws->cpu,
> >  				   atomic_read(&cws->inserted), cws->executed,
> > +				   exec_secs, exec_usec_rem,
> 
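The quoted hunk prints the execution time as "%5lu.%06lu" from exec_secs and exec_usec_rem, but the lines that compute those two values are not part of the quote. Below is a minimal user-space sketch of one way to derive them from a nanosecond duration. The helper names split_ns() and format_exec_time() are hypothetical. In-kernel code would more likely divide a u64 with do_div() than use plain 64-bit division.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: split a duration in nanoseconds into whole
 * seconds and the leftover microseconds, i.e. the two values the
 * hunk prints as exec_secs and exec_usec_rem. */
static void split_ns(uint64_t ns, unsigned long *secs, unsigned long *usec_rem)
{
	assert(secs != NULL && usec_rem != NULL);
	*secs = (unsigned long)(ns / 1000000000ULL);
	/* Keep microseconds within the current second, drop the sub-us part. */
	*usec_rem = (unsigned long)((ns % 1000000000ULL) / 1000ULL);
}

/* Render the value the way the tracer column does: seconds padded to
 * width 5, then a dot and six zero-padded microsecond digits. */
static int format_exec_time(char *buf, size_t len, uint64_t ns)
{
	unsigned long secs, usec_rem;

	split_ns(ns, &secs, &usec_rem);
	return snprintf(buf, len, "%5lu.%06lu", secs, usec_rem);
}
```

For instance, 2500123456 ns splits into 2 seconds and 500123 microseconds.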
> 
> You are measuring the latency from the workqueue thread's point of 
> view. While I find the work latency measurement very interesting, 
> I think this patch does it in the wrong place. The _work_ latency 
> point of view seems to me a much richer source of information.
> 
> There are several reasons for that.
> 
> Indeed, this patch is useful for workqueues that always receive the 
> same work to perform, since you can then easily find the guilty 
> worklet. But this design loses its meaning once we consider 
> workqueue threads that receive random works. The best example is, 
> of course, events/%d. One will observe, for example, the max 
> latency that happened on events/0, but will only be left with 
> silent FUD, because there is no way to find which work caused this 
> max latency.

Expanding the trace view in a per-worklet fashion is also useful for 
debugging: sometimes inefficiencies (or hangs) are caused by mixing 
high-speed worklets with blocking worklets. This is not exposed if 
we stay at the workqueue level only.
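The effect described above can be illustrated with a toy model, under simplifying assumptions: all works are queued at the same instant on a single workqueue thread (such as events/0) and run strictly in FIFO order. Each work then waits for the sum of the execution times of the works ahead of it, so one blocking worklet delays every high-speed worklet behind it, while a single per-workqueue figure does not say which worklet was responsible. The function fifo_wait_times() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (an assumption, not how the workqueue code is written):
 * n works are queued at time 0 on one worker thread and run one after
 * another in FIFO order. exec_us[i] is how long work i runs;
 * wait_us[i] receives how long work i sat in the queue first. */
static void fifo_wait_times(const unsigned long *exec_us,
			    unsigned long *wait_us, size_t n)
{
	unsigned long clock = 0;

	assert(n == 0 || (exec_us != NULL && wait_us != NULL));
	for (size_t i = 0; i < n; i++) {
		wait_us[i] = clock;  /* starts once all earlier works are done */
		clock += exec_us[i]; /* then occupies the thread for exec_us[i] */
	}
}
```

With execution times of 10, 5000, 10 and 10 us, the waits come out as 0, 10, 5010 and 5020 us: the two fast works queued after the 5000 us one each wait more than 5 ms.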

> The events/%d latency measurement seems especially important to 
> me, because a single work from a random driver can propagate its 
> latency all over the system.
> 
> A single work that consumes too much CPU time, waits for 
> slow-to-arrive events, sleeps too much, tries too often to take 
> contended locks, or whatever... such a single work may delay all 
> pending works in the queue, and a single max latency per workqueue 
> does not help to find these culprits.
> 
> Having this max latency snapshot per work rather than per workqueue 
> thread would be useful for every kind of workqueue latency 
> instrumentation:
> 
> - workqueues that always run the same work
> - workqueues that run random works
> 
> A developer will also be able to measure his own worklet and find 
> out whether it takes too much time, even if it isn't the worklet 
> causing the worst latencies in the workqueue.
> 
> The end result would be a list of works sorted by descending latency 
> for each per-CPU workqueue thread (or better: per workqueue group).
> 
> What do you think?
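A minimal sketch of the proposed per-work statistics, assuming each worklet is identified by its handler function's name and that one table is kept per CPU workqueue thread: record_work() keeps the worst execution time and a run count for each worklet, and sort_by_max_latency() produces the descending latency order suggested above. The struct and function names and the fixed MAX_WORKS table size are hypothetical, not an existing ftrace interface.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORKS 64 /* hypothetical fixed table size for the sketch */

struct work_stat {
	const char *func;     /* worklet identity, e.g. its handler name */
	unsigned long max_us; /* worst execution time seen for this worklet */
	unsigned long count;  /* how many times it ran */
};

struct work_table {       /* one table per CPU workqueue thread */
	struct work_stat stats[MAX_WORKS];
	size_t n;
};

/* Record one execution of worklet 'func' that took 'exec_us'. */
static void record_work(struct work_table *t, const char *func,
			unsigned long exec_us)
{
	size_t i;

	for (i = 0; i < t->n; i++)
		if (strcmp(t->stats[i].func, func) == 0)
			break;
	if (i == t->n) { /* first time we see this worklet */
		assert(t->n < MAX_WORKS);
		t->stats[t->n++] = (struct work_stat){ func, 0, 0 };
	}
	t->stats[i].count++;
	if (exec_us > t->stats[i].max_us)
		t->stats[i].max_us = exec_us;
}

/* qsort comparator: larger max_us first, without unsigned subtraction. */
static int by_max_desc(const void *a, const void *b)
{
	const struct work_stat *x = a, *y = b;

	return (y->max_us > x->max_us) - (y->max_us < x->max_us);
}

/* Sort the worklets of one table by descending max latency. */
static void sort_by_max_latency(struct work_table *t)
{
	qsort(t->stats, t->n, sizeof(t->stats[0]), by_max_desc);
}
```

The linear lookup keeps the sketch short; a real implementation with many worklets would more likely hash on the function pointer.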

Sounds like a good idea to me. It would also allow histograms based 
on worklet identity, etc. Often the most active kevents worklet 
should be considered for splitting out into its own workqueue.

And if we have a per-worklet tracepoint, it would also allow a trace 
filter to trace only a given type of worklet.
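The selection such a filter would perform can be sketched in user space, assuming each per-worklet event carries the CPU, the worklet's handler name and its execution time: only events whose worklet matches the requested name are kept. The event layout and filter_by_worklet() are hypothetical, and this post-processing pass only illustrates the selection logic, not how the tracing core applies filters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout of a per-worklet trace event. */
struct work_event {
	int cpu;
	const char *func;      /* worklet identity: its handler's name */
	unsigned long exec_us; /* how long this run of the worklet took */
};

/* Copy the events belonging to worklet 'func' from 'in' to 'out'
 * (which must have room for n entries) and return how many matched. */
static size_t filter_by_worklet(const struct work_event *in, size_t n,
				const char *func, struct work_event *out)
{
	size_t kept = 0;

	assert(func != NULL);
	for (size_t i = 0; i < n; i++)
		if (strcmp(in[i].func, func) == 0)
			out[kept++] = in[i];
	return kept;
}
```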

	Ingo