Message-ID: <20090413161649.GH5977@nowhere>
Date:	Mon, 13 Apr 2009 18:16:51 +0200
From:	Frederic Weisbecker <fweisbec@...il.com>
To:	KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
Cc:	Zhaolei <zhaolei@...fujitsu.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Tom Zanussi <tzanussi@...il.com>, Ingo Molnar <mingo@...e.hu>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 3/4] ftrace: add max execution time measurement to
	workqueue tracer

On Mon, Apr 13, 2009 at 02:53:45PM +0900, KOSAKI Motohiro wrote:
> Subject: [PATCH] ftrace: add max execution time measurement to workqueue tracer
> 
> A workqueue handler that is too slow often causes trouble for the system.
> Administrators therefore want to know the max handler execution time.
> 
> 
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@...fujitsu.com>
> Cc: Zhaolei <zhaolei@...fujitsu.com>
> Cc: Steven Rostedt <rostedt@...dmis.org>
> Cc: Frederic Weisbecker <fweisbec@...il.com>
> Cc: Tom Zanussi <tzanussi@...il.com>
> Cc: Ingo Molnar <mingo@...e.hu>
> ---
>  kernel/trace/trace_workqueue.c |   48 +++++++++++++++++++++++++++++++++++++----
>  1 file changed, 44 insertions(+), 4 deletions(-)
> 
> Index: b/kernel/trace/trace_workqueue.c
> ===================================================================
> --- a/kernel/trace/trace_workqueue.c	2009-04-13 13:23:59.000000000 +0900
> +++ b/kernel/trace/trace_workqueue.c	2009-04-13 13:59:51.000000000 +0900
> @@ -27,6 +27,10 @@ struct cpu_workqueue_stats {
>   *  on a single CPU.
>   */
>  	unsigned int		    executed;
> +
> +	/* store handler execution time */
> +	u64			    handler_start_time;
> +	u64			    max_executed_time;
>  };



Imho, this is the wrong place to instrument.
I'll explain my point of view below.



>  /* List of workqueue threads on one cpu */
> @@ -77,6 +81,7 @@ probe_workqueue_entry(struct task_struct
>  							list) {
>  		if (node->pid == wq_thread->pid) {
>  			node->executed++;
> +			node->handler_start_time = trace_clock_global();
>  			goto found;
>  		}
>  	}
> @@ -85,6 +90,29 @@ found:
>  	spin_unlock_irqrestore(&workqueue_cpu_stat(cpu)->lock, flags);
>  }
>  
> +static void probe_workqueue_exit(struct task_struct *wq_thread,
> +				 struct work_struct *work)
> +{
> +	int cpu = cpumask_first(&wq_thread->cpus_allowed);
> +	struct cpu_workqueue_stats *node, *next;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&workqueue_cpu_stat(cpu)->lock, flags);
> +	list_for_each_entry_safe(node, next, &workqueue_cpu_stat(cpu)->list,
> +							list) {


Do you need the safe version here? You don't seem to remove any entry,
so the plain list_for_each_entry() should be enough.

Sidenote: if I'm not wrong, only the workqueue destruction handler might
actually need the safe variant.
I used it in other places in this file because I misunderstood the
kernel list concepts in the past :)
(Heh, and probably still do today).
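
For reference, a minimal sketch of what I mean, based on your hunk above
(untested; the now-unused *next local can go away too):

	/* No entry is removed here, so the plain iterator is enough. */
	list_for_each_entry(node, &workqueue_cpu_stat(cpu)->list, list) {
		if (node->pid == wq_thread->pid) {
			u64 executed_time = trace_clock_global()
					    - node->handler_start_time;

			if (node->max_executed_time < executed_time)
				node->max_executed_time = executed_time;
			break;	/* or keep the goto found, same effect */
		}
	}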



> +		if (node->pid == wq_thread->pid) {
> +			u64 start = node->handler_start_time;
> +			u64 executed_time = trace_clock_global() - start;
> +
> +			if (node->max_executed_time < executed_time)
> +				node->max_executed_time = executed_time;
> +			goto found;
> +		}
> +	}
> +found:
> +	spin_unlock_irqrestore(&workqueue_cpu_stat(cpu)->lock, flags);
> +}
> +
>  /* Creation of a cpu workqueue thread */
>  static void probe_workqueue_creation(struct task_struct *wq_thread, int cpu)
>  {
> @@ -195,6 +223,9 @@ static int workqueue_stat_show(struct se
>  	int cpu = cws->cpu;
>  	struct pid *pid;
>  	struct task_struct *tsk;
> +	unsigned long long exec_time = ns2usecs(cws->max_executed_time);
> +	unsigned long exec_usec_rem = do_div(exec_time, USEC_PER_SEC);
> +	unsigned long exec_secs = (unsigned long)exec_time;
>  
>  	spin_lock_irqsave(&workqueue_cpu_stat(cpu)->lock, flags);
>  	if (&cws->list == workqueue_cpu_stat(cpu)->list.next)
> @@ -205,8 +236,11 @@ static int workqueue_stat_show(struct se
>  	if (pid) {
>  		tsk = get_pid_task(pid, PIDTYPE_PID);
>  		if (tsk) {
> -			seq_printf(s, "%3d %6d     %6u       %s\n", cws->cpu,
> +			seq_printf(s, "%3d %6d     %6u     %5lu.%06lu"
> +				   "  %s\n",
> +				   cws->cpu,
>  				   atomic_read(&cws->inserted), cws->executed,
> +				   exec_secs, exec_usec_rem,




You are measuring the latency from the workqueue thread's point of view.
While I find the work latency measurement very interesting,
I think this patch does it in the wrong place. The _work_ point of view
seems to me a much richer source of information.

There are several reasons for that.

Indeed, this patch is useful for workqueues that always receive the same
work to perform, because then you can very easily find the guilty worklet.
But this design loses its meaning once we consider workqueue threads
that receive arbitrary works. The best example is of course events/%d:
one will observe the max latency that happened on events/0, for example, but
will only be left with a silent FUD, because there is no way to find out
which work caused this max latency.


The events/%d latency measurement seems especially important to me,
because a single work from a random driver can propagate its latency
all over the system.

A single work that consumes too much cpu time, waits for events that are
slow to come, sleeps too much, tries too often to take contended locks, or
whatever... such a single work may delay all the pending works in the queue,
and a single max latency per workqueue is not enough to find
these culprits.

Having this max latency snapshot per work, and not per workqueue thread,
would be useful for every kind of workqueue latency instrumentation:

- workqueues that always run a single kind of work
- workqueues that run arbitrary works

A developer would also be able to measure his own worklet and
find out whether it takes too much time, even if it isn't the worst
latency offender in the workqueue.

The end result would be works sorted by descending max latency,
per cpu workqueue thread (or better: per workqueue group).
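
To make this more concrete, here is a very rough sketch of the kind of
per-work entry I have in mind (all names are made up, nothing tested):

	/* One entry per work function seen on a cpu workqueue thread */
	struct work_latency_stats {
		struct list_head	list;
		work_func_t		func;			/* identifies the worklet */
		unsigned int		executed;		/* number of executions */
		u64			handler_start_time;	/* set by the entry probe */
		u64			max_executed_time;	/* worst execution time seen */
	};

The entry/exit probes would then key on work->func instead of the thread
pid, and the stat file could walk these entries sorted by
max_executed_time.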

What do you think?

Frederic.



>  				   tsk->comm);
>  			put_task_struct(tsk);
>  		}
> @@ -218,8 +252,8 @@ static int workqueue_stat_show(struct se
>  
>  static int workqueue_stat_headers(struct seq_file *s)
>  {
> -	seq_printf(s, "# CPU  INSERTED  EXECUTED   NAME\n");
> -	seq_printf(s, "# |      |         |          |\n");
> +	seq_printf(s, "# CPU  INSERTED  EXECUTED     MAX_TIME  NAME\n");
> +	seq_printf(s, "# |      |         |           |         |\n");
>  	return 0;
>  }
>  
> @@ -259,10 +293,14 @@ int __init trace_workqueue_early_init(vo
>  	if (ret)
>  		goto no_insertion;
>  
> -	ret = register_trace_workqueue_creation(probe_workqueue_creation);
> +	ret = register_trace_workqueue_handler_exit(probe_workqueue_exit);
>  	if (ret)
>  		goto no_handler_entry;
>  
> +	ret = register_trace_workqueue_creation(probe_workqueue_creation);
> +	if (ret)
> +		goto no_handler_exit;
> +
>  	ret = register_trace_workqueue_destruction(probe_workqueue_destruction);
>  	if (ret)
>  		goto no_creation;
> @@ -276,6 +314,8 @@ int __init trace_workqueue_early_init(vo
>  
>  no_creation:
>  	unregister_trace_workqueue_creation(probe_workqueue_creation);
> +no_handler_exit:
> +	unregister_trace_workqueue_handler_exit(probe_workqueue_exit);
>  no_handler_entry:
>  	unregister_trace_workqueue_handler_entry(probe_workqueue_entry);
>  no_insertion:
> 
> 

