[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <1280319410-18120-1-git-send-email-imunsie@au1.ibm.com>
Date: Wed, 28 Jul 2010 22:16:49 +1000
From: "Ian Munsie" <imunsie@....ibm.com>
To: linux-kernel@...r.kernel.org
Cc: Ian Munsie <imunsie@....ibm.com>,
Steven Rostedt <rostedt@...dmis.org>,
Frederic Weisbecker <fweisbec@...il.com>,
Ingo Molnar <mingo@...hat.com>, Li Zefan <lizf@...fujitsu.com>,
Masami Hiramatsu <mhiramat@...hat.com>,
Jiri Olsa <jolsa@...hat.com>, Tejun Heo <tj@...nel.org>
Subject: [PATCH v2] ftrace: record command lines at more appropriate moment
From: Ian Munsie <imunsie@....ibm.com>
Previously, when tracing was activated through debugfs, regardless of
which tracing plugin (if any) were activated, the probe_sched_switch and
probe_sched_wakeup probes from the sched_switch plugin would be
activated. This appears to have been a hack to use them to record the
command lines of active processes as they were scheduled.
That approach would suffer if many processes were being scheduled that
were not generating events as they would consume entries in the
saved_cmdlines buffer that could otherwise have been used by other
processes that were actually generating events.
It also had the problem that events could be mis-attributed - in the
common situation of a process forking then execing a new process, the
change of the process command would not be noticed for some time after
the exec until the process was next scheduled.
If the trace was read after the fact this would generally go unnoticed
because at some point the process would be scheduled and the entry in
the saved_cmdlines buffer would be updated so that the new command would
be reported when the trace was eventually read. However, if the events
were being read live (e.g. through trace_pipe), the events just after
the exec and before the process was next scheduled would show the
incorrect command (though the PID would be correct).
This patch removes the sched_switch hack altogether and instead records
the commands at a more appropriate moment - when a new trace event is
committed onto the ftrace ring buffer. This means that the recorded
command line is much more likely to be correct when the trace is read,
either live or after the fact, so long as the command line still resides
in the saved_cmdlines buffer.
It is still not guaranteed to be correct in all situations. For instance
if the trace is read after the fact rather than live (consider events
generated by a process before an exec - in the below example they would
be attributed to sleep rather than stealpid since the entry in
saved_cmdlines would have changed before the event was read), but this
is no different to the current situation and the alternative would be to
store the command line with each and every event.
terminal 1: grep '\-12345' /sys/kernel/debug/tracing/trace_pipe
terminal 2: ./stealpid 12345 `which sleep` 0.1
Before:
stealpid-12345 [003] 86.001826: sys_clone -> 0x0
stealpid-12345 [003] 86.002013: compat_sys_execve(ufilename: ffaaabef, argv: ffaaa7ec, envp: ffaaa7f8)
stealpid-12345 [002] 86.002292: sys_restart_syscall -> 0x0
stealpid-12345 [002] 86.002336: sys_brk(brk: 0)
stealpid-12345 [002] 86.002338: sys_brk -> 0x1007a000
...
stealpid-12345 [002] 86.002582: sys_mmap(addr: 0, len: 1000, prot: 3, flags: 22, fd: ffffffff, offset: 0)
stealpid-12345 [002] 86.002586: sys_mmap -> 0xf7c21000
sleep-12345 [002] 86.002771: sys_mprotect(start: ffe8000, len: 4000, prot: 1)
sleep-12345 [002] 86.002780: sys_mprotect -> 0x0
...
After:
stealpid-12345 [003] 1368.823626: sys_clone -> 0x0
stealpid-12345 [003] 1368.823820: compat_sys_execve(ufilename: fffa6bef, argv: fffa5afc, envp: fffa5b08)
sleep-12345 [002] 1368.824125: sys_restart_syscall -> 0x0
sleep-12345 [002] 1368.824173: sys_brk(brk: 0)
sleep-12345 [002] 1368.824175: sys_brk -> 0x104ae000
...
Signed-off-by: Ian Munsie <imunsie@....ibm.com>
---
Changes since v1 addressing feedback from Li Zefan:
* Inline trace_save_cmdline and tracing_record_cmdline for a marginal speed
gain when recording command lines.
* Move call to tracing_record_cmdline from tracing_generic_entry_update to
__trace_buffer_unlock_commit to avoid the overhead when using perf or if the
event was filtered out.
kernel/trace/trace.c | 7 +++----
kernel/trace/trace_events.c | 11 -----------
kernel/trace/trace_functions.c | 2 --
kernel/trace/trace_functions_graph.c | 2 --
kernel/trace/trace_sched_switch.c | 10 ----------
5 files changed, 3 insertions(+), 29 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 4b1122d..dfbc62c 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1023,9 +1023,7 @@ void tracing_stop(void)
spin_unlock_irqrestore(&tracing_start_lock, flags);
}
-void trace_stop_cmdline_recording(void);
-
-static void trace_save_cmdline(struct task_struct *tsk)
+static inline void trace_save_cmdline(struct task_struct *tsk)
{
unsigned pid, idx;
@@ -1097,7 +1095,7 @@ void trace_find_cmdline(int pid, char comm[])
preempt_enable();
}
-void tracing_record_cmdline(struct task_struct *tsk)
+inline void tracing_record_cmdline(struct task_struct *tsk)
{
if (atomic_read(&trace_record_cmdline_disabled) || !tracer_enabled ||
!tracing_is_on())
@@ -1152,6 +1150,7 @@ __trace_buffer_unlock_commit(struct ring_buffer *buffer,
unsigned long flags, int pc,
int wake)
{
+ tracing_record_cmdline(current);
ring_buffer_unlock_commit(buffer, event);
ftrace_trace_stack(buffer, flags, 6, pc);
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 09b4fa6..2c0cf38 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -180,10 +180,8 @@ void trace_event_enable_cmd_record(bool enable)
continue;
if (enable) {
- tracing_start_cmdline_record();
call->flags |= TRACE_EVENT_FL_RECORDED_CMD;
} else {
- tracing_stop_cmdline_record();
call->flags &= ~TRACE_EVENT_FL_RECORDED_CMD;
}
}
@@ -199,22 +197,13 @@ static int ftrace_event_enable_disable(struct ftrace_event_call *call,
case 0:
if (call->flags & TRACE_EVENT_FL_ENABLED) {
call->flags &= ~TRACE_EVENT_FL_ENABLED;
- if (call->flags & TRACE_EVENT_FL_RECORDED_CMD) {
- tracing_stop_cmdline_record();
- call->flags &= ~TRACE_EVENT_FL_RECORDED_CMD;
- }
call->class->reg(call, TRACE_REG_UNREGISTER);
}
break;
case 1:
if (!(call->flags & TRACE_EVENT_FL_ENABLED)) {
- if (trace_flags & TRACE_ITER_RECORD_CMD) {
- tracing_start_cmdline_record();
- call->flags |= TRACE_EVENT_FL_RECORDED_CMD;
- }
ret = call->class->reg(call, TRACE_REG_REGISTER);
if (ret) {
- tracing_stop_cmdline_record();
pr_info("event trace: Could not enable event "
"%s\n", call->name);
break;
diff --git a/kernel/trace/trace_functions.c b/kernel/trace/trace_functions.c
index 16aee4d..0800e91 100644
--- a/kernel/trace/trace_functions.c
+++ b/kernel/trace/trace_functions.c
@@ -31,7 +31,6 @@ static int function_trace_init(struct trace_array *tr)
tr->cpu = get_cpu();
put_cpu();
- tracing_start_cmdline_record();
tracing_start_function_trace();
return 0;
}
@@ -39,7 +38,6 @@ static int function_trace_init(struct trace_array *tr)
static void function_trace_reset(struct trace_array *tr)
{
tracing_stop_function_trace();
- tracing_stop_cmdline_record();
}
static void function_trace_start(struct trace_array *tr)
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index 6bff236..4bb8811 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -321,14 +321,12 @@ static int graph_trace_init(struct trace_array *tr)
&trace_graph_entry);
if (ret)
return ret;
- tracing_start_cmdline_record();
return 0;
}
static void graph_trace_reset(struct trace_array *tr)
{
- tracing_stop_cmdline_record();
unregister_ftrace_graph();
}
diff --git a/kernel/trace/trace_sched_switch.c b/kernel/trace/trace_sched_switch.c
index 8f758d0..9a5cb2c 100644
--- a/kernel/trace/trace_sched_switch.c
+++ b/kernel/trace/trace_sched_switch.c
@@ -190,16 +190,6 @@ static void tracing_stop_sched_switch(void)
mutex_unlock(&sched_register_mutex);
}
-void tracing_start_cmdline_record(void)
-{
- tracing_start_sched_switch();
-}
-
-void tracing_stop_cmdline_record(void)
-{
- tracing_stop_sched_switch();
-}
-
/**
* tracing_start_sched_switch_record - start tracing context switches
*
--
1.7.1
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists