lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <4C50EF21.1020202@cn.fujitsu.com>
Date:	Thu, 29 Jul 2010 11:01:53 +0800
From:	Li Zefan <lizf@...fujitsu.com>
To:	Ian Munsie <imunsie@....ibm.com>
CC:	linux-kernel@...r.kernel.org, Steven Rostedt <rostedt@...dmis.org>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Ingo Molnar <mingo@...hat.com>,
	Masami Hiramatsu <mhiramat@...hat.com>,
	Jiri Olsa <jolsa@...hat.com>, Tejun Heo <tj@...nel.org>
Subject: Re: [PATCH v2] ftrace: record command lines at more appropriate moment

Ian Munsie wrote:
> From: Ian Munsie <imunsie@....ibm.com>
> 
> Previously, when tracing was activated through debugfs, regardless of
> which tracing plugin (if any) were activated, the probe_sched_switch and
> probe_sched_wakeup probes from the sched_switch plugin would be
> activated. This appears to have been a hack to use them to record the
> command lines of active processes as they were scheduled.
> 
> That approach would suffer if many processes were being scheduled that
> were not generating events as they would consume entries in the
> saved_cmdlines buffer that could otherwise have been used by other
> processes that were actually generating events.
> 
> It also had the problem that events could be mis-attributed - in the
> common situation of a process forking then execing a new process, the
> change of the process command would not be noticed for some time after
> the exec until the process was next scheduled.
> 
> If the trace was read after the fact this would generally go unnoticed
> because at some point the process would be scheduled and the entry in
> the saved_cmdlines buffer would be updated so that the new command would
> be reported when the trace was eventually read. However, if the events
> were being read live (e.g. through trace_pipe), the events just after
> the exec and before the process was next scheduled would show the
> incorrect command (though the PID would be correct).
> 
> This patch removes the sched_switch hack altogether and instead records
> the commands at a more appropriate moment - when a new trace event is
> committed onto the ftrace ring buffer. This means that the recorded
> command line is much more likely to be correct when the trace is read,
> either live or after the fact, so long as the command line still resides
> in the saved_cmdlines buffer.
> 
> It is still not guaranteed to be correct in all situations. For instance
> if the trace is read after the fact rather than live (consider events
> generated by a process before an exec - in the below example they would
> be attributed to sleep rather than stealpid since the entry in
> saved_cmdlines would have changed before the event was read), but this
> is no different to the current situation and the alternative would be to
> store the command line with each and every event.
> 
> terminal 1: grep '\-12345' /sys/kernel/debug/tracing/trace_pipe
> terminal 2: ./stealpid 12345 `which sleep` 0.1
> 
> Before:
>         stealpid-12345 [003]    86.001826: sys_clone -> 0x0
>         stealpid-12345 [003]    86.002013: compat_sys_execve(ufilename: ffaaabef, argv: ffaaa7ec, envp: ffaaa7f8)
>         stealpid-12345 [002]    86.002292: sys_restart_syscall -> 0x0
>         stealpid-12345 [002]    86.002336: sys_brk(brk: 0)
>         stealpid-12345 [002]    86.002338: sys_brk -> 0x1007a000
>         ...
>         stealpid-12345 [002]    86.002582: sys_mmap(addr: 0, len: 1000, prot: 3, flags: 22, fd: ffffffff, offset: 0)
>         stealpid-12345 [002]    86.002586: sys_mmap -> 0xf7c21000
>            sleep-12345 [002]    86.002771: sys_mprotect(start: ffe8000, len: 4000, prot: 1)
>            sleep-12345 [002]    86.002780: sys_mprotect -> 0x0
>            ...
> 
> After:
>         stealpid-12345 [003]  1368.823626: sys_clone -> 0x0
>         stealpid-12345 [003]  1368.823820: compat_sys_execve(ufilename: fffa6bef, argv: fffa5afc, envp: fffa5b08)
>            sleep-12345 [002]  1368.824125: sys_restart_syscall -> 0x0
>            sleep-12345 [002]  1368.824173: sys_brk(brk: 0)
>            sleep-12345 [002]  1368.824175: sys_brk -> 0x104ae000
>            ...
> 
> Signed-off-by: Ian Munsie <imunsie@....ibm.com>

I've tested your patch using lmbench(ctx):

Context switching - times in microseconds - smaller is better
-------------------------------------------------------------------------
Host                 OS  2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                         ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
--------- ------------- ------ ------ ------ ------ ------ ------- -------
(trace-off)
          Linux 2.6.35- 2.1300 2.2100 2.0800 2.5900 2.1400 2.59000 2.19000
          Linux 2.6.35- 2.1400 2.2000 2.0800 2.6000 2.0900 2.56000 2.15000

(all events on)
          Linux 2.6.35- 2.8000 2.9600 2.7200 3.2500 2.8200 3.24000 2.98000
          Linux 2.6.35- 2.7100 2.6900 2.7300 3.2200 2.8500 3.25000 2.79000

(all events on without cmdline-recording)
          Linux 2.6.35- 2.6100 2.6900 2.5800 3.0300 2.5800 3.04000 2.67000
          Linux 2.6.35- 2.5800 2.5900 2.5600 3.0300 2.6600 3.04000 2.61000

(your patch applied)
          Linux 2.6.35- 2.7100 2.8000 2.7200 3.2100 2.8400 3.24000 2.82000
          Linux 2.6.35- 2.6600 2.8400 2.6900 3.1900 2.7600 3.27000 2.78000

So with your patch applied, the performance is still worse than just disabling
cmdline-recording.

The performance may be worse if I choose some other benchmarks.

I'd suggest another approch, that we add a tracepiont in set_task_comm()
to record cmdname, and that's how perf does.

> ---
> 
> Changes since v1 addressing feedback from Li Zefan:
>  * Inline trace_save_cmdline and tracing_record_cmdline for a marginal speed
>    gain when recording command lines.
>  * Move call to tracing_record_cmdline from tracing_generic_entry_update to
>    __trace_buffer_unlock_commit to avoid the overhead when using perf or if the
>    event was filtered out.
> 
>  kernel/trace/trace.c                 |    7 +++----
>  kernel/trace/trace_events.c          |   11 -----------
>  kernel/trace/trace_functions.c       |    2 --
>  kernel/trace/trace_functions_graph.c |    2 --
>  kernel/trace/trace_sched_switch.c    |   10 ----------
>  5 files changed, 3 insertions(+), 29 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ