Date:	Mon, 20 Jun 2016 10:47:25 +0000
From:	Wang Nan <wangnan0@...wei.com>
To:	<acme@...nel.org>
CC:	<linux-kernel@...r.kernel.org>, <pi3orama@....com>,
	Wang Nan <wangnan0@...wei.com>, He Kuang <hekuang@...wei.com>,
	"Arnaldo Carvalho de Melo" <acme@...hat.com>,
	Jiri Olsa <jolsa@...nel.org>,
	Masami Hiramatsu <mhiramat@...nel.org>,
	Namhyung Kim <namhyung@...nel.org>,
	Zefan Li <lizefan@...wei.com>
Subject: [PATCH v8 8/8] perf tools: Add --tail-synthesize option

When working with an overwritable ring buffer there is an inconvenience:
if perf dumps data a long time after it starts, non-sample events may
already be lost, which leaves the following 'perf report' unable to
identify the process name and mmap layout. For example:

 # perf record -m 4 -e raw_syscalls:* -g --overwrite --switch-output \
        dd if=/dev/zero of=/dev/null

Send SIGUSR2 after dd has run long enough. The resulting perf.data has
lost the correct comm and mmap events:

 # perf script -i perf.data.2016061522374354
 perf 24478 [004] 2581325.601789:  raw_syscalls:sys_exit: NR 0 = 512
 ^^^^
 Should be 'dd'
                   27b2e8 syscall_slow_exit_work+0xfe2000e3 (/lib/modules/4.6.0-rc3+/build/vmlinux)
                   203cc7 do_syscall_64+0xfe200117 (/lib/modules/4.6.0-rc3+/build/vmlinux)
                   b18d83 return_from_SYSCALL_64+0xfe200000 (/lib/modules/4.6.0-rc3+/build/vmlinux)
             7f47c417edf0 [unknown] ([unknown])
             ^^^^^^^^^^^^
             Fail to unwind

This patch provides a '--tail-synthesize' option, which allows perf to
collect system status when finalizing the output file. In the resulting
output file, the non-sample events reflect system status at the time the
data is dumped.
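
The gating idea can be sketched in isolation. The following is a minimal
model, not the perf code itself: 'struct demo_opts', 'demo_synthesize()'
and 'demo_session()' are hypothetical names used only for illustration.
Every call site passes a 'tail' flag saying whether it sits at the start of
the record or at finalization, and the work runs only at the call site that
matches the selected mode.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the single option field the gate reads. */
struct demo_opts {
	bool tail_synthesize;
};

/*
 * Returns 1 if synthesis ran at this call site, 0 if it was skipped.
 * 'tail' is false at the start-of-record call site and true at the
 * finalize call site.
 */
static int demo_synthesize(const struct demo_opts *opts, bool tail)
{
	if (opts->tail_synthesize != tail)
		return 0;	/* wrong call site for the selected mode */

	printf("synthesizing non-sample events at %s\n",
	       tail ? "finalize" : "start");
	return 1;
}

/* Models one recording session and counts how many call sites ran. */
static int demo_session(bool tail_synthesize)
{
	struct demo_opts opts = { .tail_synthesize = tail_synthesize };
	int ran = 0;

	ran += demo_synthesize(&opts, false);	/* beginning of record */
	ran += demo_synthesize(&opts, true);	/* finalizing the output */
	return ran;
}
```

In this model exactly one of the two call sites does the work in either
mode, so the callers do not need to test the option themselves.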

After this patch:
 # perf record -m 4 -e raw_syscalls:* -g --overwrite --switch-output --tail-synthesize \
        dd if=/dev/zero of=/dev/null

 # perf script -i perf.data.2016061600544998
 dd 27364 [004] 2583244.994464: raw_syscalls:sys_enter: NR 1 (1, ...
 ^^
 Correct comm
                   203a18 syscall_trace_enter_phase2+0xfe2001a8 ([kernel.kallsyms])
                   203aa5 syscall_trace_enter+0xfe200055 ([kernel.kallsyms])
                   203caa do_syscall_64+0xfe2000fa ([kernel.kallsyms])
                   b18d83 return_from_SYSCALL_64+0xfe200000 ([kernel.kallsyms])
                    d8e50 __GI___libc_write+0xffff01d9639f4010 (/tmp/oxygen_root-w00229757/lib64/libc-2.18.so)
                    ^^^^^
                    Correct unwind

This option doesn't aim to solve this problem completely. If a process
terminates before SIGUSR2, we still lose its COMM and MMAP events. For
example, we can't unwind correctly from the final perf.data we get from
the previous example, because when perf collects the final output file
(when we press C-c), 'dd' has already terminated and its
'/proc/<pid>/maps' has become empty. However, this is a cheaper choice. To
solve this problem completely we need to continuously output non-sample
events. To satisfy the requirement of daemonization, we need to merge
them periodically. It is possible but requires much more code and cycles.
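
The limitation can be illustrated with a small probe of the per-process
maps file. This is only a sketch under assumptions: 'demo_count_maps_lines()'
is a hypothetical helper, not part of perf, and it assumes a Linux system
with procfs mounted at /proc. It counts the mapping lines currently visible
for a pid; once the process has gone away there is nothing left to read, so
no MMAP events can be synthesized for it.

```c
#include <stdio.h>

/*
 * Hypothetical helper: count the lines in /proc/<pid>/maps.
 * Returns -1 if the file cannot be opened (no such process, no procfs,
 * or insufficient permission), otherwise the number of mapping lines,
 * which is 0 when the file exists but is empty.
 */
static long demo_count_maps_lines(long pid)
{
	char path[64];
	FILE *fp;
	long lines = 0;
	int c;

	snprintf(path, sizeof(path), "/proc/%ld/maps", pid);
	fp = fopen(path, "r");
	if (!fp)
		return -1;

	while ((c = fgetc(fp)) != EOF)
		if (c == '\n')
			lines++;

	fclose(fp);
	return lines;
}
```

A result of -1 or 0 for the workload's pid at finalization time corresponds
to the case described above, where tail synthesis has no mappings to record.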

Automatically select --tail-synthesize when --overwrite is provided.

Signed-off-by: Wang Nan <wangnan0@...wei.com>
Cc: He Kuang <hekuang@...wei.com>
Cc: Arnaldo Carvalho de Melo <acme@...hat.com>
Cc: Jiri Olsa <jolsa@...nel.org>
Cc: Masami Hiramatsu <mhiramat@...nel.org>
Cc: Namhyung Kim <namhyung@...nel.org>
Cc: Zefan Li <lizefan@...wei.com>
Cc: pi3orama@....com
---
 tools/perf/Documentation/perf-record.txt |  8 ++++++++
 tools/perf/builtin-record.c              | 31 +++++++++++++++++++++++++------
 tools/perf/perf.h                        |  1 +
 3 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index f5cb932..d552baf 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -360,6 +360,12 @@ particular perf.data snapshot should be kept or not.
 
 Implies --timestamp-filename, --no-buildid and --no-buildid-cache.
 
+--tail-synthesize::
+Instead of collecting non-sample events (for example, fork, comm, mmap) at
+the beginning of record, collect them when finalizing an output file.
+The collected non-sample events reflect the status of the system when
+record is finished.
+
 --overwrite::
 Makes all events use overwritable ring buffer. Event with overwritable ring
 buffer works like a flight recorder: when buffer gets full, instead of dumping
@@ -374,6 +380,8 @@ something unusual is detected.
 'overwrite' attribute can also be set or canceled for specific event using
 config terms like 'cycles/overwrite/' and 'instructions/no-overwrite/'.
 
+Implies --tail-synthesize.
+
 SEE ALSO
 --------
 linkperf:perf-stat[1], linkperf:perf-list[1]
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index bb62882..7b95444 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -724,13 +724,16 @@ record__finish_output(struct record *rec)
 	return;
 }
 
-static int record__synthesize_workload(struct record *rec)
+static int record__synthesize_workload(struct record *rec, bool tail)
 {
 	struct {
 		struct thread_map map;
 		struct thread_map_data map_data;
 	} thread_map;
 
+	if (rec->opts.tail_synthesize != tail)
+		return 0;
+
 	thread_map.map.nr = 1;
 	thread_map.map.map[0].pid = rec->evlist->workload.pid;
 	thread_map.map.map[0].comm = NULL;
@@ -741,7 +744,7 @@ static int record__synthesize_workload(struct record *rec)
 						 rec->opts.proc_map_timeout);
 }
 
-static int record__synthesize(struct record *rec);
+static int record__synthesize(struct record *rec, bool tail);
 
 static int
 record__switch_output(struct record *rec, bool at_exit)
@@ -752,6 +755,10 @@ record__switch_output(struct record *rec, bool at_exit)
 	/* Same Size:      "2015122520103046"*/
 	char timestamp[] = "InvalidTimestamp";
 
+	record__synthesize(rec, true);
+	if (target__none(&rec->opts.target))
+		record__synthesize_workload(rec, true);
+
 	rec->samples = 0;
 	record__finish_output(rec);
 	err = fetch_current_timestamp(timestamp, sizeof(timestamp));
@@ -774,7 +781,7 @@ record__switch_output(struct record *rec, bool at_exit)
 
 	/* Output tracking events */
 	if (!at_exit) {
-		record__synthesize(rec);
+		record__synthesize(rec, false);
 
 		/*
 		 * In 'perf record --switch-output' without -a,
@@ -786,7 +793,7 @@ record__switch_output(struct record *rec, bool at_exit)
 		 * perf_event__synthesize_thread_map() for those events.
 		 */
 		if (target__none(&rec->opts.target))
-			record__synthesize_workload(rec);
+			record__synthesize_workload(rec, false);
 	}
 	return fd;
 }
@@ -841,7 +848,7 @@ static const struct perf_event_mmap_page *record__pick_pc(struct record *rec)
 	return NULL;
 }
 
-static int record__synthesize(struct record *rec)
+static int record__synthesize(struct record *rec, bool tail)
 {
 	struct perf_session *session = rec->session;
 	struct machine *machine = &session->machines.host;
@@ -851,6 +858,9 @@ static int record__synthesize(struct record *rec)
 	int fd = perf_data_file__fd(file);
 	int err = 0;
 
+	if (rec->opts.tail_synthesize != tail)
+		return 0;
+
 	if (file->is_pipe) {
 		err = perf_event__synthesize_attrs(tool, session,
 						   process_synthesized_event);
@@ -1014,7 +1024,7 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 
 	machine = &session->machines.host;
 
-	err = record__synthesize(rec);
+	err = record__synthesize(rec, false);
 	if (err < 0)
 		goto out_child;
 
@@ -1179,6 +1189,9 @@ static int __cmd_record(struct record *rec, int argc, const char **argv)
 	if (!quiet)
 		fprintf(stderr, "[ perf record: Woken up %ld times to write data ]\n", waking);
 
+	if (target__none(&rec->opts.target))
+		record__synthesize_workload(rec, true);
+
 out_child:
 	if (forks) {
 		int exit_status;
@@ -1197,6 +1210,7 @@ out_child:
 	} else
 		status = err;
 
+	record__synthesize(rec, true);
 	/* this will be recalculated during process_buildids() */
 	rec->samples = 0;
 
@@ -1520,6 +1534,8 @@ struct option __record_options[] = {
 	OPT_BOOLEAN_SET('i', "no-inherit", &record.opts.no_inherit,
 			&record.opts.no_inherit_set,
 			"child tasks do not inherit counters"),
+	OPT_BOOLEAN(0, "tail-synthesize", &record.opts.tail_synthesize,
+		    "synthesize non-sample events at the end of output"),
 	OPT_BOOLEAN(0, "overwrite", &record.opts.overwrite, "use overwrite mode"),
 	OPT_UINTEGER('F', "freq", &record.opts.user_freq, "profile at this frequency"),
 	OPT_CALLBACK('m', "mmap-pages", &record.opts, "pages[,pages]",
@@ -1726,6 +1742,9 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused)
 		}
 	}
 
+	if (record.opts.overwrite)
+		record.opts.tail_synthesize = true;
+
 	if (rec->evlist->nr_entries == 0 &&
 	    perf_evlist__add_default(rec->evlist) < 0) {
 		pr_err("Not enough memory for event selector list\n");
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 608b42b..a7e0f14 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -59,6 +59,7 @@ struct record_opts {
 	bool	     record_switch_events;
 	bool	     all_kernel;
 	bool	     all_user;
+	bool	     tail_synthesize;
 	bool	     overwrite;
 	unsigned int freq;
 	unsigned int mmap_pages;
-- 
1.8.3.4
