lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <1368110568-64714-1-git-send-email-Waiman.Long@hp.com>
Date:	Thu,  9 May 2013 10:42:48 -0400
From:	Waiman Long <Waiman.Long@...com>
To:	Arnaldo Carvalho de Melo <acme@...stprotocols.net>,
	Jiri Olsa <jolsa@...hat.com>,
	Stephane Eranian <eranian@...gle.com>,
	Namhyung Kim <namhyung@...nel.org>
Cc:	Waiman Long <Waiman.Long@...com>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Paul Mackerras <paulus@...ba.org>,
	Ingo Molnar <mingo@...hat.com>, linux-kernel@...r.kernel.org,
	"Chandramouleeswaran, Aswin" <aswin@...com>,
	"Norton, Scott J" <scott.norton@...com>
Subject: [PATCH v2] perf record: fix symbol processing bug and greatly improve performance

When "perf record" was used on a large machine with a lot of CPUs,
the perf post-processing time (the time after the workload was done
until the perf command itself exited) could take a lot of minutes
and even hours depending on how large the resulting perf.data file was.

While running AIM7 1500-user high_systime workload on a 80-core
x86-64 system with a 3.9 kernel (with only the -s -a options used),
the workload itself took about 2 minutes to run and the perf.data
file had a size of 1108.746 MB. However, the post-processing step
took more than 10 minutes.

With a gprof-profiled perf binary, the time spent by perf was as
follows:

  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 96.90    822.10   822.10   192156     0.00     0.00  dsos__find
  0.81    828.96     6.86 172089958     0.00     0.00  rb_next
  0.41    832.44     3.48 48539289     0.00     0.00  rb_erase

So 97% (822 seconds) of the time was spent in a single dsos_find()
function. After analyzing the call-graph data below:

-----------------------------------------------
                0.00  822.12  192156/192156      map__new [6]
[7]     96.9    0.00  822.12  192156         vdso__dso_findnew [7]
              822.10    0.00  192156/192156      dsos__find [8]
                0.01    0.00  192156/192156      dsos__add [62]
                0.01    0.00  192156/192366      dso__new [61]
                0.00    0.00       1/45282525     memdup [31]
                0.00    0.00  192156/192230      dso__set_long_name [91]
-----------------------------------------------
              822.10    0.00  192156/192156      vdso__dso_findnew [7]
[8]     96.9  822.10    0.00  192156         dsos__find [8]
-----------------------------------------------

It was found that the vdso__dso_findnew() function failed to locate
VDSO__MAP_NAME ("[vdso]") in the dso list and have to insert a new
entry at the end for 192156 times. This problem is due to the fact that
there are 2 types of name in the dso entry - short name and long name.
The initial dso__new() adds "[vdso]" to both the short and long names.
After that, vdso__dso_findnew() modifies the long name to something
like /tmp/perf-vdso.so-NoXkDj. The dsos__find() function only compares
the long name. As a result, the same vdso entry is duplicated many
time in the dso list. This bug increases memory consumption as well
as slows the symbol processing time to a crawl.

To resolve this problem, the dsos__find() function interface was
modified to enable searching either the long name or the short
name. The vdso__dso_findnew() will now search only the short name
while the other call sites search for the long name as before.

With this change, the cpu time of perf was reduced from 848.38s to
15.77s and dsos__find() only accounted for 0.06% of the total time.

  0.06     15.73     0.01   192151     0.00     0.00  dsos__find

Signed-off-by: Waiman Long <Waiman.Long@...com>
---
 tools/perf/util/dso.c  |   10 ++++++++--
 tools/perf/util/dso.h  |    3 ++-
 tools/perf/util/vdso.c |    2 +-
 3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index 6f7d5a9..3d80f92 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -513,10 +513,16 @@ void dsos__add(struct list_head *head, struct dso *dso)
 	list_add_tail(&dso->node, head);
 }
 
-struct dso *dsos__find(struct list_head *head, const char *name)
+struct dso *dsos__find(struct list_head *head, const char *name, bool cmp_short)
 {
 	struct dso *pos;
 
+	if (cmp_short) {
+		list_for_each_entry(pos, head, node)
+			if (strcmp(pos->short_name, name) == 0)
+				return pos;
+		return NULL;
+	}
 	list_for_each_entry(pos, head, node)
 		if (strcmp(pos->long_name, name) == 0)
 			return pos;
@@ -525,7 +531,7 @@ struct dso *dsos__find(struct list_head *head, const char *name)
 
 struct dso *__dsos__findnew(struct list_head *head, const char *name)
 {
-	struct dso *dso = dsos__find(head, name);
+	struct dso *dso = dsos__find(head, name, FALSE);
 
 	if (!dso) {
 		dso = dso__new(name);
diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index 450199a..d51aaf2 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -133,7 +133,8 @@ struct dso *dso__kernel_findnew(struct machine *machine, const char *name,
 				const char *short_name, int dso_type);
 
 void dsos__add(struct list_head *head, struct dso *dso);
-struct dso *dsos__find(struct list_head *head, const char *name);
+struct dso *dsos__find(struct list_head *head, const char *name,
+		       bool cmp_short);
 struct dso *__dsos__findnew(struct list_head *head, const char *name);
 bool __dsos__read_build_ids(struct list_head *head, bool with_hits);
 
diff --git a/tools/perf/util/vdso.c b/tools/perf/util/vdso.c
index e60951f..a8fd73d 100644
--- a/tools/perf/util/vdso.c
+++ b/tools/perf/util/vdso.c
@@ -91,7 +91,7 @@ void vdso__exit(void)
 
 struct dso *vdso__dso_findnew(struct list_head *head)
 {
-	struct dso *dso = dsos__find(head, VDSO__MAP_NAME);
+	struct dso *dso = dsos__find(head, VDSO__MAP_NAME, TRUE);
 
 	if (!dso) {
 		char *file;
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ