Message-ID: <20131002101826.GC7941@localhost.localdomain>
Date:	Wed, 2 Oct 2013 12:18:28 +0200
From:	Frederic Weisbecker <fweisbec@...il.com>
To:	Namhyung Kim <namhyung@...nel.org>
Cc:	Arnaldo Carvalho de Melo <acme@...stprotocols.net>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Paul Mackerras <paulus@...ba.org>,
	Ingo Molnar <mingo@...nel.org>,
	Namhyung Kim <namhyung.kim@....com>,
	LKML <linux-kernel@...r.kernel.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Jiri Olsa <jolsa@...hat.com>
Subject: Re: [PATCH 1/8] perf callchain: Convert children list to rbtree

On Thu, Sep 26, 2013 at 05:58:03PM +0900, Namhyung Kim wrote:
> From: Namhyung Kim <namhyung.kim@....com>
> 
> Current collapse stage has a scalability problem which can be
> reproduced easily with parallel kernel build.  This is because it
> needs to traverse every children of callchain linearly during the
> collapse/merge stage.  Convert it to rbtree reduced the overhead
> significantly.
> 
> On my 400MB perf.data file which recorded with make -j32 kernel build:
> 
>   $ time perf --no-pager report --stdio > /dev/null
> 
> before:
>   real	6m22.073s
>   user	6m18.683s
>   sys	0m0.706s
> 
> after:
>   real	0m20.780s
>   user	0m19.962s
>   sys	0m0.689s
> 
> During the perf report the overhead on append_chain_children went down
> from 96.69% to 18.16%:
> 
>   -  18.16%  perf  perf                [.] append_chain_children
>      - append_chain_children
>         - 77.48% append_chain_children
>            + 69.79% merge_chain_branch
>            - 22.96% append_chain_children
>               + 67.44% merge_chain_branch
>               + 30.15% append_chain_children
>               + 2.41% callchain_append
>            + 7.25% callchain_append
>         + 12.26% callchain_append
>         + 10.22% merge_chain_branch
>   +  11.58%  perf  perf                [.] dso__find_symbol
>   +   8.02%  perf  perf                [.] sort__comm_cmp
>   +   5.48%  perf  libc-2.17.so        [.] malloc_consolidate
> 
> Reported-by: Linus Torvalds <torvalds@...ux-foundation.org>
> Cc: Jiri Olsa <jolsa@...hat.com>
> Cc: Frederic Weisbecker <fweisbec@...il.com>
> Link: http://lkml.kernel.org/n/tip-d9tcfow6stbrp4btvgs51y67@git.kernel.org
> Signed-off-by: Namhyung Kim <namhyung@...nel.org>

Have you tested this patchset when collapsing is not used?
There is a fair chance that this patchset improves not only collapsing
but also callchain insertion in general, so it's probably a win in any case.
Still, it would be nice to confirm that, because we are getting
rid of collapsing anyway.
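
To illustrate why insertion in general should benefit, and not only the
collapse path, here is a minimal sketch that counts how many child nodes a
find-or-insert visits when the children are kept in an unsorted list versus
a search tree keyed on the address. Everything in it is hypothetical:
struct child, list_insert(), tree_insert() and the key generator are made up
for illustration and are not perf's actual structures or functions. A plain
unbalanced binary search tree stands in for the kernel rbtree to keep the
example short; the rbtree additionally guarantees balance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a callchain child node (not perf's real struct). */
struct child {
	uint64_t ip;                 /* lookup key, e.g. the frame's address */
	uint64_t hits;               /* how many times this child was matched */
	struct child *next;          /* linkage when kept in a list */
	struct child *left, *right;  /* linkage when kept in a search tree */
};

static struct child *new_child(uint64_t ip)
{
	struct child *n = calloc(1, sizeof(*n));

	if (!n)
		abort();
	n->ip = ip;
	n->hits = 1;
	return n;
}

/* Find-or-append in an unsorted list; returns the number of nodes visited. */
static size_t list_insert(struct child **head, uint64_t ip)
{
	size_t steps = 0;

	assert(head != NULL);
	for (; *head; head = &(*head)->next) {
		steps++;
		if ((*head)->ip == ip) {
			(*head)->hits++;
			return steps;
		}
	}
	*head = new_child(ip);
	return steps;
}

/* Find-or-insert in a binary search tree; returns the number of nodes visited. */
static size_t tree_insert(struct child **root, uint64_t ip)
{
	size_t steps = 0;

	assert(root != NULL);
	while (*root) {
		steps++;
		if (ip == (*root)->ip) {
			(*root)->hits++;
			return steps;
		}
		root = ip < (*root)->ip ? &(*root)->left : &(*root)->right;
	}
	*root = new_child(ip);
	return steps;
}

/* Distinct, scrambled keys: an odd multiplier is a bijection modulo 2^32. */
static uint64_t key_for(size_t i)
{
	return (uint32_t)(i * 2654435761u);
}

/* Total nodes visited while inserting n distinct keys (nodes are not freed). */
static size_t total_list_steps(size_t n)
{
	struct child *head = NULL;
	size_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += list_insert(&head, key_for(i));
	return total;
}

static size_t total_tree_steps(size_t n)
{
	struct child *root = NULL;
	size_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += tree_insert(&root, key_for(i));
	return total;
}
```

With 1000 distinct children, total_list_steps(1000) visits
0 + 1 + ... + 999 = 499500 nodes, because every new key walks the whole
list before being appended, while total_tree_steps(1000) only descends one
path per insert. The same asymmetry applies to any insertion into a node
with many children, whether or not it comes from a collapse.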

A test that would tell us is to run "perf report -s sym" and compare the
time it takes to complete before and after this patch, because "-s sym"
shouldn't involve collapses.

In fact, sorting by anything other than comm should do the trick.

Thanks.