Message-ID: <20090619152029.GA7204@elte.hu>
Date: Fri, 19 Jun 2009 17:20:29 +0200
From: Ingo Molnar <mingo@...e.hu>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>, mingo@...hat.com,
hpa@...or.com, paulus@...ba.org, acme@...hat.com,
linux-kernel@...r.kernel.org, a.p.zijlstra@...llo.nl,
penberg@...helsinki.fi, vegard.nossum@...il.com, efault@....de,
jeremy@...p.org, npiggin@...e.de, tglx@...utronix.de,
linux-tip-commits@...r.kernel.org
Subject: Re: [tip:perfcounters/core] perf_counter: x86: Fix call-chain
support to use NMI-safe methods
* Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> On Mon, 15 Jun 2009, Ingo Molnar wrote:
> >
> > See the numbers in the other mail: about 33 million pagefaults
> > happen in a typical kernel build - that's ~400K/sec - and that
> > is not a particularly pagefault-heavy workload.
>
> Did you do any function-level profiles?
>
> Last I looked at it, the real cost of page faults was all in the
> memory copies and page clearing, and while it would be nice to
> speed up the kernel entry and exit, the few tens of cycles we
> might be able to get from there really aren't all that important.
Yeah.
Here's the function-level profile of a typical kernel build on a
Nehalem box:
$ perf report --sort symbol
#
# (14317328 samples)
#
# Overhead Symbol
# ........ ......
#
44.05% 0x000000001a0b80
5.09% 0x0000000001d298
3.56% 0x0000000005742c
2.48% 0x0000000014026d
2.31% 0x00000000007b1a
2.06% 0x00000000115ac9
1.83% [.] _int_malloc
1.71% 0x00000000064680
1.50% [.] memset
1.37% 0x00000000125d88
1.28% 0x000000000b7642
1.17% [k] clear_page_c
0.87% [k] page_fault
0.78% [.] is_defined_config
0.71% [.] _int_free
0.68% [.] __GI_strlen
0.66% 0x000000000699e8
0.54% [.] __GI_memcpy
The profile is dominated by user-space symbols. (There's no proper
ELF+debuginfo on this box, so they are unnamed.) It also shows that
page clearing and pagefault handling dominate the kernel overhead -
but they are dwarfed by other overhead. Any page-fault-entry costs are
a drop in the bucket.
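( For reference: the 'Overhead' column is just each symbol's share of
the total sample count - conceptually the tool buckets every sample by
its resolved symbol, roughly like the toy sketch below. The structures
and sample data are made up for illustration - this is not perf's
actual code. )

/*
 * Toy sketch of what 'perf report --sort symbol' does conceptually:
 * bucket every sample by its resolved symbol and print each bucket's
 * share of the total.  Made-up structures, not perf's real code.
 */
#include <stdio.h>
#include <string.h>

struct bucket { const char *sym; unsigned long hits; };

int main(void)
{
        /* stand-ins for resolved sample IPs out of perf.data: */
        const char *samples[] = {
                "clear_page_c", "memset", "clear_page_c",
                "page_fault", "memset", "clear_page_c",
        };
        unsigned long total = sizeof(samples) / sizeof(samples[0]);
        struct bucket buckets[16] = { { NULL, 0 } };
        unsigned long i, j, nr = 0;

        for (i = 0; i < total; i++) {
                /* find this symbol's bucket, or start a new one: */
                for (j = 0; j < nr && strcmp(buckets[j].sym, samples[i]); j++)
                        ;
                if (j == nr)
                        buckets[nr++].sym = samples[i];
                buckets[j].hits++;
        }

        for (j = 0; j < nr; j++)
                printf("%6.2f%%  %s\n",
                       100.0 * buckets[j].hits / total, buckets[j].sym);
        return 0;
}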
In fact, with call-chain graphs we can get a precise picture, as we
can do a non-linear 'slice' set operation over the samples and keep
only the ones that have the 'page_fault' pattern in one of their
parent functions:
$ perf report --sort symbol --parent page_fault
#
# (14317328 samples)
#
# Overhead Symbol
# ........ ......
#
1.12% [k] clear_page_c
0.87% [k] page_fault
0.43% [k] get_page_from_freelist
0.25% [k] _spin_lock
0.24% [k] do_page_fault
0.23% [k] perf_swcounter_ctx_event
0.16% [k] perf_swcounter_event
0.15% [k] handle_mm_fault
0.15% [k] __alloc_pages_nodemask
0.14% [k] __rmqueue
0.12% [k] find_get_page
0.11% [k] copy_page_c
0.11% [k] find_vma
0.10% [k] _spin_lock_irqsave
0.10% [k] __wake_up_bit
0.09% [k] _spin_unlock_irqrestore
0.09% [k] do_anonymous_page
0.09% [k] __inc_zone_state
This "sub-profile" shows the true summary overhead that 'page_fault'
and all its child functions have. Note that for example clear_page_c
decreased from 1.17% to 1.12%:
1.12% [k] clear_page_c
1.17% [k] clear_page_c
because 0.05% of the clear_page_c() samples come from callers that do
not involve page_fault - those are filtered out by the --parent
matching.
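The mechanics behind --parent are simple: walk each sample's recorded
call-chain and account the sample only if one of its parent frames
matches the given pattern. Here is a toy sketch of that idea - the
structures, the sample data and the plain substring match are
simplifications for illustration, not perf's actual implementation
(which resolves the chain entries to symbols first):

/*
 * Toy sketch of the --parent 'slice': a sample is accounted only if
 * the pattern matches one of the callers in its call-chain.
 */
#include <stdio.h>
#include <string.h>

#define MAX_DEPTH 8

struct sample {
        const char *sym;                        /* sampled (leaf) symbol */
        const char *parents[MAX_DEPTH];         /* caller chain, NULL-terminated */
};

static int parent_matches(const struct sample *s, const char *pattern)
{
        int i;

        for (i = 0; i < MAX_DEPTH && s->parents[i]; i++) {
                if (strstr(s->parents[i], pattern))
                        return 1;
        }
        return 0;
}

int main(void)
{
        struct sample samples[] = {
                /* clear_page_c reached via the pagefault path: */
                { "clear_page_c", { "do_anonymous_page", "handle_mm_fault",
                                    "do_page_fault", "page_fault", NULL } },
                /* clear_page_c reached via some other, hypothetical path: */
                { "clear_page_c", { "some_other_caller", NULL } },
        };
        unsigned long nr = sizeof(samples) / sizeof(samples[0]);
        unsigned long i, matched = 0;

        for (i = 0; i < nr; i++) {
                if (parent_matches(&samples[i], "page_fault"))
                        matched++;
        }
        printf("%lu of %lu clear_page_c samples have page_fault as a parent\n",
               matched, nr);
        return 0;
}

With real data, the bucketing step from the earlier sketch would then
only ever see the matching samples - which is how clear_page_c's 1.17%
shrinks to 1.12% above.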
Ingo