Message-ID: <20090615200619.GA10632@Krystal>
Date: Mon, 15 Jun 2009 16:06:19 -0400
From: Mathieu Desnoyers <mathieu.desnoyers@...ymtl.ca>
To: Ingo Molnar <mingo@...e.hu>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>, mingo@...hat.com,
hpa@...or.com, paulus@...ba.org, acme@...hat.com,
linux-kernel@...r.kernel.org, a.p.zijlstra@...llo.nl,
penberg@...helsinki.fi, vegard.nossum@...il.com, efault@....de,
jeremy@...p.org, npiggin@...e.de, tglx@...utronix.de,
linux-tip-commits@...r.kernel.org
Subject: Re: [tip:perfcounters/core] perf_counter: x86: Fix call-chain
support to use NMI-safe methods

* Ingo Molnar (mingo@...e.hu) wrote:
>
> * Linus Torvalds <torvalds@...ux-foundation.org> wrote:
>
> > > If it's faster, this becomes a legit (albeit complex)
> > > micro-optimization in a _very_ hot codepath.
> >
> > I don't think it's all that hot. It's not like it's the return to
> > user mode.
>
> Well i guess it depends. For server apps it is true - syscalls are a
> lot more dominant, MMs are long-running so any startup cost gets
> amortized and pagefaults are avoided.
>
> For something like a kernel build we have 7 times as many pagefaults
> as syscalls:
>
> aldebaran:~/linux/linux> perf stat -- make -j32 >/dev/null
> [...]
> Performance counter stats for 'make -j32':
>
>   1444281.076741  task-clock-msecs    #     14.429 CPUs
>           219991  context-switches    #      0.000 M/sec
>            18335  CPU-migrations      #      0.000 M/sec
>         38465628  page-faults         #      0.027 M/sec
>    4374762924204  cycles              #   3029.025 M/sec
>    2645979309823  instructions        #      0.605 IPC
>      42398991227  cache-references    #     29.356 M/sec
>       4371920878  cache-misses        #      3.027 M/sec
>
> 100.097787566 seconds time elapsed.
>
> So we have 38465628 page-faults, or one every 68788 instructions,
> one every 113731 cycles.
>
> 10 cycles saved in the page fault costs means 0.01% performance win
> - or about 10 milliseconds shaven off the kernel build time.
>
> 100 cycles saved (which is impossible really in the entry/exit path)
> would mean 0.1% win.
>
> 5653639 syscalls (according to strace -c) - which is a factor of 6.8
> lower. Same goes for shell scripts or most of the clicking we do on
> a GUI.
>
> It's not a big factor for sure.
>
> Btw., the biggest pagefault cost is in the fault handling itself
> (the page clearing):
>
> 4.14% [k] do_page_fault
> 1.20% [k] sys_write
> 1.10% [k] sys_open
> 0.63% [k] sys_exit_group
> 0.48% [k] smp_apic_timer_interrupt
> 0.37% [k] sys_read
> 0.37% [k] sys_execve
> 0.20% [k] sys_mmap
> 0.18% [k] sys_close
> 0.14% [k] sys_munmap
> 0.13% [k] sys_poll
> 0.09% [k] sys_newstat
> 0.07% [k] sys_clone
> 0.06% [k] sys_newfstat
>
> it totals to 4.14% of the total cost (user-space cycles included) of
> a kernel build, on a Nehalem box.
>
> Ingo
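
The per-fault figures in the quoted message can be reproduced with a short Python sketch. The counts are copied from the quoted perf stat and strace -c output; the constant and function names are illustrative only, not part of any tool.

```python
# Counts copied from the quoted `perf stat -- make -j32` run.
PAGE_FAULTS = 38465628
CYCLES = 4374762924204
INSTRUCTIONS = 2645979309823
SYSCALLS = 5653639  # from the quoted `strace -c` figure


def per_fault(total, faults=PAGE_FAULTS):
    """Number of events (cycles, instructions) per page fault, rounded down."""
    return total // faults


def relative_win_percent(cycles_saved_per_fault):
    """Percentage of all cycles saved if every fault gets this much cheaper."""
    return 100.0 * cycles_saved_per_fault * PAGE_FAULTS / CYCLES


if __name__ == "__main__":
    print("instructions per fault:", per_fault(INSTRUCTIONS))
    print("cycles per fault:", per_fault(CYCLES))
    print("page faults per syscall: %.1f" % (PAGE_FAULTS / SYSCALLS))
    for saved in (10, 100):
        print("%d cycles saved per fault -> %.4f%% of all cycles"
              % (saved, relative_win_percent(saved)))
```

The 10-cycle and 100-cycle cases come out slightly below 0.01% and 0.1%, which matches the quoted numbers once rounded.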

In the category "crazy ideas one should never express out loud", I could add the
following: we could save/restore the cr2 register on the local stack at every
interrupt entry/exit, which would allow the page fault handler to execute with
interrupts enabled.

I have not benchmarked the overhead of disabling interrupts in the page fault
handler, i.e. of entering it through an interrupt gate rather than a trap gate,
but cli/sti instructions are known to take quite a few cycles on some
architectures: 131 cycles for the pair on P4, 23 cycles on AMD Athlon X2 64,
and 43 cycles on Intel Core2.

I am tempted to think that spending, say, ~10 cycles on the interrupt path is
worth it if we save a few tens of cycles on the page fault handler fast path.

But again, this calls for benchmarks.
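
Until someone runs those benchmarks, the trade-off can at least be framed as back-of-the-envelope arithmetic. The Python sketch below is only that: it assumes the per-fault saving is as large as the full cli/sti pair cost (an upper-bound stand-in, not a measurement), takes the ~10 cycle cost per interrupt from the guess above, and uses an interrupt count that is purely hypothetical. All names are illustrative.

```python
PAGE_FAULTS = 38465628  # from the quoted perf stat output

# cli/sti pair costs quoted in this message, in cycles.
# ASSUMPTION: used here as a stand-in for the per-fault saving.
CLI_STI_PAIR_CYCLES = {"P4": 131, "AMD Athlon X2 64": 23, "Intel Core2": 43}


def net_cycles_saved(faults, saved_per_fault, interrupts, cost_per_interrupt=10):
    """Cycles gained on the fault path minus cycles added on the interrupt path."""
    return faults * saved_per_fault - interrupts * cost_per_interrupt


def break_even_interrupts_per_fault(saved_per_fault, cost_per_interrupt=10):
    """Interrupts per page fault at which the gain and the added cost cancel."""
    return saved_per_fault / cost_per_interrupt


if __name__ == "__main__":
    hypothetical_interrupts = 1_000_000  # ASSUMPTION: not measured anywhere
    for cpu, saved in CLI_STI_PAIR_CYCLES.items():
        print(cpu,
              "net cycles:", net_cycles_saved(PAGE_FAULTS, saved,
                                              hypothetical_interrupts),
              "break-even interrupts/fault:",
              break_even_interrupts_per_fault(saved))
```

Under these assumptions the change only loses if interrupts outnumber page faults by more than the saving-to-cost ratio, which is why the real interrupt rate and the real per-fault saving are the two numbers worth measuring.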
Mathieu
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68