Message-ID: <497EDF43.2030406@goop.org>
Date: Tue, 27 Jan 2009 02:17:39 -0800
From: Jeremy Fitzhardinge <jeremy@...p.org>
To: Ingo Molnar <mingo@...e.hu>
CC: Zachary Amsden <zach@...are.com>, Nick Piggin <npiggin@...e.de>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
"hpa@...or.com" <hpa@...or.com>,
"jeremy@...source.com" <jeremy@...source.com>,
"chrisw@...s-sol.org" <chrisw@...s-sol.org>,
"rusty@...tcorp.com.au" <rusty@...tcorp.com.au>,
Andrew Morton <akpm@...ux-foundation.org>,
Xen-devel <xen-devel@...ts.xensource.com>
Subject: Re: lmbench lat_mmap slowdown with CONFIG_PARAVIRT
Ingo Molnar wrote:
> ping?
>
> This is a very serious paravirt_ops slowdown affecting the native kernel's
> performance to the tune of 5-10% in certain workloads.
>
> It's been about 2 years ago that paravirt_ops went upstream, when you told
> us that something like this would never happen, that paravirt_ops is
> designed so flexibly that it will never hinder the native kernel - and if
> it does it will be easy to fix it. Now is the time to fulfill that
> promise.
I couldn't exactly reproduce your results, but I guess they're similar
in shape. Comparing 2.6.29-rc2-nopv with -pvops, I saw this ratio (pass
1-5). Interestingly I'm seeing identical instruction counts for pvops
vs non-pvops, and a lower cycle count. The cache references are way up
and the miss rate is up a bit, which I guess is the source of the slowdown.
With the attached patch, I get a clear improvement; it replaces the
do-nothing pte_val/make_pte functions with inlined movs that move the
argument to the return register, overpatching the 6-byte indirect call
(on i386 it would just be all nopped out). CPU cycles and cache misses are way
down, and the tick count is down from ~5% worse to ~2%. But the cache
reference rate is even higher, which really doesn't make sense to me.
But the patch is a clear improvement, and it's hard to see how it could
make anything worse (it's always going to replace an indirect call with
simple inlined code).
(Full numbers in spreadsheet.)
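To make the shape of the change concrete, here is a rough user-space sketch of the before/after, not the patch itself. The real change patches the call site in the kernel's instruction stream; this only models an identity operation reached through a function pointer versus the same operation written inline. Every name below (mmu_ops_sketch, the native_*, *_indirect and *_ident helpers, paths_agree) is made up for illustration and is an assumption, not something taken from paravirt-ident.patch.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t pteval_t;
typedef struct { pteval_t pte; } pte_t;

/* Hypothetical stand-in for a pvops table: one hook per operation. */
struct mmu_ops_sketch {
    pteval_t (*pte_val)(pte_t);
    pte_t (*make_pte)(pteval_t);
};

/* On native hardware both hooks just pass the value through. */
static pteval_t native_pte_val(pte_t pte) { return pte.pte; }
static pte_t native_make_pte(pteval_t val) { return (pte_t){ .pte = val }; }

static struct mmu_ops_sketch mmu_ops = {
    .pte_val = native_pte_val,
    .make_pte = native_make_pte,
};

/* Before: every conversion goes through an indirect call. */
static pteval_t pte_val_indirect(pte_t pte) { return mmu_ops.pte_val(pte); }
static pte_t make_pte_indirect(pteval_t val) { return mmu_ops.make_pte(val); }

/* After: a plain identity the compiler can reduce to a register move
 * (or to nothing when argument and result share a register). */
static inline pteval_t pte_val_ident(pte_t pte) { return pte.pte; }
static inline pte_t make_pte_ident(pteval_t val) { return (pte_t){ .pte = val }; }

/* The two paths must be interchangeable; returns 1 if they agree. */
static int paths_agree(pteval_t val)
{
    pteval_t a = pte_val_indirect(make_pte_indirect(val));
    pteval_t b = pte_val_ident(make_pte_ident(val));

    assert(a == val);
    return a == b;
}
```

The point is only that the identity version has no call, no pointer load and no clobbered registers, which is why replacing the indirect call should never cost anything on native.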
I have a couple of other patches to reduce the register pressure of the
pvops calls, but I'm trying to work out how to make sure it's not all too
complex and/or fragile.
J
Download attachment "pvops-mmap-measurements.ods" of type "application/vnd.oasis.opendocument.spreadsheet" (30546 bytes)
View attachment "paravirt-ident.patch" of type "text/plain" (6904 bytes)