linux-kernel - Re: lmbench lat_mmap slowdown with CONFIG

Date:	Tue, 27 Jan 2009 02:17:39 -0800
From:	Jeremy Fitzhardinge <jeremy@...p.org>
To:	Ingo Molnar <mingo@...e.hu>
CC:	Zachary Amsden <zach@...are.com>, Nick Piggin <npiggin@...e.de>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	"hpa@...or.com" <hpa@...or.com>,
	"jeremy@...source.com" <jeremy@...source.com>,
	"chrisw@...s-sol.org" <chrisw@...s-sol.org>,
	"rusty@...tcorp.com.au" <rusty@...tcorp.com.au>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Xen-devel <xen-devel@...ts.xensource.com>
Subject: Re: lmbench lat_mmap slowdown with CONFIG_PARAVIRT

Ingo Molnar wrote:
> ping?
>
> This is a very serious paravirt_ops slowdown affecting the native kernel's 
> performance to the tune of 5-10% in certain workloads.
>
> It's been about 2 years ago that paravirt_ops went upstream, when you told 
> us that something like this would never happen, that paravirt_ops is 
> designed so flexibly that it will never hinder the native kernel - and if 
> it does it will be easy to fix it. Now is the time to fulfill that 
> promise.

I couldn't exactly reproduce your results, but I guess they're similar 
in shape.  Comparing 2.6.29-rc2-nopv with -pvops, I saw this ratio (pass 
1-5).  Interestingly I'm seeing identical instruction counts for pvops 
vs non-pvops, and a lower cycle count.  The cache references are way up 
and the miss rate is up a bit, which I guess is the source of the slowdown.

With the attached patch, I get a clear improvement; it replaces the 
do-nothing pte_val/make_pte functions with inlined movs to move the 
argument to return, overpatching the 6-byte indirect call (on i386 it 
would just be all nopped out).  CPU cycles and cache misses are way 
down, and the tick count is down from ~5% worse to ~2%.  But the cache 
reference rate is even higher, which really doesn't make sense to me. 
But the patch is a clear improvement, and it's hard to see how it could 
make anything worse (it's always going to replace an indirect call with 
simple inlined code).
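
To make the overpatching idea concrete, here is a rough userspace sketch of what rewriting such a call site could look like. It is not the attached patch: the helper names (patch_ident_64, patch_ident_32) and the CALL_SITE_LEN constant are made up for illustration, and it only fills a byte buffer rather than touching live kernel text. It assumes the x86-64 convention of first argument in %rdi and return value in %rax, and an i386 regparm convention where the argument already arrives in the return register.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CALL_SITE_LEN 6    /* size of the indirect call being overwritten */
#define NOP_BYTE      0x90 /* single-byte x86 nop */

/* Encoding of "mov %rdi, %rax": copy the first argument to the return register. */
static const unsigned char mov_rdi_rax[] = { 0x48, 0x89, 0xf8 };

/*
 * Hypothetical helper: overwrite a 64-bit call site with the inlined
 * identity operation, padding the remainder with nops.
 * Returns the number of non-nop bytes written, or 0 if the site is too
 * small (in which case the site is left untouched).
 */
static size_t patch_ident_64(unsigned char *site, size_t len)
{
    assert(site != NULL);
    if (len < sizeof(mov_rdi_rax))
        return 0;
    memcpy(site, mov_rdi_rax, sizeof(mov_rdi_rax));
    memset(site + sizeof(mov_rdi_rax), NOP_BYTE, len - sizeof(mov_rdi_rax));
    return sizeof(mov_rdi_rax);
}

/*
 * Hypothetical 32-bit counterpart: with the argument and the return value
 * in the same register there is nothing to move, so the whole site
 * becomes nops.  Always returns 0 (no non-nop bytes).
 */
static size_t patch_ident_32(unsigned char *site, size_t len)
{
    assert(site != NULL);
    memset(site, NOP_BYTE, len);
    return 0;
}
```

Starting from a buffer holding the six bytes of an indirect call, patch_ident_64() leaves a three-byte mov followed by three nops, so the site keeps its length and nothing after it needs to move.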

(Full numbers in spreadsheet.)

I have a couple of other patches to reduce the register pressure of the 
pvops calls, but I'm trying to work out how to make sure it's not all too 
complex and/or fragile.

    J

Download attachment "pvops-mmap-measurements.ods" of type "application/vnd.oasis.opendocument.spreadsheet" (30546 bytes)

View attachment "paravirt-ident.patch" of type "text/plain" (6904 bytes)