Message-ID: <4A95822D.9060207@gmail.com>
Date:	Wed, 26 Aug 2009 14:42:53 -0400
From:	Gregory Haskins <gregory.haskins@...il.com>
To:	Avi Kivity <avi@...hat.com>
CC:	alacrityvm-devel@...ts.sourceforge.net,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
	"Michael S. Tsirkin" <mst@...hat.com>, netdev@...r.kernel.org
Subject: Re: AlacrityVM benchmark numbers updated

Avi Kivity wrote:
> On 08/26/2009 04:01 AM, Gregory Haskins wrote:
>> We are pleased to announce the availability of the latest networking
>> benchmark numbers for AlacrityVM.  We've made several tweaks to the
>> original v0.1 release to improve performance.  The most notable is a
>> switch from get_user_pages to switch_mm+copy_[to/from]_user thanks to a
>> review suggestion from Michael Tsirkin (as well as his patch to
>> implement it).
>>
>> This change alone accounted for freeing up an additional 1.2Gbps, which
>> is over 25% improvement from v0.1.  The previous numbers were 4560Mbps
>> before the change, and 5708Mbps after (for 1500mtu over 10GE).  This
>> moves us ever closer to the goal of native performance under
>> virtualization.
>>    
> 
> Interesting, it's good to see that copy_*_user() works so well.  Note
> that there's a possible optimization that goes in the opposite direction
> - keep using get_user_pages(), but use the dma engine API to perform the
> actual copy.  I expect that it will only be a win when using tso to
> transfer full pages.  Large pages may also help.
> 
> Copyless tx also wants get_user_pages().  It makes sense to check if
> switch_mm() + get_user_pages_fast() gives better performance than
> get_user_pages().

Actually, I have already looked at this and it does indeed seem better to
use switch_mm+gupf() over gup() by quite a large margin.  You could then
couple that with your DMA-engine idea to potentially gain even more
benefits (though probably not for networking since most NICs have their
own DMA engine anyway).

Kind Regards,
-Greg



