Message-ID: <F2CBF3009FA73547804AE4C663CAB28E04214C0B@shsmsx102.ccr.corp.intel.com>
Date:	Fri, 29 Jul 2016 00:46:14 +0000
From:	"Li, Liang Z" <liang.z.li@...el.com>
To:	"Michael S. Tsirkin" <mst@...hat.com>
CC:	"Hansen, Dave" <dave.hansen@...el.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"virtualization@...ts.linux-foundation.org" 
	<virtualization@...ts.linux-foundation.org>,
	"linux-mm@...ck.org" <linux-mm@...ck.org>,
	"virtio-dev@...ts.oasis-open.org" <virtio-dev@...ts.oasis-open.org>,
	"kvm@...r.kernel.org" <kvm@...r.kernel.org>,
	"qemu-devel@...gnu.org" <qemu-devel@...gnu.org>,
	"dgilbert@...hat.com" <dgilbert@...hat.com>,
	"quintela@...hat.com" <quintela@...hat.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	"Vlastimil Babka" <vbabka@...e.cz>,
	Mel Gorman <mgorman@...hsingularity.net>,
	"Paolo Bonzini" <pbonzini@...hat.com>,
	Cornelia Huck <cornelia.huck@...ibm.com>,
	Amit Shah <amit.shah@...hat.com>
Subject: RE: [virtio-dev] Re: [PATCH v2 repost 4/7] virtio-balloon: speed up
 inflate/deflate process

> On Thu, Jul 28, 2016 at 06:36:18AM +0000, Li, Liang Z wrote:
> > > > > This ends up doing a 1MB kmalloc() right?  That seems a _bit_ big.
> > > > > How big was the pfn buffer before?
> > > >
> > > > Yes, it is if the max pfn is more than 32GB.
> > > > The pfn buffer used before was 256*4 = 1024 bytes; it's
> > > > too small, and it's the main reason for the bad performance.
> > > > Using a kmalloc of at most 1MB is a balance between performance
> > > > and flexibility: a large page bitmap that covers the range of all
> > > > the memory is no good for a system with a huge amount of memory.
> > > > If the bitmap is too small, we have to traverse a long list many
> > > > times, and that's bad for performance.
> > > >
> > > > Thanks!
> > > > Liang
> > >
> > > These are all your implementation decisions though.
> > >
> > > If guest memory is so fragmented that you only have order 0 4k
> > > pages, then allocating a huge 1M contiguous chunk is very
> > > problematic in and of itself.
> > >
> >
> > The memory is allocated in the probe stage. This will not happen if
> > the driver is loaded when booting the guest.
> >
> > > Most people rarely migrate and do not care how fast that happens.
> > > Wasting a large chunk of memory (and it's zeroed for no good reason,
> > > so you actually request host memory for it) for everyone to speed it
> > > up when it does happen is not really an option.
> > >
> > If people don't plan to do inflating/deflating, they should not enable
> > virtio-balloon in the first place; once they decide to use it, the
> > driver should provide the best performance it can.
> 
> The reason people inflate/deflate is so they can overcommit memory.
> Do they need to overcommit very quickly? I don't see why.
> So let's get what we can for free but I don't really believe people would want
> to pay for it.
> 
> > 1MB is a very small portion for a VM with more than 32GB memory and
> > it's the *worst case*; for a VM with less than 32GB memory, the amount
> > of RAM used depends on the VM's memory size and will be less than 1MB.
> 
> It's guest memory so it might all be in swap and never touched; your memset
> at probe time will fault it in and make the hypervisor actually pay for it.
> 
> > If 1MB is too big, how about 512K, or 256K?  32K seems too small.
> >
> > Liang
> 
> It's only small because it makes you rescan the free list.
> So maybe you should do something else.
> I looked at it a bit. Instead of scanning the free list, how about scanning the
> actual page structures? If a page is unused, pass it to the host.
> Solves the problem of rescanning multiple times, does it not?
> 

Yes, agree.
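
To make the suggestion concrete, here is a minimal userspace sketch of the
single-pass idea. It is not kernel code and not necessarily what v3 will look
like. It walks a pfn range once, fills a small fixed-size bitmap window, and
hands each window off before moving on, so nothing is rescanned. Everything
named here (scan_pfn_range, WINDOW_BITS, the is_free/flush callbacks and the
demo helpers) is hypothetical and only stands in for the real "is this page
unused" check and the real "pass it to host" step.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WINDOW_BITS 64u /* hypothetical: tiny window to keep the example small */

/* Hypothetical stand-in for checking whether the page at pfn is unused. */
typedef int (*page_free_fn)(uint64_t pfn, void *ctx);
/* Hypothetical stand-in for passing one bitmap window to the host. */
typedef void (*flush_fn)(uint64_t base_pfn, const uint8_t *bitmap,
                         unsigned nbits, void *ctx);

/* Walk [start_pfn, end_pfn) exactly once; return the number of windows sent. */
static unsigned scan_pfn_range(uint64_t start_pfn, uint64_t end_pfn,
                               page_free_fn is_free, flush_fn flush, void *ctx)
{
    uint8_t bitmap[WINDOW_BITS / 8];
    unsigned flushes = 0;

    assert(start_pfn <= end_pfn);
    for (uint64_t base = start_pfn; base < end_pfn; base += WINDOW_BITS) {
        uint64_t left = end_pfn - base;
        unsigned nbits = left < WINDOW_BITS ? (unsigned)left : WINDOW_BITS;

        memset(bitmap, 0, sizeof(bitmap));
        for (unsigned i = 0; i < nbits; i++)
            if (is_free(base + i, ctx))
                bitmap[i / 8] |= (uint8_t)(1u << (i % 8));

        flush(base, bitmap, nbits, ctx); /* window is reused after this */
        flushes++;
    }
    return flushes;
}

/* Demo helpers: pretend every even pfn is unused and count what is sent. */
struct scan_stats { unsigned free_pages; };

static int demo_is_free(uint64_t pfn, void *ctx)
{
    (void)ctx;
    return (pfn % 2) == 0;
}

static void demo_flush(uint64_t base_pfn, const uint8_t *bitmap,
                       unsigned nbits, void *ctx)
{
    struct scan_stats *st = ctx;

    (void)base_pfn;
    for (unsigned i = 0; i < nbits; i++)
        if (bitmap[i / 8] & (1u << (i % 8)))
            st->free_pages++;
}
```

With this shape the bitmap size only decides how often a window is sent, not
how many times the pages are walked.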
> 
> Another idea: allocate a small bitmap at probe time (e.g. for deflate), allocate
> a bunch more on each request. Use something like GFP_ATOMIC and a
> scatter/gather, if that fails use the smaller bitmap.
> 

So, the aim of v3 is to use a smaller bitmap without too heavy a performance penalty.
Thanks a lot!
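
For reference, the arithmetic behind the sizes discussed in this thread, as a
small sketch. It assumes 4KB pages and one bit per page; the helper names
(bitmap_bytes_for, windows_needed) are made up for illustration and do not
exist in the driver.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_4K 12 /* assumption: 4KB guest pages */

/* Bytes of bitmap needed to track mem_bytes of memory at one bit per page. */
static uint64_t bitmap_bytes_for(uint64_t mem_bytes)
{
    uint64_t pages = mem_bytes >> PAGE_SHIFT_4K;

    return (pages + 7) / 8; /* round up to whole bytes */
}

/* How many bitmap windows of cap_bytes are needed to cover mem_bytes. */
static uint64_t windows_needed(uint64_t mem_bytes, uint64_t cap_bytes)
{
    uint64_t need = bitmap_bytes_for(mem_bytes);

    assert(cap_bytes > 0);
    return (need + cap_bytes - 1) / cap_bytes; /* ceiling division */
}
```

Under these assumptions, 32GB of memory needs exactly 1MB of bitmap, and a 32K
bitmap covers the same 32GB in 32 windows. If every window costs a full rescan
of the free list, that factor is where the penalty comes from.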

Liang