linux-kernel - Re: [PATCH v34 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE

Message-ID: <20180627065637-mutt-send-email-mst@kernel.org>
Date:   Wed, 27 Jun 2018 06:58:18 +0300
From:   "Michael S. Tsirkin" <mst@...hat.com>
To:     Wei Wang <wei.w.wang@...el.com>
Cc:     virtio-dev@...ts.oasis-open.org, linux-kernel@...r.kernel.org,
        virtualization@...ts.linux-foundation.org, kvm@...r.kernel.org,
        linux-mm@...ck.org, mhocko@...nel.org, akpm@...ux-foundation.org,
        torvalds@...ux-foundation.org, pbonzini@...hat.com,
        liliang.opensource@...il.com, yang.zhang.wz@...il.com,
        quan.xu0@...il.com, nilal@...hat.com, riel@...hat.com,
        peterx@...hat.com
Subject: Re: [PATCH v34 2/4] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

On Wed, Jun 27, 2018 at 11:00:05AM +0800, Wei Wang wrote:
> On 06/27/2018 10:41 AM, Michael S. Tsirkin wrote:
> > On Wed, Jun 27, 2018 at 09:24:18AM +0800, Wei Wang wrote:
> > > On 06/26/2018 09:34 PM, Michael S. Tsirkin wrote:
> > > > On Tue, Jun 26, 2018 at 08:27:44PM +0800, Wei Wang wrote:
> > > > > On 06/26/2018 11:56 AM, Michael S. Tsirkin wrote:
> > > > > > On Tue, Jun 26, 2018 at 11:46:35AM +0800, Wei Wang wrote:
> > > > > > 
> > > > > > > > > +	if (!arrays)
> > > > > > > > > +		return NULL;
> > > > > > > > > +
> > > > > > > > > +	for (i = 0; i < max_array_num; i++) {
> > > > > > > > So we are getting a ton of memory here just to free it up a bit later.
> > > > > > > > Why doesn't get_from_free_page_list get the pages from free list for us?
> > > > > > > > We could also avoid the 1st allocation then - just build a list
> > > > > > > > of these.
> > > > > > > > That wouldn't be a good choice for us. If we check how the regular
> > > > > > > > allocation works, there are many things we need to consider when pages
> > > > > > > > are allocated to users.
> > > > > > > > For example, we need to take care of the nr_free
> > > > > > > > counter, and we need to check the watermark and perform the related actions.
> > > > > > > > Also, the folks working on arch_alloc_page to monitor page allocation
> > > > > > > > activities would get a surprise if page allocation were allowed to work in
> > > > > > > > this way.
> > > > > > > 
> > > > > > mm/ code is well positioned to handle all this correctly.
> > > > > I'm afraid that would be a re-implementation of the alloc functions,
> > > > A re-factoring - you can share code. The main difference is locking.
> > > > 
> > > > > and
> > > > > that would be much more complex than what we have. I think your idea of
> > > > > passing a list of pages is better.
> > > > > 
> > > > > Best,
> > > > > Wei
> > > > How much memory is this allocating anyway?
> > > > 
> > > For every 2TB of memory that the guest has, we allocate 4MB.
> > Hmm I guess I'm missing something, I don't see it:
> > 
> > 
> > +       max_entries = max_free_page_blocks(ARRAY_ALLOC_ORDER);
> > +       entries_per_page = PAGE_SIZE / sizeof(__le64);
> > +       entries_per_array = entries_per_page * (1 << ARRAY_ALLOC_ORDER);
> > +       max_array_num = max_entries / entries_per_array +
> > +                       !!(max_entries % entries_per_array);
> > 
> > Looks like you always allocate the max number?
> 
> Yes. We allocate the max number and then free what's not used.
> For example, for a 16TB guest, we allocate four 4MB buffers and pass the 4
> buffers to get_from_free_page_list. If it uses 3, then the remaining 1 "4MB
> buffer" will end up being freed.
> 
> For today's guests, max_array_num is usually 1.
> 
> Best,
> Wei
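
As a rough check of the sizing arithmetic quoted above, here is a minimal
user-space sketch. It assumes 4 KiB pages and an ARRAY_ALLOC_ORDER of 10
(neither value appears in the quoted hunk), and it assumes that
max_free_page_blocks() is simply the total RAM page count divided by
(1 << order). All sketch_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values; they are not stated in the quoted hunk. */
#define SKETCH_PAGE_SIZE   4096ULL /* assume 4 KiB pages */
#define SKETCH_ARRAY_ORDER 10      /* assume order-10 (4 MiB) arrays */

/* Assumption: worst case, every page of RAM sits on this order's free list. */
static uint64_t sketch_max_free_page_blocks(uint64_t total_ram_pages,
					    unsigned int order)
{
	assert(order < 64);
	return total_ram_pages >> order;
}

/* Mirrors the quoted arithmetic: one 64-bit entry per free page block. */
static uint64_t sketch_max_array_num(uint64_t total_ram_bytes)
{
	uint64_t total_pages = total_ram_bytes / SKETCH_PAGE_SIZE;
	uint64_t max_entries =
		sketch_max_free_page_blocks(total_pages, SKETCH_ARRAY_ORDER);
	uint64_t entries_per_page = SKETCH_PAGE_SIZE / sizeof(uint64_t);
	uint64_t entries_per_array =
		entries_per_page * (1ULL << SKETCH_ARRAY_ORDER);

	/* Round up so that a partially filled array still gets a buffer. */
	return max_entries / entries_per_array +
	       !!(max_entries % entries_per_array);
}
```

Under these assumptions, sketch_max_array_num(2ULL << 40) evaluates to 1:
one 4 MiB array holds 524288 entries, each describing a 4 MiB block, so it
covers 2 TiB of guest memory. The count depends only on total RAM, not on
how much of it is free when the arrays are allocated.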

I see, it's based on total RAM pages. That's reasonable, but it might
get out of sync if memory is onlined quickly. So you want to
detect that there's more free memory than can fit in the buffers and
retry the reporting.
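
To illustrate the kind of detect-and-retry I mean, here is a small
user-space simulation. It is not the balloon driver's code: the guest
struct, the hook that onlines memory between sizing and collection, and
all sketch_ names are hypothetical, and capacity is counted in free page
blocks rather than in arrays to keep it short.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the guest, counted in free-page-block units. */
struct sketch_guest {
	uint64_t online_blocks; /* grows when memory is onlined */
	uint64_t free_blocks;   /* blocks currently on the free list */
};

/* Hypothetical hook that runs between sizing and collection. */
typedef void (*sketch_hook)(struct sketch_guest *g, int pass);

/* Example hook: online 50 blocks, all free, during the first pass only. */
static void sketch_online_once(struct sketch_guest *g, int pass)
{
	if (pass == 1) {
		g->online_blocks += 50;
		g->free_blocks += 50;
	}
}

/*
 * Size the buffers from total RAM, collect, and retry if more free blocks
 * were seen than the buffers can hold. Returns the pass that succeeded,
 * or -1 if max_passes was exhausted.
 */
static int sketch_report_with_retry(struct sketch_guest *g, sketch_hook between,
				    int max_passes, uint64_t *reported)
{
	for (int pass = 1; pass <= max_passes; pass++) {
		/* Capacity is a snapshot of total RAM at sizing time. */
		uint64_t capacity = g->online_blocks;

		if (between)
			between(g, pass); /* memory may be onlined here */

		uint64_t seen = g->free_blocks;
		if (seen <= capacity) {
			*reported = seen;
			return pass;
		}
		/* Overflow: the snapshot was stale, so resize and retry. */
	}
	*reported = 0;
	return -1;
}
```

With a guest that starts at 100 online and 80 free blocks, the first pass
sizes for 100, then sees 130 free blocks after the hook runs, so it
retries; the second pass sizes for 150 and reports all 130.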

> 
> 
>