linux-kernel - Re: [RFC][Patch v8 6/7] KVM: Enables the kernel to isolate and report free pages


Message-ID: <ab03c35e-b2bb-cb96-4701-436a4e2770d1@redhat.com>
Date:   Thu, 7 Feb 2019 15:50:04 -0500
From:   Nitesh Narayan Lal <nitesh@...hat.com>
To:     Alexander Duyck <alexander.duyck@...il.com>,
        "Michael S. Tsirkin" <mst@...hat.com>
Cc:     kvm list <kvm@...r.kernel.org>,
        LKML <linux-kernel@...r.kernel.org>,
        Paolo Bonzini <pbonzini@...hat.com>, lcapitulino@...hat.com,
        pagupta@...hat.com, wei.w.wang@...el.com,
        Yang Zhang <yang.zhang.wz@...il.com>, riel@...riel.com,
        david@...hat.com, dodgen@...gle.com,
        Konrad Rzeszutek Wilk <konrad.wilk@...cle.com>,
        dhildenb@...hat.com, Andrea Arcangeli <aarcange@...hat.com>
Subject: Re: [RFC][Patch v8 6/7] KVM: Enables the kernel to isolate and report
 free pages


On 2/7/19 12:43 PM, Alexander Duyck wrote:
> On Tue, Feb 5, 2019 at 3:21 PM Michael S. Tsirkin <mst@...hat.com> wrote:
>> On Tue, Feb 05, 2019 at 04:54:03PM -0500, Nitesh Narayan Lal wrote:
>>> On 2/5/19 3:45 PM, Michael S. Tsirkin wrote:
>>>> On Mon, Feb 04, 2019 at 03:18:53PM -0500, Nitesh Narayan Lal wrote:
>>>>> This patch enables the kernel to scan the per cpu array and
>>>>> compress it by removing the repetitive/re-allocated pages.
>>>>> Once the per cpu array is completely filled with pages in the
>>>>> buddy it wakes up the kernel per cpu thread which re-scans the
>>>>> entire per cpu array by acquiring a zone lock corresponding to
>>>>> the page which is being scanned. If the page is still free and
>>>>> present in the buddy it tries to isolate the page and adds it
>>>>> to another per cpu array.
>>>>>
>>>>> Once this scanning process is complete and if there are any
>>>>> isolated pages added to the new per cpu array kernel thread
>>>>> invokes hyperlist_ready().
>>>>>
>>>>> In hyperlist_ready() a hypercall is made to report these pages to
>>>>> the host using the virtio-balloon framework. In order to do so
>>>>> another virtqueue 'hinting_vq' is added to the balloon framework.
>>>>> As the host frees all the reported pages, the kernel thread returns
>>>>> them back to the buddy.
>>>>>
>>>>> Signed-off-by: Nitesh Narayan Lal <nitesh@...hat.com>
>>>> This looks kind of like what early iterations of Wei's patches did.
>>>>
>>>> But this has lots of issues, for example you might end up with
>>>> a hypercall per a 4K page.
>>>> So in the end, he switched over to just reporting only
>>>> MAX_ORDER - 1 pages.
>>> You mean that I should only capture/attempt to isolate pages with order
>>> MAX_ORDER - 1?
>>>> Would that be a good idea for you too?
>>> Will it help if we have a threshold value based on the amount of memory
>>> captured instead of the number of entries/pages in the array?
>> This is what Wei's patches do at least.
> So in the solution I had posted I was looking more at
> HUGETLB_PAGE_ORDER and above as the size of pages to provide the hints
> on [1]. The advantage to doing that is that you can also avoid
> fragmenting huge pages which in turn can cause what looks like a
> memory leak as the memory subsystem attempts to reassemble huge
> pages[2]. In my mind a 2MB page makes good sense in terms of the size
> of things to be performing hints on as anything smaller than that is
> going to just end up being a bunch of extra work and end up causing a
> bunch of fragmentation.
In my opinion, for any implementation, the page size worth capturing
before reporting depends on the allocation pattern of the workload
running in the guest.

I am also planning to try Michael's suggestion of using MAX_ORDER - 1.
However, I am still looking for a workload that I can use to test its
effectiveness.
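
To make the two ideas discussed above concrete (capturing only
MAX_ORDER - 1 pages, and triggering the report on the amount of memory
captured rather than on the number of array entries), here is a minimal
userspace sketch. It is not code from the patch: every name in it
(hint_array, capture_free_page(), the SKETCH_* constants) is
hypothetical, and the values assume 4K base pages, MAX_ORDER = 11 and an
arbitrary 16MB reporting threshold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed values for illustration only; real ones depend on arch/config. */
#define SKETCH_PAGE_SHIFT   12                     /* 4K base pages */
#define SKETCH_MAX_ORDER    11                     /* MAX_ORDER - 1 == 10 (4MB) */
#define SKETCH_MIN_ORDER    (SKETCH_MAX_ORDER - 1) /* smallest order captured */
#define SKETCH_REPORT_BYTES (16UL << 20)           /* hypothetical threshold */
#define SKETCH_MAX_ENTRIES  64                     /* hypothetical array size */

struct captured_page {
	unsigned long pfn;
	unsigned int order;
};

/* Hypothetical stand-in for the per cpu array of captured free pages. */
struct hint_array {
	struct captured_page entries[SKETCH_MAX_ENTRIES];
	size_t count;
	unsigned long bytes;	/* total memory captured so far */
};

static unsigned long order_to_bytes(unsigned int order)
{
	assert(order < SKETCH_MAX_ORDER);
	return 1UL << (SKETCH_PAGE_SHIFT + order);
}

/*
 * Record a freed block if it is large enough. Returns true when the
 * caller should wake the reporting thread: either enough memory has
 * been captured or the array has no room left.
 */
static bool capture_free_page(struct hint_array *arr, unsigned long pfn,
			      unsigned int order)
{
	if (order < SKETCH_MIN_ORDER)
		return false;	/* too small: not worth a hint */

	if (arr->count == SKETCH_MAX_ENTRIES)
		return true;	/* full: block not recorded, report first */

	arr->entries[arr->count].pfn = pfn;
	arr->entries[arr->count].order = order;
	arr->count++;
	arr->bytes += order_to_bytes(order);

	return arr->bytes >= SKETCH_REPORT_BYTES ||
	       arr->count == SKETCH_MAX_ENTRIES;
}
```

With these assumed values, an order-0 free is ignored and the fourth
order-10 block (4 x 4MB = 16MB) is the one that triggers a report, so
the number of hypercalls scales with the memory reported and not with
the number of 4K pages freed.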

>
> The only issue with limiting things on an arbitrary boundary like that
> is that you have to hook into the buddy allocator to catch the cases
> where a page has been merged up into that range.
I don't think I understood your comment completely. In any case, we
have to rely on the buddy allocator for merging the pages.
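
One possible reading of the comment, sketched below under my own
assumptions: a block freed at a small order can reach the hinting order
only through buddy merging, so a check made on the order at free time
would never see it. The buddy of a block is found by flipping bit
'order' of its pfn. The helpers merge_up(), all_buddies_free() and
no_buddies_free() are hypothetical and only model the merge loop; they
are not the allocator's code.

```c
#include <assert.h>

/* Hypothetical predicate: is the buddy block at (pfn, order) free? */
typedef int (*buddy_free_fn)(unsigned long pfn, unsigned int order);

/* Stand-in predicates so the sketch can be exercised in userspace. */
static int all_buddies_free(unsigned long pfn, unsigned int order)
{
	(void)pfn;
	(void)order;
	return 1;
}

static int no_buddies_free(unsigned long pfn, unsigned int order)
{
	(void)pfn;
	(void)order;
	return 0;
}

/*
 * Model of buddy merging: starting from a block freed at 'order',
 * keep merging with its buddy while the buddy is free. Updates *pfn
 * to the start of the merged block and returns the final order.
 */
static unsigned int merge_up(unsigned long *pfn, unsigned int order,
			     unsigned int max_order,
			     buddy_free_fn buddy_is_free)
{
	assert(max_order > 0);

	while (order < max_order - 1) {
		/* The buddy differs from this block only in bit 'order'. */
		unsigned long buddy = *pfn ^ (1UL << order);

		if (!buddy_is_free(buddy, order))
			break;

		/* The merged block starts at the lower of the two pfns. */
		*pfn &= ~(1UL << order);
		order++;
	}

	return order;
}
```

For example, with max_order = 11 a block freed at pfn 768, order 8
ends up as pfn 0, order 10 when all of its buddies are free, even
though the free itself happened two orders below the threshold.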
>
> [1] https://lkml.org/lkml/2019/2/4/903
> [2] https://blog.digitalocean.com/transparent-huge-pages-and-alternative-memory-allocators/
-- 
Regards
Nitesh


