linux-kernel - Re: [RFC][Patch v8 0/7] KVM: Guest Free Page Hinting

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <0d7ff493-71f3-8707-8400-b51f1ce1a2bd@redhat.com>
Date:   Tue, 19 Feb 2019 09:17:43 -0500
From:   Nitesh Narayan Lal <nitesh@...hat.com>
To:     David Hildenbrand <david@...hat.com>
Cc:     kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
        pbonzini@...hat.com, lcapitulino@...hat.com, pagupta@...hat.com,
        wei.w.wang@...el.com, yang.zhang.wz@...il.com, riel@...riel.com,
        dodgen@...gle.com, konrad.wilk@...cle.com, dhildenb@...hat.com,
        aarcange@...hat.com, Alexander Duyck <alexander.duyck@...il.com>,
        "Michael S. Tsirkin" <mst@...hat.com>
Subject: Re: [RFC][Patch v8 0/7] KVM: Guest Free Page Hinting

On 2/19/19 8:03 AM, David Hildenbrand wrote:
>>>>> There are two main ways to avoid allocation:
>>>>> 1. do not add extra data on top of each chunk passed
>>>> If I am not wrong then this is close to what we have right now.
>>> Yes, minus the kthread(s) and eventually with some sort of memory
>>> allocation for the request. Once you're asynchronous via a notification
>>> mechanisnm, there is no real need for a thread anymore, hopefully.
>> Whether we should go with kthread or without it, I would like to do some
>> performance comparison before commenting on this.
>>>> One issue I see right now is that I am polling while host is freeing the
>>>> memory.
>>>> In the next version I could tie the logic which returns pages to the
>>>> buddy and resets the per cpu array index value to 0 with the callback.
>>>> (i.e.., it happens once we receive an response from the host)
>>> The question is, what happens when freeing pages and the array is not
>>> ready to be reused yet. In that case, you want to somehow continue
>>> freeing pages without busy waiting or eventually not reporting pages.
>> This is what happens right now.
>> Having kthread or not should not effect this behavior.
>> When the array is full the current approach simply skips collecting the
>> free pages.
> Well, it somehow does affect your implementation. If you have a kthread
> you always have to synchronize against the VCPU: "Is the pcpu array
> ready to be used again".
>
> Once you do it asynchronously from your VCPU without another thread
> being involved, such synchronization is not required. Simply prepare a
> request and send it off. Reuse the pcpu array instantly. At least that's
> the theory :)
>
> If you have a guest bulk freeing a lot of memory, I guess temporarily
> dropping free page hints could be counter-productive. It really depends
> on how fast the thread gets scheduled and how long the hinting process
> takes. Having another thread involved might add a lot to that latency to
> that formula. We'll have to measure, but my gut feeling is that once we
> do stuff asynchronously, there is no need for a thread anymore.
This is true.
>
>>> The callback should put the pages back to the buddy and free the request
>>> eventually to have a fully asynchronous mechanism.
>>>
>>>> Other change which I am testing right now is to only capture 'MAX_ORDER
>>> I am not sure if this is an arbitrary number we came up with here. We
>>> should really play with different orders to find a hot spot. I wouldn't
>>> consider this high priority, though. Getting the whole concept right to
>>> be able to deal with any magic number we come up should be the ultimate
>>> goal. (stuff that only works with huge pages I consider not future
>>> proof, especially regarding fragmented guests which can happen easily)
>> Its quite possible that when we are only capturing MAX_ORDER - 1 and run
>> a specific workload we don't get any memory back until we re-run the
>> program and buddy finally starts merging of pages of order MAX_ORDER -1.
>> This is why I think we may make this configurable from compile time and
>> keep capturing MAX_ORDER - 1 so that we don't end up breaking anything.
> Eventually pages will never get merged. Assume you have 1 page of a
> MAX_ORDER - 1 chunk still allocated somewhere (e.g. !movable via
> kmalloc). You skip reporting that chunk completely. Roughly 1mb/2mb/4mb
> wasted (depending on the arch). This stuff can sum up.

After the discussion, here are the changes on which I am planning to
work next:
1. Get rid of the kthread and dynamically allocate a per-cpu array to
hold the
isolated pages. As soon as the initial per-cpu array is completely
scanned, release it
so that we don't end up blocking anything.
2. Continue capturing MAX_ORDER - 1, for now. Reduce the initial per-cpu
array size to 256
for now. As we are doing asynchronous reporting we should be fine with a
lower size array.
3. As soon as the host responds, release the pages back to the buddy
from the callback and free the request.

Benefits wrt current implementation:
1. We will not eat up performance due to kernel thread.
2. We will still be doing reporting asynchronously=> no blocking.
3. Hopefully, we will be able to free more memory.
-- 
Regards
Nitesh



Download attachment "signature.asc" of type "application/pgp-signature" (834 bytes)