Message-ID: <00dc8bad-05e5-6085-525c-ce9fded672cc@redhat.com>
Date: Tue, 31 Mar 2020 16:34:48 +0200
From: David Hildenbrand <david@...hat.com>
To: "Michael S. Tsirkin" <mst@...hat.com>
Cc: Hui Zhu <teawater@...il.com>, jasowang@...hat.com,
akpm@...ux-foundation.org, pagupta@...hat.com,
mojha@...eaurora.org, namit@...are.com,
virtualization@...ts.linux-foundation.org,
linux-kernel@...r.kernel.org, qemu-devel@...gnu.org,
Hui Zhu <teawaterz@...ux.alibaba.com>,
Alexander Duyck <alexander.h.duyck@...ux.intel.com>
Subject: Re: [RFC for Linux] virtio_balloon: Add VIRTIO_BALLOON_F_THP_ORDER to
 handle THP split issue
On 31.03.20 16:29, David Hildenbrand wrote:
> On 31.03.20 16:18, Michael S. Tsirkin wrote:
>> On Tue, Mar 31, 2020 at 04:09:59PM +0200, David Hildenbrand wrote:
>>
>> ...
>>
>>>>>>>>>>>>>> So if we want to address this, IMHO this calls for a new API.
>>>>>>>>>>>>>> Along the lines of
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> struct page *alloc_page_range(gfp_t gfp, unsigned int min_order,
>>>>>>>>>>>>>> unsigned int max_order, unsigned int *order)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> the idea would then be to return a number of pages in the given
>>>>>>>>>>>>>> range.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What do you think? Want to try implementing that?
>>
>> ..
>>
>>> I expect the whole "steal huge pages from your guest" approach to be
>>> problematic, as I already mentioned to Alex. This needs a performance
>>> evaluation.
>>>
>>> This all smells like a lot of workload dependent fine-tuning. :)
>>
>>
>> So that's why I proposed the API above.
>>
>> The idea is that *if we are allocating a huge page anyway*,
>> rather than break it up, let's send it whole to the device.
>> If we have smaller pages, return smaller pages.
>>
>
> Sorry, I still fail to see why you cannot do that with my version of
> balloon_pages_alloc(). But maybe I haven't understood the magic you
> expect to happen in alloc_page_range() :)
>
> Once we have that page, it just goes via a different inflate queue, as
> I stated in front of my draft patch: "but with an optimized reporting
> interface".
>
>> That seems like it would always be an improvement, whatever the
>> workload.
>>
>
> Don't think so. Assume there are plenty of 4k pages lying around. It
> might actually be *bad* for guest performance if you take a huge page
> instead of all the leftover 4k pages that cannot be merged. Only at the
> point where you would want to break a bigger page up and report it in
> pieces would it definitely make no difference.
I just understood what you mean :) and now it makes sense - it avoids
exactly that. Basically:

1. Try to allocate order-0. No split necessary? Return the page.
2. Try to allocate order-1. No split necessary? Return the page.
...
up to MAX_ORDER - 1.

Yeah, I guess this will need a new kernel API.
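To make that concrete, here is a rough, untested sketch of what such an
allocator could look like. The prototype is the one quoted above; the
order_has_free_block() helper is hypothetical (a real implementation
would have to live in mm/page_alloc.c, next to the zone free lists), so
take this as an illustration of the intended semantics rather than a
working patch:

#include <linux/gfp.h>
#include <linux/mm.h>

/*
 * Sketch only: return a page whose order lies in [min_order, max_order],
 * preferring the smallest order that can be served without splitting a
 * larger free block. order_has_free_block() is a hypothetical helper.
 */
struct page *alloc_page_range(gfp_t gfp, unsigned int min_order,
			      unsigned int max_order, unsigned int *order)
{
	struct page *page;
	unsigned int o;

	/* Walk from the smallest acceptable order upwards. */
	for (o = min_order; o <= max_order; o++) {
		if (!order_has_free_block(o))	/* hypothetical check */
			continue;
		page = alloc_pages(gfp | __GFP_NOWARN, o);
		if (page) {
			*order = o;
			return page;
		}
	}

	/*
	 * Nothing can be served without splitting a larger block, so a
	 * split (or reclaim) is unavoidable anyway; fall back to a plain
	 * allocation of the smallest acceptable order.
	 */
	page = alloc_pages(gfp, min_order);
	if (page)
		*order = min_order;
	return page;
}

The inflate path would then report the returned page on whatever queue
matches *order, which is where the "optimized reporting interface"
mentioned above comes in.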
--
Thanks,
David / dhildenb