[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a31b66f2-1f4d-4826-bd57-2600603d5e0c@redhat.com>
Date: Wed, 16 Oct 2024 13:54:32 +0200
From: David Hildenbrand <david@...hat.com>
To: Vishal Annapurve <vannapurve@...gle.com>
Cc: Ackerley Tng <ackerleytng@...gle.com>, Peter Xu <peterx@...hat.com>,
tabba@...gle.com, quic_eberman@...cinc.com, roypat@...zon.co.uk,
jgg@...dia.com, rientjes@...gle.com, fvdl@...gle.com, jthoughton@...gle.com,
seanjc@...gle.com, pbonzini@...hat.com, zhiquan1.li@...el.com,
fan.du@...el.com, jun.miao@...el.com, isaku.yamahata@...el.com,
muchun.song@...ux.dev, erdemaktas@...gle.com, qperret@...gle.com,
jhubbard@...dia.com, willy@...radead.org, shuah@...nel.org,
brauner@...nel.org, bfoster@...hat.com, kent.overstreet@...ux.dev,
pvorel@...e.cz, rppt@...nel.org, richard.weiyang@...il.com,
anup@...infault.org, haibo1.xu@...el.com, ajones@...tanamicro.com,
vkuznets@...hat.com, maciej.wieczor-retman@...el.com, pgonda@...gle.com,
oliver.upton@...ux.dev, linux-kernel@...r.kernel.org, linux-mm@...ck.org,
kvm@...r.kernel.org, linux-kselftest@...r.kernel.org
Subject: Re: [RFC PATCH 26/39] KVM: guest_memfd: Track faultability within a
struct kvm_gmem_private
On 16.10.24 12:48, Vishal Annapurve wrote:
> On Wed, Oct 16, 2024 at 2:20 PM David Hildenbrand <david@...hat.com> wrote:
>>
>>>> I also don't know how you treat things like folio_test_hugetlb() on
>>>> possible assumptions that the VMA must be a hugetlb vma. I'd confess I
>>>> didn't yet check the rest of the patchset yet - reading a large series
>>>> without a git tree is sometimes challenging to me.
>>>>
>>>
>>> I'm thinking to basically never involve folio_test_hugetlb(), and the
>>> VMAs used by guest_memfd will also never be a HugeTLB VMA. That's
>>> because only the HugeTLB allocator is used, but by the time the folio is
>>> mapped to userspace, it would have already have been split. After the
>>> page is split, the folio loses its HugeTLB status. guest_memfd folios
>>> will never be mapped to userspace while they still have a HugeTLB
>>> status.
>>
>> We absolutely must convert these hugetlb folios to non-hugetlb folios.
>>
>> That is one of the reasons why I raised at LPC that we should focus on
>> leaving hugetlb out of the picture and rather have a global pool, and
>> the option to move folios from the global pool back and forth to hugetlb
>> or to guest_memfd.
>>
>> How exactly that would look like is TBD.
>>
>> For the time being, I think we could add a "hack" to take hugetlb folios
>> from hugetlb for our purposes, but we would absolutely have to convert
>> them to non-hugetlb folios, especially when we split them to small
>> folios and start using the mapcount. But it doesn't feel quite clean.
>
> As hugepage folios need to be split up in order to support backing
> CoCo VMs with hugepages, I would assume any folio based hugepage
> memory allocation will need to go through split/merge cycles through
> the guest memfd lifetime.
Yes, that's my understanding as well.
>
> Plan through next RFC series is to abstract out the hugetlb folio
> management within guest_memfd so that any hugetlb specific logic is
> cleanly separated out and allows guest memfd to allocate memory from
> other hugepage allocators in the future.
Yes, that must happen. As soon as a hugetlb folio would transition to
guest_memfd, it must no longer be a hugetlb folio.
>
>>
>> Simply starting with a separate global pool (e.g., boot-time allocation
>> similar to as done by hugetlb, or CMA) might be cleaner, and a lot of
>> stuff could be factored out from hugetlb code to achieve that.
>
> I am not sure if a separate global pool necessarily solves all the
> issues here unless we come up with more concrete implementation
> details. One of the concerns was the ability of implementing/retaining
> HVO while transferring memory between the separate global pool and
> hugetlb pool i.e. whether it can seamlessly serve all hugepage users
> on the host.
Likely should be doable. All we need is the generalized concept of a
folio with HVO, and a way to move these folios between owners (e.g.,
global<->hugetlb, global<->guest_memfd).
Factoring the HVO optimization out shouldn't be too crazy I believe.
Famous last words :)
> Another question could be whether the separate
> pool/allocator simplifies the split/merge operations at runtime.
The less hugetlb hacks we have to add, the better :)
--
Cheers,
David / dhildenb
Powered by blists - more mailing lists