linux-kernel - Re: [RFC PATCH 26/39] KVM: guest_memfd: Track faultability within a struct kvm_gmem

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <a31b66f2-1f4d-4826-bd57-2600603d5e0c@redhat.com>
Date: Wed, 16 Oct 2024 13:54:32 +0200
From: David Hildenbrand <david@...hat.com>
To: Vishal Annapurve <vannapurve@...gle.com>
Cc: Ackerley Tng <ackerleytng@...gle.com>, Peter Xu <peterx@...hat.com>,
 tabba@...gle.com, quic_eberman@...cinc.com, roypat@...zon.co.uk,
 jgg@...dia.com, rientjes@...gle.com, fvdl@...gle.com, jthoughton@...gle.com,
 seanjc@...gle.com, pbonzini@...hat.com, zhiquan1.li@...el.com,
 fan.du@...el.com, jun.miao@...el.com, isaku.yamahata@...el.com,
 muchun.song@...ux.dev, erdemaktas@...gle.com, qperret@...gle.com,
 jhubbard@...dia.com, willy@...radead.org, shuah@...nel.org,
 brauner@...nel.org, bfoster@...hat.com, kent.overstreet@...ux.dev,
 pvorel@...e.cz, rppt@...nel.org, richard.weiyang@...il.com,
 anup@...infault.org, haibo1.xu@...el.com, ajones@...tanamicro.com,
 vkuznets@...hat.com, maciej.wieczor-retman@...el.com, pgonda@...gle.com,
 oliver.upton@...ux.dev, linux-kernel@...r.kernel.org, linux-mm@...ck.org,
 kvm@...r.kernel.org, linux-kselftest@...r.kernel.org
Subject: Re: [RFC PATCH 26/39] KVM: guest_memfd: Track faultability within a
 struct kvm_gmem_private

On 16.10.24 12:48, Vishal Annapurve wrote:
> On Wed, Oct 16, 2024 at 2:20 PM David Hildenbrand <david@...hat.com> wrote:
>>
>>>> I also don't know how you treat things like folio_test_hugetlb() on
>>>> possible assumptions that the VMA must be a hugetlb vma.  I'd confess I
>>>> didn't yet check the rest of the patchset yet - reading a large series
>>>> without a git tree is sometimes challenging to me.
>>>>
>>>
>>> I'm thinking to basically never involve folio_test_hugetlb(), and the
>>> VMAs used by guest_memfd will also never be a HugeTLB VMA. That's
>>> because only the HugeTLB allocator is used, but by the time the folio is
>>> mapped to userspace, it would have already have been split. After the
>>> page is split, the folio loses its HugeTLB status. guest_memfd folios
>>> will never be mapped to userspace while they still have a HugeTLB
>>> status.
>>
>> We absolutely must convert these hugetlb folios to non-hugetlb folios.
>>
>> That is one of the reasons why I raised at LPC that we should focus on
>> leaving hugetlb out of the picture and rather have a global pool, and
>> the option to move folios from the global pool back and forth to hugetlb
>> or to guest_memfd.
>>
>> How exactly that would look like is TBD.
>>
>> For the time being, I think we could add a "hack" to take hugetlb folios
>> from hugetlb for our purposes, but we would absolutely have to convert
>> them to non-hugetlb folios, especially when we split them to small
>> folios and start using the mapcount. But it doesn't feel quite clean.
> 
> As hugepage folios need to be split up in order to support backing
> CoCo VMs with hugepages, I would assume any folio based hugepage
> memory allocation will need to go through split/merge cycles through
> the guest memfd lifetime.

Yes, that's my understanding as well.

> 
> Plan through next RFC series is to abstract out the hugetlb folio
> management within guest_memfd so that any hugetlb specific logic is
> cleanly separated out and allows guest memfd to allocate memory from
> other hugepage allocators in the future.

Yes, that must happen. As soon as a hugetlb folio would transition to 
guest_memfd, it must no longer be a hugetlb folio.

> 
>>
>> Simply starting with a separate global pool (e.g., boot-time allocation
>> similar to as done by hugetlb, or CMA) might be cleaner, and a lot of
>> stuff could be factored out from hugetlb code to achieve that.
> 
> I am not sure if a separate global pool necessarily solves all the
> issues here unless we come up with more concrete implementation
> details. One of the concerns was the ability of implementing/retaining
> HVO while transferring memory between the separate global pool and
> hugetlb pool i.e. whether it can seamlessly serve all hugepage users
> on the host.

Likely should be doable. All we need is the generalized concept of a 
folio with HVO, and a way to move these folios between owners (e.g., 
global<->hugetlb, global<->guest_memfd).

Factoring the HVO optimization out shouldn't be too crazy I believe. 
Famous last words :)

> Another question could be whether the separate
> pool/allocator simplifies the split/merge operations at runtime.

The less hugetlb hacks we have to add, the better :)

-- 
Cheers,

David / dhildenb