Message-ID: <diqzh62ezgdh.fsf@ackerleytng-ctop.c.googlers.com>
Date: Wed, 23 Apr 2025 15:02:02 -0700
From: Ackerley Tng <ackerleytng@...gle.com>
To: Yan Zhao <yan.y.zhao@...el.com>
Cc: tabba@...gle.com, quic_eberman@...cinc.com, roypat@...zon.co.uk,
jgg@...dia.com, peterx@...hat.com, david@...hat.com, rientjes@...gle.com,
fvdl@...gle.com, jthoughton@...gle.com, seanjc@...gle.com,
pbonzini@...hat.com, zhiquan1.li@...el.com, fan.du@...el.com,
jun.miao@...el.com, isaku.yamahata@...el.com, muchun.song@...ux.dev,
erdemaktas@...gle.com, vannapurve@...gle.com, qperret@...gle.com,
jhubbard@...dia.com, willy@...radead.org, shuah@...nel.org,
brauner@...nel.org, bfoster@...hat.com, kent.overstreet@...ux.dev,
pvorel@...e.cz, rppt@...nel.org, richard.weiyang@...il.com,
anup@...infault.org, haibo1.xu@...el.com, ajones@...tanamicro.com,
vkuznets@...hat.com, maciej.wieczor-retman@...el.com, pgonda@...gle.com,
oliver.upton@...ux.dev, linux-kernel@...r.kernel.org, linux-mm@...ck.org,
kvm@...r.kernel.org, linux-kselftest@...r.kernel.org
Subject: Re: [RFC PATCH 39/39] KVM: guest_memfd: Dynamically split/reconstruct
HugeTLB page

Yan Zhao <yan.y.zhao@...el.com> writes:

> On Tue, Sep 10, 2024 at 11:44:10PM +0000, Ackerley Tng wrote:
>> +/*
>> + * Allocates and then caches a folio in the filemap. Returns a folio with
>> + * refcount of 2: 1 after allocation, and 1 taken by the filemap.
>> + */
>> +static struct folio *kvm_gmem_hugetlb_alloc_and_cache_folio(struct inode *inode,
>> +							     pgoff_t index)
>> +{
>> +	struct kvm_gmem_hugetlb *hgmem;
>> +	pgoff_t aligned_index;
>> +	struct folio *folio;
>> +	int nr_pages;
>> +	int ret;
>> +
>> +	hgmem = kvm_gmem_hgmem(inode);
>> +	folio = kvm_gmem_hugetlb_alloc_folio(hgmem->h, hgmem->spool);
>> +	if (IS_ERR(folio))
>> +		return folio;
>> +
>> +	nr_pages = 1UL << huge_page_order(hgmem->h);
>> +	aligned_index = round_down(index, nr_pages);
> Maybe there's a gap here.
>
> When a guest_memfd is bound to a slot where slot->base_gfn is not aligned to
> 2M/1G and slot->gmem.pgoff is 0, even if an index is 2M/1G aligned, the
> corresponding GFN is not 2M/1G aligned.

Thanks for looking into this.

In 1G page support for guest_memfd, the offset and size are always
aligned to the hugepage size requested at guest_memfd creation time. It
is true, though, that when binding to a memslot, slot->base_gfn and
slot->npages may not be hugepage aligned.
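
To make that concrete (made-up numbers): since gfn = slot->base_gfn +
(index - slot->gmem.pgoff), with slot->gmem.pgoff == 0 and
slot->base_gfn == 0x201, the 2M-aligned file index 0x200 (2M == 512 4K
pages) corresponds to gfn == 0x201 + 0x200 == 0x401, which is not
2M-aligned, even though guest_memfd backs that index with a full huge
folio.
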
>
> However, TDX requires that private huge pages be 2M aligned in GFN.
>

IIUC, other factors also contribute to determining the mapping level in
the guest page tables, like lpage_info and .private_max_mapping_level()
in kvm_x86_ops.

If slot->base_gfn and slot->npages are not hugepage aligned, lpage_info
will track that and disallow faulting the misaligned head and tail of
the slot into the guest page tables at the larger page sizes.
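
For reference, here is roughly what x86's kvm_alloc_memslot_metadata()
in arch/x86/kvm/x86.c does today when a memslot is created (sketched
from memory for each huge page level, not verbatim):

	/*
	 * If either end of the memslot is not aligned to this level's
	 * page size, mark the first/last lpage_info entry as
	 * disallow_lpage so the MMU never installs a huge mapping that
	 * spans the slot boundary.
	 */
	if (slot->base_gfn & (KVM_PAGES_PER_HPAGE(level) - 1))
		linfo[0].disallow_lpage = 1;
	if ((slot->base_gfn + npages) & (KVM_PAGES_PER_HPAGE(level) - 1))
		linfo[lpages - 1].disallow_lpage = 1;

The fault path then rejects any level with disallow_lpage set when
computing the maximum mapping level, so a misaligned slot cannot be
faulted in at 2M/1G regardless of how guest_memfd allocated the backing
folio.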

Hence, I think it is okay to leave it to KVM to fault pages into the
guest correctly: guest_memfd will just maintain the invariant that
offset and size are hugepage aligned, but will not require that
slot->base_gfn and slot->npages are hugepage aligned. This behavior is
consistent with other guest backing memory, like regular shmem or
HugeTLB.

>> +	ret = kvm_gmem_hugetlb_filemap_add_folio(inode->i_mapping, folio,
>> +						 aligned_index,
>> +						 htlb_alloc_mask(hgmem->h));
>> +	WARN_ON(ret);
>> +
>>  	spin_lock(&inode->i_lock);
>>  	inode->i_blocks += blocks_per_huge_page(hgmem->h);
>>  	spin_unlock(&inode->i_lock);
>>
>> -	return page_folio(requested_page);
>> +	return folio;
>> +}