Message-ID: <aFPGPVbzo92t565h@yzhao56-desk.sh.intel.com>
Date: Thu, 19 Jun 2025 16:11:41 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: Ackerley Tng <ackerleytng@...gle.com>
CC: <vannapurve@...gle.com>, <pbonzini@...hat.com>, <seanjc@...gle.com>,
<linux-kernel@...r.kernel.org>, <kvm@...r.kernel.org>, <x86@...nel.org>,
<rick.p.edgecombe@...el.com>, <dave.hansen@...el.com>,
<kirill.shutemov@...el.com>, <tabba@...gle.com>, <quic_eberman@...cinc.com>,
<michael.roth@....com>, <david@...hat.com>, <vbabka@...e.cz>,
<jroedel@...e.de>, <thomas.lendacky@....com>, <pgonda@...gle.com>,
<zhiquan1.li@...el.com>, <fan.du@...el.com>, <jun.miao@...el.com>,
<ira.weiny@...el.com>, <isaku.yamahata@...el.com>, <xiaoyao.li@...el.com>,
<binbin.wu@...ux.intel.com>, <chao.p.peng@...el.com>
Subject: Re: [RFC PATCH 08/21] KVM: TDX: Increase/decrease folio ref for huge
pages
On Thu, Jun 05, 2025 at 03:35:50PM -0700, Ackerley Tng wrote:
> Yan Zhao <yan.y.zhao@...el.com> writes:
>
> > On Wed, Jun 04, 2025 at 01:02:54PM -0700, Ackerley Tng wrote:
> >> Hi Yan,
> >>
> >> While working on the 1G (aka HugeTLB) page support for guest_memfd
> >> series [1], we took into account conversion failures too. The steps are
> >> in kvm_gmem_convert_range(). (It might be easier to pull the entire
> >> series from GitHub [2] because the steps for conversion changed in two
> >> separate patches.)
> > ...
> >> [2] https://github.com/googleprodkernel/linux-cc/tree/gmem-1g-page-support-rfc-v2
> >
> > Hi Ackerley,
> > Thanks for providing this branch.
>
> Here's the WIP branch [1], which I initially wasn't intending to make
> super public since it's not even RFC standard yet and I didn't want to
> add to the many guest_memfd in-flight series, but since you referred to
> it, [2] is a v2 of the WIP branch :)
>
> [1] https://github.com/googleprodkernel/linux-cc/commits/wip-tdx-gmem-conversions-hugetlb-2mept
> [2] https://github.com/googleprodkernel/linux-cc/commits/wip-tdx-gmem-conversions-hugetlb-2mept-v2
Thanks. [2] works. The TDX huge pages series has now been successfully rebased on top of [2].
> This WIP branch has selftests that test 1G aka HugeTLB page support with
> TDX huge page EPT mappings [7]:
>
> 1. "KVM: selftests: TDX: Test conversion to private at different
> sizes". This uses the fact that TDX module will return error if the
> page is faulted into the guest at a different level from the accept
> level to check the level that the page was faulted in.
> 2. "KVM: selftests: Test TDs in private_mem_conversions_test". Updates
> private_mem_conversions_test for use with TDs. This test does
> multi-vCPU conversions and we use this to check for issues to do with
> conversion races.
> 3. "KVM: selftests: TDX: Test conversions when guest_memfd used for
> private and shared memory". Adds a selftest similar to/on top of
> guest_memfd_conversions_test that does conversions via MapGPA.
>
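(A side note for anyone reading along: the level-check trick in test 1 works
because TDG.MEM.PAGE.ACCEPT takes the requested mapping level in the low bits
of RCX, and the TDX module fails the call with a size-mismatch status when the
S-EPT mapping level differs. A rough guest-side sketch; tdg_mem_page_accept()
stands in for whatever ACCEPT wrapper the selftest harness provides, so treat
the names as hypothetical:

/*
 * Probe whether a GPA is mapped at 2M by accepting it at that level.
 * tdg_mem_page_accept() is a hypothetical wrapper for the
 * TDG.MEM.PAGE.ACCEPT TDCALL; per the TDX module ABI, RCX carries the
 * GPA with the requested level in bits 2:0 (0 = 4K, 1 = 2M).
 */
#define ACCEPT_LEVEL_4K	0
#define ACCEPT_LEVEL_2M	1

static bool gpa_mapped_at_2m(uint64_t gpa)
{
	uint64_t err = tdg_mem_page_accept(gpa | ACCEPT_LEVEL_2M);

	/* A size-mismatch status means the page was faulted into the
	 * guest at a different level than the accept level. */
	return err != TDX_PAGE_SIZE_MISMATCH;
}
)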
> Full list of selftests I usually run from tools/testing/selftests/kvm:
> + ./guest_memfd_test
> + ./guest_memfd_conversions_test
> + ./guest_memfd_provide_hugetlb_cgroup_mount.sh ./guest_memfd_wrap_test_check_hugetlb_reporting.sh ./guest_memfd_test
> + ./guest_memfd_provide_hugetlb_cgroup_mount.sh ./guest_memfd_wrap_test_check_hugetlb_reporting.sh ./guest_memfd_conversions_test
> + ./guest_memfd_provide_hugetlb_cgroup_mount.sh ./guest_memfd_wrap_test_check_hugetlb_reporting.sh ./guest_memfd_hugetlb_reporting_test
> + ./x86/private_mem_conversions_test.sh
> + ./set_memory_region_test
> + ./x86/private_mem_kvm_exits_test
> + ./x86/tdx_vm_test
> + ./x86/tdx_upm_test
> + ./x86/tdx_shared_mem_test
> + ./x86/tdx_gmem_private_and_shared_test
>
> As an overview for anyone who might be interested in this WIP branch:
>
> 1. I started with upstream's kvm/next
> 2. Applied TDX selftests series [3]
> 3. Applied guest_memfd mmap series [4]
> 4. Applied conversions (sub)series and HugeTLB (sub)series [5]
> 5. Added some fixes for 2 of the earlier series (as labeled in commit
> message)
> 6. Updated guest_memfd conversions selftests to work with TDX
> 7. Applied 2M EPT series [6] with some hacks
> 8. Some patches to make guest_memfd mmap return huge-page-aligned
> userspace address
> 9. Selftests for guest_memfd conversion with TDX 2M EPT
>
> [3] https://lore.kernel.org/all/20250414214801.2693294-1-sagis@google.com/
> [4] https://lore.kernel.org/all/20250513163438.3942405-11-tabba@google.com/T/
> [5] https://lore.kernel.org/all/cover.1747264138.git.ackerleytng@google.com/T/
> [6] https://lore.kernel.org/all/Z%2FOMB7HNO%2FRQyljz@yzhao56-desk.sh.intel.com/
> [7] https://lore.kernel.org/all/20250424030033.32635-1-yan.y.zhao@intel.com/
Thanks.
We noticed that it's not easy for TDX initial memory regions to use the
in-place conversion version of guest_memfd, because
- tdh_mem_page_add() requires simultaneous access to the shared source memory
and the private target memory.
- shared-to-private in-place conversion first unmaps the shared memory and
checks that no extra folio refcount is held before the conversion is allowed.
Therefore, even though tdh_mem_page_add() itself supports in-place add (see
[8]), we can't stage the initial contents in the mmap-ed VA of the in-place
conversion version of guest_memfd: by the time the target page is private, the
shared mapping that held those contents has already been unmapped.
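To make the ordering problem concrete, the conversion gate has roughly this
shape (purely illustrative; the names below are made up, not the actual
helpers in the conversion series):

/*
 * Illustrative sketch of the shared->private conversion gate described
 * above. unmap_shared_range() and expected_ref_count() are placeholder
 * names, not the real helpers.
 */
static int convert_to_private(struct folio *folio)
{
	/* The shared mapping goes away first, so any initial contents
	 * staged through the guest_memfd mmap are unreachable by the
	 * time the page becomes private... */
	unmap_shared_range(folio);

	/* ...and holding an extra reference on the folio (e.g. to keep
	 * it alive as the tdh_mem_page_add() source) makes the
	 * conversion fail instead. */
	if (folio_ref_count(folio) > expected_ref_count(folio))
		return -EAGAIN;

	return 0;
}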
So, I modified QEMU to work around this issue by adding an extra anonymous
backend to hold the source pages in shared memory, while the target private
PFNs are allocated from a guest_memfd created with
GUEST_MEMFD_FLAG_SUPPORT_SHARED set.
The goal is to test whether kvm_gmem_populate() works for TDX huge pages.
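On the userspace side the workaround boils down to something like the below.
The uAPI structs follow the TDX base series as I read it, so please
double-check them against your headers; error handling is trimmed:

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/kvm.h>

/*
 * Stage the initial contents in a plain anonymous buffer (which stays
 * shared), while the private PFNs for @gpa come from a guest_memfd
 * created with GUEST_MEMFD_FLAG_SUPPORT_SHARED. This keeps a readable
 * shared source alive for tdh_mem_page_add() even though the
 * guest_memfd mmap of the target range is already unmapped.
 */
static int tdx_add_initial_region(int vcpu_fd, void *contents, size_t size,
				  uint64_t gpa)
{
	struct kvm_tdx_init_mem_region region = {};
	struct kvm_tdx_cmd cmd = {};
	void *src;

	src = mmap(NULL, size, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (src == MAP_FAILED)
		return -1;
	memcpy(src, contents, size);

	region.source_addr = (uint64_t)src;
	region.gpa = gpa;
	region.nr_pages = size / 4096;

	cmd.id = KVM_TDX_INIT_MEM_REGION;
	cmd.data = (uint64_t)&region;

	return ioctl(vcpu_fd, KVM_MEMORY_ENCRYPT_OP, &cmd);
}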
This testing exposed a bug in kvm_gmem_populate(), which has been fixed in the
following patch.
commit 5f33ed7ca26f00a61c611d2d1fbc001a7ecd8dca
Author: Yan Zhao <yan.y.zhao@...el.com>
Date: Mon Jun 9 03:01:21 2025 -0700
Bug fix: Reduce max_order when GFN is not aligned
Fix the warning hit in kvm_gmem_populate().
"WARNING: CPU: 7 PID: 4421 at arch/x86/kvm/../../../virt/kvm/guest_memfd.c:
2496 kvm_gmem_populate+0x4a4/0x5b0"
The GFN passed to kvm_gmem_populate() may have an offset so it may not be
aligned to folio order. In this case, reduce the max_order to decrease the
mapping level.
Signed-off-by: Yan Zhao <yan.y.zhao@...el.com>
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 4b8047020f17..af7943c0a8ba 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -2493,7 +2493,8 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long
 		}
 		folio_unlock(folio);
 
-		WARN_ON(!IS_ALIGNED(gfn, 1 << max_order));
+		while (!IS_ALIGNED(gfn, 1 << max_order))
+			max_order--;
 
 		npages_to_populate = min(npages - i, 1 << max_order);
 		npages_to_populate = private_npages_to_populate(
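The loop just clamps max_order to the alignment of the starting GFN. A
standalone illustration (plain userspace C, with IS_ALIGNED() open-coded):

#include <stdio.h>

/* Same logic as the fix above: drop max_order until gfn is aligned to
 * (1 << max_order) pages. */
static int clamp_order(unsigned long gfn, int max_order)
{
	while (gfn & ((1UL << max_order) - 1))
		max_order--;
	return max_order;
}

int main(void)
{
	/* An offset gfn inside a 2M (order-9) folio must drop to 4K. */
	printf("%d\n", clamp_order(0x201, 9));	/* prints 0 */
	/* A 2M-aligned gfn keeps the full order. */
	printf("%d\n", clamp_order(0x200, 9));	/* prints 9 */
	return 0;
}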
[8] https://cdrdv2-public.intel.com/839195/intel-tdx-module-1.5-abi-spec-348551002.pdf
"In-Place Add: It is allowed to set the TD page HPA in R8 to the same address as
the source page HPA in R9. In this case the source page is converted to be a TD
private page".
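At the SEAMCALL wrapper level, in-place add just means passing the same page
as both the target and the source, roughly as below (the tdh_mem_page_add()
signature matches the wrapper as I read the current code, so verify against
your tree):

	u64 entry, level_state, err;

	/* Regular add: distinct private target and shared source pages;
	 * the contents are copied into the private page. */
	err = tdh_mem_page_add(&kvm_tdx->td, gpa, target_page, source_page,
			       &entry, &level_state);

	/* In-place add per [8]: R8 == R9, so the source page itself is
	 * converted into the TD private page, with no copy. */
	err = tdh_mem_page_add(&kvm_tdx->td, gpa, page, page,
			       &entry, &level_state);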