Message-ID: <d15bfdc8-e309-4041-b4c7-e8c3cdf78b26@intel.com>
Date: Thu, 19 Jun 2025 16:59:14 +0800
From: Xiaoyao Li <xiaoyao.li@...el.com>
To: Yan Zhao <yan.y.zhao@...el.com>, Ackerley Tng <ackerleytng@...gle.com>
Cc: kvm@...r.kernel.org, linux-mm@...ck.org, linux-kernel@...r.kernel.org,
x86@...nel.org, linux-fsdevel@...r.kernel.org, aik@....com,
ajones@...tanamicro.com, akpm@...ux-foundation.org, amoorthy@...gle.com,
anthony.yznaga@...cle.com, anup@...infault.org, aou@...s.berkeley.edu,
bfoster@...hat.com, binbin.wu@...ux.intel.com, brauner@...nel.org,
catalin.marinas@....com, chao.p.peng@...el.com, chenhuacai@...nel.org,
dave.hansen@...el.com, david@...hat.com, dmatlack@...gle.com,
dwmw@...zon.co.uk, erdemaktas@...gle.com, fan.du@...el.com, fvdl@...gle.com,
graf@...zon.com, haibo1.xu@...el.com, hch@...radead.org, hughd@...gle.com,
ira.weiny@...el.com, isaku.yamahata@...el.com, jack@...e.cz,
james.morse@....com, jarkko@...nel.org, jgg@...pe.ca, jgowans@...zon.com,
jhubbard@...dia.com, jroedel@...e.de, jthoughton@...gle.com,
jun.miao@...el.com, kai.huang@...el.com, keirf@...gle.com,
kent.overstreet@...ux.dev, kirill.shutemov@...el.com,
liam.merwick@...cle.com, maciej.wieczor-retman@...el.com,
mail@...iej.szmigiero.name, maz@...nel.org, mic@...ikod.net,
michael.roth@....com, mpe@...erman.id.au, muchun.song@...ux.dev,
nikunj@....com, nsaenz@...zon.es, oliver.upton@...ux.dev,
palmer@...belt.com, pankaj.gupta@....com, paul.walmsley@...ive.com,
pbonzini@...hat.com, pdurrant@...zon.co.uk, peterx@...hat.com,
pgonda@...gle.com, pvorel@...e.cz, qperret@...gle.com,
quic_cvanscha@...cinc.com, quic_eberman@...cinc.com,
quic_mnalajal@...cinc.com, quic_pderrin@...cinc.com,
quic_pheragu@...cinc.com, quic_svaddagi@...cinc.com, quic_tsoni@...cinc.com,
richard.weiyang@...il.com, rick.p.edgecombe@...el.com, rientjes@...gle.com,
roypat@...zon.co.uk, rppt@...nel.org, seanjc@...gle.com, shuah@...nel.org,
steven.price@....com, steven.sistare@...cle.com, suzuki.poulose@....com,
tabba@...gle.com, thomas.lendacky@....com, usama.arif@...edance.com,
vannapurve@...gle.com, vbabka@...e.cz, viro@...iv.linux.org.uk,
vkuznets@...hat.com, wei.w.wang@...el.com, will@...nel.org,
willy@...radead.org, yilun.xu@...el.com, yuzenghui@...wei.com,
zhiquan1.li@...el.com
Subject: Re: [RFC PATCH v2 00/51] 1G page support for guest_memfd
On 6/19/2025 4:13 PM, Yan Zhao wrote:
> On Wed, May 14, 2025 at 04:41:39PM -0700, Ackerley Tng wrote:
>> Hello,
>>
>> This patchset builds upon discussion at LPC 2024 and many guest_memfd
>> upstream calls to provide 1G page support for guest_memfd by taking
>> pages from HugeTLB.
>>
>> This patchset is based on Linux v6.15-rc6, and requires the mmap support
>> for guest_memfd patchset (Thanks Fuad!) [1].
>>
>> For ease of testing, this series is also available, stitched together,
>> at https://github.com/googleprodkernel/linux-cc/tree/gmem-1g-page-support-rfc-v2
>
> Just to record a found issue -- not one that must be fixed.
>
> In TDX, the initial memory region is added as private memory during TD's build
> time, with its initial content copied from source pages in shared memory.
> The copy operation requires simultaneous access to both shared source memory
> and private target memory.
>
> Therefore, userspace cannot store the initial content in shared memory at the
> mmap-ed VA of a guest_memfd that performs in-place conversion between shared and
> private memory. This is because the guest_memfd will first unmap a PFN in shared
> page tables and then check for any extra refcount held for the shared PFN before
> converting it to private.

I have an idea.

If I understand correctly, KVM_GMEM_CONVERT_PRIVATE with in-place
conversion unmaps the PFN from the shared page tables while keeping the
content of the page unchanged, right?

If so, KVM_GMEM_CONVERT_PRIVATE can be used to initialize private memory
in the non-CoCo case: userspace first mmap()s the range, ensures it is
shared, and writes the initial content to it; userspace then converts it
to private with KVM_GMEM_CONVERT_PRIVATE.

For the CoCo case, like TDX, it can hook into KVM_GMEM_CONVERT_PRIVATE if
it wants the private memory to be initialized with initial content, and
just do an in-place TDH.MEM.PAGE.ADD in the hook.

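To make the proposed ordering concrete, here is a rough toy model in
Python. It is only a sketch of the flow as described in this thread, not
kernel code: ToyPage, mmap_and_write(), convert_private() and the
on_convert hook are hypothetical names invented for illustration, and
the refcount check is reduced to a single counter.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    SHARED = "shared"
    PRIVATE = "private"


@dataclass
class ToyPage:
    """Toy stand-in for one guest_memfd page (hypothetical, not a kernel struct)."""
    content: bytes = b""
    state: State = State.SHARED
    mapped: bool = False   # present in a shared (userspace) page table?
    extra_refs: int = 0    # references beyond guest_memfd's own

    def mmap_and_write(self, data: bytes) -> None:
        # Userspace may only touch the page while it is shared.
        if self.state is not State.SHARED:
            raise PermissionError("private pages are not accessible to userspace")
        self.mapped = True
        self.content = data

    def convert_private(self, on_convert=None) -> bool:
        """Model the conversion order described in the thread."""
        self.mapped = False         # 1. unmap from the shared page tables
        if self.extra_refs:         # 2. an extra refcount blocks the conversion
            return False
        if on_convert is not None:  # 3. assumed CoCo hook, e.g. an in-place page add
            on_convert(self)
        self.state = State.PRIVATE  # 4. content is left untouched
        return True


# Non-CoCo style flow: write while shared, then convert in place.
page = ToyPage()
page.mmap_and_write(b"initial image")
added = []  # records what the assumed hook saw
ok = page.convert_private(on_convert=lambda p: added.append(p.content))

# A page with an extra reference is unmapped but stays shared.
pinned = ToyPage(extra_refs=1)
pinned.mmap_and_write(b"data")
blocked = pinned.convert_private()
```

In this model the hook runs after the unmap and the refcount check, so
it sees the page content exactly as userspace left it, which is the
property the in-place add would rely on. Whether the real conversion
path can offer a hook at that point is the open question.
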
> Currently, we tested the initial memory region using the in-place conversion
> version of guest_memfd as backend by modifying QEMU to add an extra anonymous
> backend to hold the source initial content in shared memory. The extra anonymous
> backend is freed after finishing adding the initial memory region.
>
> This issue is benign for TDX, as the initial memory region can also utilize the
> traditional guest_memfd, which only allows 4KB mappings. This is acceptable for
> now, as the initial memory region typically involves a small amount of memory,
> and we may not enable huge pages for ranges covered by the initial memory region
> in the near future.