linux-kernel - Re: [RFC PATCH v2 00/51] 1G page support for guest

Message-ID: <d15bfdc8-e309-4041-b4c7-e8c3cdf78b26@intel.com>
Date: Thu, 19 Jun 2025 16:59:14 +0800
From: Xiaoyao Li <xiaoyao.li@...el.com>
To: Yan Zhao <yan.y.zhao@...el.com>, Ackerley Tng <ackerleytng@...gle.com>
Cc: kvm@...r.kernel.org, linux-mm@...ck.org, linux-kernel@...r.kernel.org,
 x86@...nel.org, linux-fsdevel@...r.kernel.org, aik@....com,
 ajones@...tanamicro.com, akpm@...ux-foundation.org, amoorthy@...gle.com,
 anthony.yznaga@...cle.com, anup@...infault.org, aou@...s.berkeley.edu,
 bfoster@...hat.com, binbin.wu@...ux.intel.com, brauner@...nel.org,
 catalin.marinas@....com, chao.p.peng@...el.com, chenhuacai@...nel.org,
 dave.hansen@...el.com, david@...hat.com, dmatlack@...gle.com,
 dwmw@...zon.co.uk, erdemaktas@...gle.com, fan.du@...el.com, fvdl@...gle.com,
 graf@...zon.com, haibo1.xu@...el.com, hch@...radead.org, hughd@...gle.com,
 ira.weiny@...el.com, isaku.yamahata@...el.com, jack@...e.cz,
 james.morse@....com, jarkko@...nel.org, jgg@...pe.ca, jgowans@...zon.com,
 jhubbard@...dia.com, jroedel@...e.de, jthoughton@...gle.com,
 jun.miao@...el.com, kai.huang@...el.com, keirf@...gle.com,
 kent.overstreet@...ux.dev, kirill.shutemov@...el.com,
 liam.merwick@...cle.com, maciej.wieczor-retman@...el.com,
 mail@...iej.szmigiero.name, maz@...nel.org, mic@...ikod.net,
 michael.roth@....com, mpe@...erman.id.au, muchun.song@...ux.dev,
 nikunj@....com, nsaenz@...zon.es, oliver.upton@...ux.dev,
 palmer@...belt.com, pankaj.gupta@....com, paul.walmsley@...ive.com,
 pbonzini@...hat.com, pdurrant@...zon.co.uk, peterx@...hat.com,
 pgonda@...gle.com, pvorel@...e.cz, qperret@...gle.com,
 quic_cvanscha@...cinc.com, quic_eberman@...cinc.com,
 quic_mnalajal@...cinc.com, quic_pderrin@...cinc.com,
 quic_pheragu@...cinc.com, quic_svaddagi@...cinc.com, quic_tsoni@...cinc.com,
 richard.weiyang@...il.com, rick.p.edgecombe@...el.com, rientjes@...gle.com,
 roypat@...zon.co.uk, rppt@...nel.org, seanjc@...gle.com, shuah@...nel.org,
 steven.price@....com, steven.sistare@...cle.com, suzuki.poulose@....com,
 tabba@...gle.com, thomas.lendacky@....com, usama.arif@...edance.com,
 vannapurve@...gle.com, vbabka@...e.cz, viro@...iv.linux.org.uk,
 vkuznets@...hat.com, wei.w.wang@...el.com, will@...nel.org,
 willy@...radead.org, yilun.xu@...el.com, yuzenghui@...wei.com,
 zhiquan1.li@...el.com
Subject: Re: [RFC PATCH v2 00/51] 1G page support for guest_memfd

On 6/19/2025 4:13 PM, Yan Zhao wrote:
> On Wed, May 14, 2025 at 04:41:39PM -0700, Ackerley Tng wrote:
>> Hello,
>>
>> This patchset builds upon discussion at LPC 2024 and many guest_memfd
>> upstream calls to provide 1G page support for guest_memfd by taking
>> pages from HugeTLB.
>>
>> This patchset is based on Linux v6.15-rc6, and requires the mmap support
>> for guest_memfd patchset (Thanks Fuad!) [1].
>>
>> For ease of testing, this series is also available, stitched together,
>> at https://github.com/googleprodkernel/linux-cc/tree/gmem-1g-page-support-rfc-v2
>   
> Just to record an issue that was found, not one that must be fixed.
> 
> In TDX, the initial memory region is added as private memory during TD's build
> time, with its initial content copied from source pages in shared memory.
> The copy operation requires simultaneous access to both shared source memory
> and private target memory.
> 
> Therefore, userspace cannot store the initial content in shared memory at the
> mmap-ed VA of a guest_memfd that performs in-place conversion between shared and
> private memory. This is because the guest_memfd will first unmap a PFN in shared
> page tables and then check for any extra refcount held for the shared PFN before
> converting it to private.

I have an idea.

If I understand correctly, KVM_GMEM_CONVERT_PRIVATE, when used for in-place
conversion, unmaps the PFN from the shared page tables while keeping the
content of the page unchanged, right?

So KVM_GMEM_CONVERT_PRIVATE can actually be used to initialize private
memory in the non-CoCo case: userspace first mmap()s the range, ensures it
is shared, and writes the initial content to it. After that, userspace
converts it to private with KVM_GMEM_CONVERT_PRIVATE.

For the CoCo case, such as TDX, it can hook into KVM_GMEM_CONVERT_PRIVATE if
it wants the private memory to be populated with the initial content, and
just do an in-place TDH.PAGE.ADD in the hook.
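
To make the flow concrete, below is a toy userspace model in C. It is only a
sketch under stated assumptions: it is not kernel code and does not use the
real uAPI. Every toy_* name is hypothetical, toy_convert_private() merely
stands in for KVM_GMEM_CONVERT_PRIVATE, and toy_coco_hook() stands in for an
arch hook that would do the in-place TDH.PAGE.ADD. The ordering (unmap first,
then check for extra refcounts) follows the description quoted above; what
happens to the page on failure is my assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

enum toy_state { TOY_SHARED, TOY_PRIVATE };

/* Hypothetical model of one guest_memfd page; not a kernel structure. */
struct toy_page {
	enum toy_state state;
	bool mapped;      /* present in the shared (userspace) page tables */
	int extra_refs;   /* references beyond guest_memfd's own, e.g. a pin */
	bool added;       /* set by the hook; stands in for TDH.PAGE.ADD */
	unsigned char data[TOY_PAGE_SIZE];
};

/* Hypothetical arch hook type, called during conversion to private. */
typedef int (*toy_convert_hook)(struct toy_page *page);

/* Userspace write through the mmap()ed shared mapping. */
static int toy_write_shared(struct toy_page *p, const void *src, size_t len)
{
	assert(p != NULL && src != NULL);
	if (p->state != TOY_SHARED || !p->mapped || len > TOY_PAGE_SIZE)
		return -1;
	memcpy(p->data, src, len);
	return 0;
}

/*
 * Stand-in for KVM_GMEM_CONVERT_PRIVATE: unmap from the shared page tables,
 * refuse if anyone still holds an extra reference, optionally call the hook,
 * then flip the state. The page content is never touched.
 */
static int toy_convert_private(struct toy_page *p, toy_convert_hook hook)
{
	assert(p != NULL);
	if (p->state != TOY_SHARED)
		return -1;
	p->mapped = false;
	if (p->extra_refs > 0)
		return -1;   /* assumption: page stays shared, caller retries */
	if (hook && hook(p) != 0)
		return -1;
	p->state = TOY_PRIVATE;
	return 0;
}

/* CoCo hook sketch: a real one would add the page to the TD in place. */
static int toy_coco_hook(struct toy_page *p)
{
	p->added = true;
	return 0;
}
```

In the non-CoCo case the caller would pass a NULL hook: write through the
shared mapping with toy_write_shared(), then call toy_convert_private(), and
the content written earlier is what the private page holds. In the CoCo case
the same sequence passes toy_coco_hook, and a conversion attempted while
extra_refs is non-zero fails, which mirrors the refcount check described in
the quoted text.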

> Currently, we tested the initial memory region using the in-place conversion
> version of guest_memfd as the backend by modifying QEMU to add an extra
> anonymous backend to hold the source initial content in shared memory. The
> extra anonymous backend is freed after adding the initial memory region.
> 
> This issue is benign for TDX, as the initial memory region can also utilize the
> traditional guest_memfd, which only allows 4KB mappings. This is acceptable for
> now, as the initial memory region typically involves a small amount of memory,
> and we may not enable huge pages for ranges covered by the initial memory region
> in the near future.