linux-kernel - Re: [PATCH v3 00/24] KVM: TDX huge page support for private memory

Message-ID: <aWbkcRshLiL4NWZg@yzhao56-desk.sh.intel.com>
Date: Wed, 14 Jan 2026 08:33:53 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: Ackerley Tng <ackerleytng@...gle.com>
CC: Sean Christopherson <seanjc@...gle.com>, Vishal Annapurve
	<vannapurve@...gle.com>, <pbonzini@...hat.com>,
	<linux-kernel@...r.kernel.org>, <kvm@...r.kernel.org>, <x86@...nel.org>,
	<rick.p.edgecombe@...el.com>, <dave.hansen@...el.com>, <kas@...nel.org>,
	<tabba@...gle.com>, <michael.roth@....com>, <david@...nel.org>,
	<sagis@...gle.com>, <vbabka@...e.cz>, <thomas.lendacky@....com>,
	<nik.borisov@...e.com>, <pgonda@...gle.com>, <fan.du@...el.com>,
	<jun.miao@...el.com>, <francescolavra.fl@...il.com>, <jgross@...e.com>,
	<ira.weiny@...el.com>, <isaku.yamahata@...el.com>, <xiaoyao.li@...el.com>,
	<kai.huang@...el.com>, <binbin.wu@...ux.intel.com>, <chao.p.peng@...el.com>,
	<chao.gao@...el.com>
Subject: Re: [PATCH v3 00/24] KVM: TDX huge page support for private memory

On Mon, Jan 12, 2026 at 12:15:17PM -0800, Ackerley Tng wrote:
> Sean Christopherson <seanjc@...gle.com> writes:
> 
> > Mapping a hugepage for memory that KVM _knows_ is contiguous and homogenous is
> > conceptually totally fine, i.e. I'm not totally opposed to adding support for
> > mapping multiple guest_memfd folios with a single hugepage. As to whether we
> 
> Sean, I'd like to clarify this.
> 
> > do (a) nothing,
> 
> What does do nothing mean here?
> 
> In this patch series the TDX functions do sanity checks ensuring that
> mapping size <= folio size. IIUC the checks at mapping time, like in
> tdh_mem_page_aug() would be fine since at the time of mapping, the
> mapping size <= folio size, but we'd be in trouble at the time of
> zapping, since that's when mapping sizes > folio sizes get discovered.
> 
> The sanity checks are in principle in direct conflict with allowing
> mapping of multiple guest_memfd folios at hugepage level.
> 
> > (b) change the refcounting, or
> 
> I think this is pretty hard unless something changes in core MM that
> allows refcounting to be customizable by the FS. guest_memfd would love
> to have that, but customizable refcounting is going to hurt refcounting
> performance throughout the kernel.
> 
> > (c) add support for mapping multiple folios in one page,
> 
> Where would the changes need to be made, IIUC there aren't any checks
> currently elsewhere in KVM to ensure that mapping size <= folio size,
> other than the sanity checks in the TDX code proposed in this series.
> 
> Does any support need to be added, or is it about amending the
> unenforced/unwritten rule from "mapping size <= folio size" to "mapping
> size <= contiguous memory size"?

The rule is not "unenforced/unwritten". In fact, it's the de facto standard in
KVM.

For non-gmem cases, KVM uses the mapping size in the primary MMU as the max
mapping size in the secondary MMU, while the primary MMU does not create a
mapping larger than the backend folio size. When splitting the backend folio,
the Linux kernel unmaps the folio from both the primary MMU and the KVM-managed
secondary MMU (through the MMU notifier).

On the non-KVM side, though IOMMU stage-2 mappings are allowed to be larger
than folio sizes, splitting folios while they are still mapped in the IOMMU
stage-2 page table is not permitted due to the extra folio refcount held by the
IOMMU.

For gmem cases, KVM also does not create mappings larger than the folio size
allocated from gmem. This is why the TDX huge page series relies on gmem's
ability to allocate huge folios.
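
To make the rule concrete, here is a minimal userspace sketch of "mapping size
<= folio size". It is not the kernel's code: the names max_mapping_order() and
mapping_fits_folio(), and the ORDER_* constants, are hypothetical and only
model x86 4K/2M/1G levels as page orders. It also assumes folios are naturally
aligned to their own size.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical page orders for the x86 4K, 2M and 1G mapping levels. */
enum { ORDER_4K = 0, ORDER_2M = 9, ORDER_1G = 18 };

/*
 * Model of the sanity check discussed above: a mapping may never
 * cover more pages than the folio that backs it.
 */
static bool mapping_fits_folio(unsigned int mapping_order,
			       unsigned int folio_order)
{
	return mapping_order <= folio_order;
}

/*
 * Pick the largest mapping order for (gfn, pfn) that
 *  - does not exceed the order of the backing folio, and
 *  - keeps gfn and pfn at the same offset within the huge page.
 * Falls back to a 4K mapping when nothing larger qualifies.
 */
static unsigned int max_mapping_order(uint64_t gfn, uint64_t pfn,
				      unsigned int folio_order)
{
	static const unsigned int orders[] = { ORDER_1G, ORDER_2M, ORDER_4K };
	size_t i;

	for (i = 0; i < sizeof(orders) / sizeof(orders[0]); i++) {
		unsigned int order = orders[i];
		uint64_t mask = (1ULL << order) - 1;

		if (!mapping_fits_folio(order, folio_order))
			continue;	/* mapping would span past the folio */
		if ((gfn & mask) != (pfn & mask))
			continue;	/* gfn and pfn are not co-aligned */
		return order;
	}
	return ORDER_4K;
}
```

With this model, a 2M-aligned gfn/pfn pair backed by an order-9 folio gets a
2M mapping, while the same pair backed by an order-0 folio is limited to 4K.
Mapping multiple folios with one huge page would mean relaxing the first check
to compare against the size of the contiguous range instead.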

We really need to be careful if we hope to break this long-established rule.

> > probably comes down to which option provides "good
> > enough" performance without incurring too much complexity.