linux-kernel - Re: [RFC PATCH 08/21] KVM: TDX: Increase/decrease folio ref for huge pages

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <aGOr90RZDLEJhieE@yzhao56-desk.sh.intel.com>
Date: Tue, 1 Jul 2025 17:35:51 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
CC: "ackerleytng@...gle.com" <ackerleytng@...gle.com>, "Shutemov, Kirill"
	<kirill.shutemov@...el.com>, "Li, Xiaoyao" <xiaoyao.li@...el.com>,
	"kvm@...r.kernel.org" <kvm@...r.kernel.org>, "Hansen, Dave"
	<dave.hansen@...el.com>, "david@...hat.com" <david@...hat.com>,
	"thomas.lendacky@....com" <thomas.lendacky@....com>, "tabba@...gle.com"
	<tabba@...gle.com>, "vbabka@...e.cz" <vbabka@...e.cz>,
	"quic_eberman@...cinc.com" <quic_eberman@...cinc.com>, "michael.roth@....com"
	<michael.roth@....com>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "seanjc@...gle.com" <seanjc@...gle.com>,
	"Peng, Chao P" <chao.p.peng@...el.com>, "Du, Fan" <fan.du@...el.com>,
	"Yamahata, Isaku" <isaku.yamahata@...el.com>, "pbonzini@...hat.com"
	<pbonzini@...hat.com>, "binbin.wu@...ux.intel.com"
	<binbin.wu@...ux.intel.com>, "Weiny, Ira" <ira.weiny@...el.com>, "Li,
 Zhiquan1" <zhiquan1.li@...el.com>, "Annapurve, Vishal"
	<vannapurve@...gle.com>, "jroedel@...e.de" <jroedel@...e.de>, "Miao, Jun"
	<jun.miao@...el.com>, "pgonda@...gle.com" <pgonda@...gle.com>,
	"x86@...nel.org" <x86@...nel.org>
Subject: Re: [RFC PATCH 08/21] KVM: TDX: Increase/decrease folio ref for huge
 pages

On Tue, Jul 01, 2025 at 01:55:43AM +0800, Edgecombe, Rick P wrote:
> So for this we can do something similar. Have the arch/x86 side of TDX grow a
> new tdx_buggy_shutdown(). Have it do an all-cpu IPI to kick CPUs out of
> SEAMMODE, wbivnd, and set a "no more seamcalls" bool. Then any SEAMCALLs after
> that will return a TDX_BUGGY_SHUTDOWN error, or similar. All TDs in the system
> die. Zap/cleanup paths return success in the buggy shutdown case.
All TDs in the system die could be too severe for unmap errors due to KVM bugs.

> Does it fit? Or, can you guys argue that the failures here are actually non-
> special cases that are worth more complex recovery? I remember we talked about
> IOMMU patterns that are similar, but it seems like the remaining cases under
> discussion are about TDX bugs.
I didn't mention TDX connect previously to avoid introducing unnecessary
complexity.

For TDX connect, S-EPT is used for private mappings in IOMMU. Unmap could
therefore fail due to pages being pinned for DMA.

So, my thinking was that if that happens, KVM could set a special flag to folios
pinned for private DMA.

Then guest_memfd could check the special flag before allowing private-to-shared
conversion, or punch hole.
guest_memfd could check this special flag and choose to poison or leak the
folio.

Otherwise, if we choose tdx_buggy_shutdown() to "do an all-cpu IPI to kick CPUs
out of SEAMMODE, wbivnd, and set a "no more seamcalls" bool", DMAs may still
have access to the private pages mapped in S-EPT.