linux-kernel - Re: [RFC PATCH 08/21] KVM: TDX: Increase/decrease folio ref for huge pages

Message-ID: <aGNrlWw1K6nkWdmg@yzhao56-desk.sh.intel.com>
Date: Tue, 1 Jul 2025 13:01:09 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
CC: "ackerleytng@...gle.com" <ackerleytng@...gle.com>,
	"quic_eberman@...cinc.com" <quic_eberman@...cinc.com>, "Li, Xiaoyao"
	<xiaoyao.li@...el.com>, "Shutemov, Kirill" <kirill.shutemov@...el.com>,
	"Hansen, Dave" <dave.hansen@...el.com>, "david@...hat.com"
	<david@...hat.com>, "thomas.lendacky@....com" <thomas.lendacky@....com>,
	"vbabka@...e.cz" <vbabka@...e.cz>, "tabba@...gle.com" <tabba@...gle.com>,
	"Du, Fan" <fan.du@...el.com>, "michael.roth@....com" <michael.roth@....com>,
	"seanjc@...gle.com" <seanjc@...gle.com>, "binbin.wu@...ux.intel.com"
	<binbin.wu@...ux.intel.com>, "Peng, Chao P" <chao.p.peng@...el.com>,
	"kvm@...r.kernel.org" <kvm@...r.kernel.org>, "Yamahata, Isaku"
	<isaku.yamahata@...el.com>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "Weiny, Ira" <ira.weiny@...el.com>,
	"pbonzini@...hat.com" <pbonzini@...hat.com>, "Li, Zhiquan1"
	<zhiquan1.li@...el.com>, "Annapurve, Vishal" <vannapurve@...gle.com>,
	"jroedel@...e.de" <jroedel@...e.de>, "Miao, Jun" <jun.miao@...el.com>,
	"pgonda@...gle.com" <pgonda@...gle.com>, "x86@...nel.org" <x86@...nel.org>
Subject: Re: [RFC PATCH 08/21] KVM: TDX: Increase/decrease folio ref for huge
 pages

On Tue, Jul 01, 2025 at 05:45:54AM +0800, Edgecombe, Rick P wrote:
> On Mon, 2025-06-30 at 12:25 -0700, Ackerley Tng wrote:
> > > So for this we can do something similar. Have the arch/x86 side of TDX grow
> > > a new tdx_buggy_shutdown(). Have it do an all-cpu IPI to kick CPUs out of
> > > SEAMMODE, wbinvd, and set a "no more seamcalls" bool. Then any SEAMCALLs
> > > after that will return a TDX_BUGGY_SHUTDOWN error, or similar. All TDs in
> > > the system die. Zap/cleanup paths return success in the buggy shutdown case.
> > > 
> > 
> > Do you mean that on unmap/split failure:
> 
> Maybe Yan can clarify here. I thought the HWpoison scenario was about TDX module
My thinking is to set HWPoison on private pages whenever KVM_BUG_ON() is hit in
TDX, i.e., when the page is still mapped in the S-EPT but the TD is bugged and
about to be torn down.

So the cause could be a KVM bug or a TDX module bug, neither of which a retry
can fix.
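
To make the intended policy concrete, here is a rough userspace model of it.
It is only a sketch: struct model_td, struct model_page, model_td_bug() and
model_page_reusable() are hypothetical names invented for illustration, not
kernel or TDX module interfaces, and real HWPoison handling is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_page {
	bool mapped_in_sept;	/* still mapped in the TD's S-EPT */
	bool hwpoisoned;	/* must never be handed back for reuse */
};

struct model_td {
	bool bugged;		/* set once, like a KVM_BUG_ON() hit */
	struct model_page *pages;
	size_t nr_pages;
};

/*
 * Mark the TD as bugged and poison every page that is still mapped.
 * Returns the number of pages newly poisoned.
 */
static size_t model_td_bug(struct model_td *td)
{
	size_t poisoned = 0;

	assert(td != NULL);
	td->bugged = true;

	for (size_t i = 0; i < td->nr_pages; i++) {
		struct model_page *p = &td->pages[i];

		if (p->mapped_in_sept && !p->hwpoisoned) {
			p->hwpoisoned = true;
			poisoned++;
		}
	}
	return poisoned;
}

/* A page may be reused only if it is unmapped and not poisoned. */
static bool model_page_reusable(const struct model_page *p)
{
	return !p->mapped_in_sept && !p->hwpoisoned;
}
```

In this model, pages that were already unmapped before the bug stay reusable;
only the ones still mapped at the time of the bug are withheld.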

> bugs. Not TDX busy errors, demote failures, etc. If there are "normal" failures,
> like the ones that can be fixed with retries, then I think HWPoison is not a
> good option though.
> 
> >  there is a way to make 100%
> > sure all memory becomes re-usable by the rest of the host, using
> > tdx_buggy_shutdown(), wbinvd, etc?

I'm not sure about this approach. If the TDX module is buggy and the page is
still accessible to the guest as a private page, then even with the
no-more-SEAMCALLs flag set, is it safe for guest_memfd/hugetlb to re-assign the
page? That would allow shared-memory access to the page concurrently with
potential private access from the TD or the TDX module.
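
For reference, the "no more seamcalls" gate from the quoted proposal could be
modeled roughly as below. This is a userspace sketch under assumptions: every
model_* name and both error values are made up for illustration, and the
all-CPU IPI and wbinvd steps are not modeled. Note that the flag only gates
host-side calls; it records nothing about whether a page is still privately
accessible, which is the concern above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Made-up status values for this model, not real TDX error codes. */
#define MODEL_SEAMCALL_SUCCESS		0ULL
#define MODEL_TDX_ERROR_OTHER		1ULL
#define MODEL_TDX_BUGGY_SHUTDOWN	0xdeadULL

/* Global "no more seamcalls" flag; zero-initialized, i.e. false. */
static atomic_bool model_no_more_seamcalls;

typedef uint64_t (*model_seamcall_fn)(void *arg);

/* Only sets the flag; the proposed IPI and wbinvd are not modeled here. */
static void model_tdx_buggy_shutdown(void)
{
	atomic_store(&model_no_more_seamcalls, true);
}

/* Wrapper: once shut down, never call into the (modeled) TDX module. */
static uint64_t model_seamcall(model_seamcall_fn fn, void *arg)
{
	assert(fn != NULL);
	if (atomic_load(&model_no_more_seamcalls))
		return MODEL_TDX_BUGGY_SHUTDOWN;
	return fn(arg);
}

/* Zap/cleanup path: the buggy-shutdown status counts as success. */
static int model_zap_private_page(model_seamcall_fn fn, void *arg)
{
	uint64_t err = model_seamcall(fn, arg);

	if (err == MODEL_SEAMCALL_SUCCESS || err == MODEL_TDX_BUGGY_SHUTDOWN)
		return 0;
	return -1;
}

/* Example stub: counts invocations and always reports a failure. */
static uint64_t model_failing_seamcall(void *arg)
{
	int *calls = arg;

	(*calls)++;
	return MODEL_TDX_ERROR_OTHER;
}
```

Before model_tdx_buggy_shutdown() the stub's failure propagates out of the zap
path; afterwards the stub is no longer invoked and the zap path reports
success, even though nothing about the page itself has changed.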

> I think so. If we think the error conditions are rare enough that the cost of
> killing all TDs is acceptable, then we should do a proper POC and give it some
> scrutiny.
> 
> > 
> > If yes, then I'm onboard with this, and if we are 100% sure all memory
> > becomes re-usable by the host after all the extensive cleanup, then we
> > don't need to HWpoison anything.
> 
> For eventual upstream acceptance, we need to stop and think every time TDX
> requires special handling in generic code. This is why I wanted to clarify if
> you guys think the scenario could be in any way considered a generic one.
> (IOMMU, etc).