Message-ID: <aFDMO5mefGubO50c@yzhao56-desk.sh.intel.com>
Date: Tue, 17 Jun 2025 10:00:27 +0800
From: Yan Zhao <yan.y.zhao@...el.com>
To: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
CC: "Annapurve, Vishal" <vannapurve@...gle.com>, "kvm@...r.kernel.org"
<kvm@...r.kernel.org>, "quic_eberman@...cinc.com" <quic_eberman@...cinc.com>,
"Li, Xiaoyao" <xiaoyao.li@...el.com>, "Shutemov, Kirill"
<kirill.shutemov@...el.com>, "Hansen, Dave" <dave.hansen@...el.com>,
"david@...hat.com" <david@...hat.com>, "thomas.lendacky@....com"
<thomas.lendacky@....com>, "tabba@...gle.com" <tabba@...gle.com>,
"vbabka@...e.cz" <vbabka@...e.cz>, "Du, Fan" <fan.du@...el.com>,
"michael.roth@....com" <michael.roth@....com>, "seanjc@...gle.com"
<seanjc@...gle.com>, "Weiny, Ira" <ira.weiny@...el.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"pbonzini@...hat.com" <pbonzini@...hat.com>, "ackerleytng@...gle.com"
<ackerleytng@...gle.com>, "Yamahata, Isaku" <isaku.yamahata@...el.com>,
"binbin.wu@...ux.intel.com" <binbin.wu@...ux.intel.com>, "Peng, Chao P"
<chao.p.peng@...el.com>, "Li, Zhiquan1" <zhiquan1.li@...el.com>,
"jroedel@...e.de" <jroedel@...e.de>, "Miao, Jun" <jun.miao@...el.com>,
"pgonda@...gle.com" <pgonda@...gle.com>, "x86@...nel.org" <x86@...nel.org>
Subject: Re: [RFC PATCH 08/21] KVM: TDX: Increase/decrease folio ref for huge
pages
On Tue, Jun 17, 2025 at 08:25:20AM +0800, Edgecombe, Rick P wrote:
> On Mon, 2025-06-16 at 17:59 +0800, Yan Zhao wrote:
> > > Few questions here:
> > > 1) It sounds like the failure to remove entries from SEPT could only
> > > be due to bugs in the KVM/TDX module,
> > Yes.
>
> A TDX module bug could hypothetically cause many types of host instability. We
> should think a little more about the context of the risk before we make TDX a
> special case or add much error handling code around it. If we end up with a
> bunch of paranoid error handling code around TDX module behavior, that is going
> to be a pain to maintain. And error handling code for rare cases will be hard to
> remove.
>
> We've had a history of unreliable page removal during the base series
> development. When we solved the problem, it was not completely clean (though
> more on the guest-affecting side). So I think there is reason to be concerned.
> But this should work reliably in theory. So I'm not sure we should use the error
> case as a hard reason. Instead maybe we should focus on how to make it less
> likely to have an error. Unless there is a specific case you are considering,
> Yan?
Yes, KVM/TDX does its utmost to ensure that page removal cannot fail. However,
if a bug occurs, KVM/TDX will trigger a BUG_ON and leak the problematic page.
This is a simple way to contain the error to the affected pages, and it also
helps in debugging when unexpected errors arise.
Returning the error code up the stack is not worthwhile, and I don't even
think it's feasible.
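
To make that concrete, here is a minimal sketch of the "BUG_ON and leak the
page" pattern. The function name and the SEAMCALL wrapper signature below are
my assumptions for illustration only, not the actual patch:

/*
 * Hypothetical sketch of the error path: on unexpected SEAMCALL
 * failure, mark the VM as bugged and pin the page forever.
 */
static int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
					enum pg_level level, kvm_pfn_t pfn)
{
	/* Ask the TDX module to unmap the page from the SEPT. */
	u64 err = tdh_mem_page_remove(kvm, gfn, level, pfn);

	if (KVM_BUG_ON(err, kvm)) {
		/*
		 * KVM_BUG_ON() marks the VM as bugged, so the TDX VM is
		 * effectively killed.  Take a reference so the page can
		 * never return to the page allocator while the TDX
		 * module may still track it, i.e. leak it on purpose.
		 */
		folio_get(pfn_folio(pfn));
		return -EIO;
	}
	return 0;
}
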
> That said, I think the refcounting on error (or rather, notifying guestmemfd on
> error to let it handle the error how it wants) is a fine solution. As long as it
> doesn't take much code (as is the case for Yan's POC).
>
> >
> > > how reliable would it be to
> > > continue executing TDX VMs on the host once such bugs are hit?
> > The TDX VMs will be killed. However, the private pages are still mapped in the
> > SEPT (after the unmapping failure).
> > The teardown flow for TDX VM is:
> >
> > do_exit
> > |->exit_files
> > |->kvm_gmem_release ==> (1) Unmap guest pages
> > |->release kvmfd
> > |->kvm_destroy_vm (2) Reclaiming resources
> > |->kvm_arch_pre_destroy_vm ==> Release hkid
> > |->kvm_arch_destroy_vm ==> Reclaim SEPT page table pages
> >
> > Without holding a page reference after (1) fails, the guest pages may be
> > re-assigned by the host OS while they are still tracked in the TDX module.
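
(To illustrate the point above with a sketch: because a folio can only be
freed once its last reference is dropped, holding one extra reference per 4KB
page that failed to unmap keeps the memory out of the allocator. The helper
name below is hypothetical; folio_ref_add() and KVM_PAGES_PER_HPAGE() are
real.)

/*
 * Hypothetical helper: take one extra reference per 4KB page still
 * mapped in the SEPT, so the folio can never be freed and therefore
 * never re-assigned by the host OS.
 */
static void tdx_hold_pages_on_error(kvm_pfn_t pfn, enum pg_level level)
{
	folio_ref_add(pfn_folio(pfn), KVM_PAGES_PER_HPAGE(level));
}
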
> >
> >
> > > 2) Is it reliable to continue executing the host kernel and other
> > > normal VMs once such bugs are hit?
> > With TDX holding the page ref count, the impact of an unmapping failure of
> > guest pages is just that those pages are leaked.
>
> If the kernel might be able to continue working, it should try. It should warn
> if there is a risk, so people can use panic_on_warn if they want to stop the
> kernel.
>
> >
> > > 3) Can the memory be reclaimed reliably if the VM is marked as dead
> > > and cleaned up right away?
> > As in the above flow, TDX needs to hold the page reference on unmapping
> > failure until reclaiming is successful. That said, reclaiming itself can
> > fail too.
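
(For completeness, a sketch of the reclaim side under the same assumptions;
tdx_reclaim_page() is a stand-in for the series' helper built around the
TDH.PHYMEM.PAGE.RECLAIM SEAMCALL:)

/*
 * Hypothetical reclaim-side counterpart: drop the extra reference
 * taken at unmap failure only once the TDX module has successfully
 * reclaimed the page.
 */
static void tdx_reclaim_leaked_page(struct page *page)
{
	if (tdx_reclaim_page(page))
		return;	/* Reclaim failed too; keep the page leaked. */

	/* The page is back under VMM control; allow it to be freed. */
	folio_put(page_folio(page));
}
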
>
> We could ask TDX module folks if there is anything they could guarantee.
>