linux-kernel - Re: [RFC PATCH v2 05/18] KVM: TDX: Drop superfluous page pinning in S-EPT management

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <aLcqVtqxrKtzziK-@google.com>
Date: Tue, 2 Sep 2025 10:33:10 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Yan Zhao <yan.y.zhao@...el.com>
Cc: Rick P Edgecombe <rick.p.edgecombe@...el.com>, "pbonzini@...hat.com" <pbonzini@...hat.com>, 
	Kai Huang <kai.huang@...el.com>, "ackerleytng@...gle.com" <ackerleytng@...gle.com>, 
	Vishal Annapurve <vannapurve@...gle.com>, 
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Ira Weiny <ira.weiny@...el.com>, 
	"kvm@...r.kernel.org" <kvm@...r.kernel.org>, "michael.roth@....com" <michael.roth@....com>
Subject: Re: [RFC PATCH v2 05/18] KVM: TDX: Drop superfluous page pinning in
 S-EPT management

On Mon, Sep 01, 2025, Yan Zhao wrote:
> On Sat, Aug 30, 2025 at 03:53:24AM +0800, Edgecombe, Rick P wrote:
> > On Thu, 2025-08-28 at 17:06 -0700, Sean Christopherson wrote:
> > > From: Yan Zhao <yan.y.zhao@...el.com>
> > > When S-EPT zapping errors occur, KVM_BUG_ON() is invoked to kick off all
> > > vCPUs and mark the VM as dead. Although there is a potential window that a
> > > private page mapped in the S-EPT could be reallocated and used outside the
> > > VM, the loud warning from KVM_BUG_ON() should provide sufficient debug
> > > information.
> ... 
> > Yan, can you clarify what you mean by "there could be a small window"? I'm
> > thinking this is a hypothetical window around vm_dead races? Or more concrete? I
> > *don't* want to re-open the debate on whether to go with this approach, but I
> > think this is a good teaching edge case to settle on how we want to treat
> > similar issues. So I just want to make sure we have the justification right.
> I think this window isn't hypothetical.
> 
> 1. SEAMCALL failure in tdx_sept_remove_private_spte().

But tdx_sept_remove_private_spte() failing is already a hypothetical scenario.

>    KVM_BUG_ON() sets vm_dead and kicks off all vCPUs.
> 2. guest_memfd invalidation completes. memory is freed.
> 3. VM gets killed.
> 
> After 2, the page is still mapped in the S-EPT, but it could potentially be
> reallocated and used outside the VM.
> 
> >From the TDX module and hardware's perspective, the mapping in the S-EPT for
> this page remains valid. So, I'm uncertain if the TDX module might do something
> creative to access the guest page after 2.
> 
> Besides, a cache flush after 2 can essentially cause a memory write to the page.
> Though we could invoke tdh_phymem_page_wbinvd_hkid() after the KVM_BUG_ON(), the
> SEAMCALL itself can fail.

I think this falls into the category of "don't screw up" flows.  Failure to remove
a private SPTE is a near-catastrophic error.  Going out of our way to reduce the
impact of such errors increases complexity without providing much in the way of
value.

E.g. if VMCLEAR fails, KVM WARNs but continues on and hopes for the best, even
though there's a decent chance failure to purge the VMCS cache entry could be
lead to UAF-like problems.  To me, this is largely the same.

If anything, we should try to prevent #2, e.g. by marking the entire guest_memfd
as broken or something, and then deliberately leaking _all_ pages.