linux-kernel - Re: [RFC PATCH v2 05/18] KVM: TDX: Drop superfluous page pinning in S-EPT management

Message-ID: <97a422c0ba7a5d68b35b5327d3bf0cd11429c300.camel@intel.com>
Date: Tue, 2 Sep 2025 18:55:46 +0000
From: "Edgecombe, Rick P" <rick.p.edgecombe@...el.com>
To: "seanjc@...gle.com" <seanjc@...gle.com>, "Zhao, Yan Y"
	<yan.y.zhao@...el.com>
CC: "Huang, Kai" <kai.huang@...el.com>, "ackerleytng@...gle.com"
	<ackerleytng@...gle.com>, "Annapurve, Vishal" <vannapurve@...gle.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, "Weiny, Ira"
	<ira.weiny@...el.com>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
	"michael.roth@....com" <michael.roth@....com>, "pbonzini@...hat.com"
	<pbonzini@...hat.com>
Subject: Re: [RFC PATCH v2 05/18] KVM: TDX: Drop superfluous page pinning in
 S-EPT management

On Tue, 2025-09-02 at 10:33 -0700, Sean Christopherson wrote:
> > Besides, a cache flush after 2 can essentially cause a memory write to the
> > page.
> > Though we could invoke tdh_phymem_page_wbinvd_hkid() after the KVM_BUG_ON(),
> > the SEAMCALL itself can fail.
> 
> I think this falls into the category of "don't screw up" flows.  Failure to
> remove a private SPTE is a near-catastrophic error.  Going out of our way to
> reduce the impact of such errors increases complexity without providing much
> in the way of value.
> 
> E.g. if VMCLEAR fails, KVM WARNs but continues on and hopes for the best, even
> though there's a decent chance failure to purge the VMCS cache entry could
> lead to UAF-like problems.  To me, this is largely the same.
> 
> If anything, we should try to prevent #2, e.g. by marking the entire
> guest_memfd as broken or something, and then deliberately leaking _all_ pages.

There was a marathon thread on this subject. We did discuss this option (link to
the most relevant part I could find):
https://lore.kernel.org/kvm/a9affa03c7cdc8109d0ed6b5ca30ec69269e2f34.camel@intel.com/

The high-level summary is that pinning the pages complicates guest_memfd's plans
to use the refcount for other tracking purposes, while dropping the refcounts
weakens the safety of the error handling.

I strongly agree that we should not optimize for the error path at all. If we
could bug the guest_memfd (kind of what we were discussing in that link), I
think it would be appropriate to use in these cases. I guess the question is
whether we are OK dropping the safety before we have a solution like that. In
that thread I was advocating for yes, partly to close it because the
conversation was getting stuck. But there is probably a long tail of potential
issues, or ways of looking at it, that could put it in the grey area.