Message-ID: <aLlRlbaq84IRvNPv@google.com>
Date: Thu, 4 Sep 2025 01:45:09 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: Rick P Edgecombe <rick.p.edgecombe@...el.com>
Cc: Yan Y Zhao <yan.y.zhao@...el.com>, Kai Huang <kai.huang@...el.com>,
"ackerleytng@...gle.com" <ackerleytng@...gle.com>, Vishal Annapurve <vannapurve@...gle.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>, Ira Weiny <ira.weiny@...el.com>,
"kvm@...r.kernel.org" <kvm@...r.kernel.org>, "michael.roth@....com" <michael.roth@....com>,
"pbonzini@...hat.com" <pbonzini@...hat.com>
Subject: Re: [RFC PATCH v2 05/18] KVM: TDX: Drop superfluous page pinning in
S-EPT management

On Tue, Sep 02, 2025, Rick P Edgecombe wrote:
> On Tue, 2025-09-02 at 10:33 -0700, Sean Christopherson wrote:
> > > Besides, a cache flush after 2 can essentially cause a memory write to the
> > > page.
> > > Though we could invoke tdh_phymem_page_wbinvd_hkid() after the KVM_BUG_ON(),
> > > the SEAMCALL itself can fail.
> >
> > I think this falls into the category of "don't screw up" flows. Failure to
> > remove a private SPTE is a near-catastrophic error. Going out of our way to
> > reduce the impact of such errors increases complexity without providing much
> > in the way of value.
> >
> > E.g. if VMCLEAR fails, KVM WARNs but continues on and hopes for the best, even
> > though there's a decent chance that failure to purge the VMCS cache entry could
> > lead to UAF-like problems. To me, this is largely the same.
> >
> > If anything, we should try to prevent #2, e.g. by marking the entire
> > guest_memfd as broken or something, and then deliberately leaking _all_ pages.
>
> There was a marathon thread on this subject.

Holy moly, you weren't kidding.

> We did discuss this option (link to
> most relevant part I could find):
> https://lore.kernel.org/kvm/a9affa03c7cdc8109d0ed6b5ca30ec69269e2f34.camel@intel.com/
>
> The high-level summary is that pinning the pages conflicts with guest_memfd's
> plans to use refcounts for other tracking purposes, while dropping the refcounts
> undermines the error-handling safety.

It also bakes even more assumptions into TDX about guest_memfd being backed by
"struct page", which I would like to avoid whenever possible.
> I strongly agree that we should not optimize for the error path at all. If we
> could bug the guest_memfd (kind of what we were discussing in that link), I think
> it would be appropriate to use in these cases. I guess the question is whether
> we're OK dropping the safety before we have a solution like that.
Definitely a "yes" from me. For this to actually cause real world problems, we'd
need a critical KVM, hardware, or TDX-Module bug, and several unlikely events to
all line up.
If someone encounters any of these KVM_BUG_ON()s _and_ has observed that the
probability of data corruption is meaningful, then we can always convert one or
more of these to full BUG_ON() conditions, but I don't see any reason to do that
without strong evidence that it's necessary.
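
To make that concrete, the policy amounts to something like the sketch below.
Again illustrative only: the function name is made up, "removal_err" stands in
for the result of the page-removal SEAMCALL, and I'm quoting the
tdh_phymem_page_wbinvd_hkid() signature from memory.

static int tdx_sept_remove_page_sketch(struct kvm *kvm, struct page *page,
				       u16 hkid, u64 removal_err)
{
	/*
	 * On failure, KVM_BUG_ON() WARNs once and marks the VM as bugged so it
	 * can never run again, and the page is deliberately leaked instead of
	 * being handed back for reuse with potentially dirty cache lines.
	 * Converting this to BUG_ON() would panic the host instead.
	 */
	if (KVM_BUG_ON(removal_err, kvm))
		return -EIO;

	/*
	 * The cache flush is itself a SEAMCALL and can also fail; there's no
	 * graceful recovery at that point, just the same "leak the page"
	 * fallback.
	 */
	if (KVM_BUG_ON(tdh_phymem_page_wbinvd_hkid(hkid, page), kvm))
		return -EIO;

	return 0;
}
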
> In that thread I was advocating for yes, partly to close it because the
> conversation was getting stuck. But there is probably a long tail of
> potential issues or ways of looking at it that could put it in the grey area.