[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20241119135327.zjxlczjbli3wdo5o@amd.com>
Date: Tue, 19 Nov 2024 07:53:27 -0600
From: Michael Roth <michael.roth@....com>
To: Binbin Wu <binbin.wu@...ux.intel.com>
CC: Sean Christopherson <seanjc@...gle.com>, <kvm@...r.kernel.org>,
<linux-coco@...ts.linux.dev>, <linux-kernel@...r.kernel.org>,
<x86@...nel.org>, <pbonzini@...hat.com>, <jroedel@...e.de>,
<thomas.lendacky@....com>, <pgonda@...gle.com>, <ashish.kalra@....com>,
<bp@...en8.de>, <pankaj.gupta@....com>, <liam.merwick@...cle.com>, "Rick
Edgecombe" <rick.p.edgecombe@...el.com>, Reinette Chatre
<reinette.chatre@...el.com>, "Yamahata, Isaku" <isaku.yamahata@...el.com>,
"Peng, Chao P" <chao.p.peng@...el.com>
Subject: Re: [PATCH v1 4/5] KVM: Introduce KVM_EXIT_COCO exit type
On Fri, Jul 26, 2024 at 03:15:01PM +0800, Binbin Wu wrote:
>
>
> On 6/29/2024 8:36 AM, Michael Roth wrote:
> > On Fri, Jun 28, 2024 at 01:08:19PM -0700, Sean Christopherson wrote:
> > > On Wed, Jun 26, 2024, Michael Roth wrote:
> > > > On Wed, Jun 26, 2024 at 07:22:43AM -0700, Sean Christopherson wrote:
> > > > > On Fri, Jun 21, 2024, Michael Roth wrote:
> > > > > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > > > > > index ecfa25b505e7..2eea9828d9aa 100644
> > > > > > --- a/Documentation/virt/kvm/api.rst
> > > > > > +++ b/Documentation/virt/kvm/api.rst
> > > > > > @@ -7122,6 +7122,97 @@ Please note that the kernel is allowed to use the kvm_run structure as the
> > > > > > primary storage for certain register types. Therefore, the kernel may use the
> > > > > > values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
> > > > > > +::
> > > > > > +
> > > > > > + /* KVM_EXIT_COCO */
> > > > > > + struct kvm_exit_coco {
> > > > > > + #define KVM_EXIT_COCO_REQ_CERTS 0
> > > > > > + #define KVM_EXIT_COCO_MAX 1
> > > > > > + __u8 nr;
> > > > > > + __u8 pad0[7];
> > > > > > + union {
> > > > > > + struct {
> > > > > > + __u64 gfn;
> > > > > > + __u32 npages;
> > > > > > + #define KVM_EXIT_COCO_REQ_CERTS_ERR_INVALID_LEN 1
> > > > > > + #define KVM_EXIT_COCO_REQ_CERTS_ERR_GENERIC (1 << 31)
> > > > > Unless I'm mistaken, these error codes are defined by the GHCB, which means the
> > > > > values matter, i.e. aren't arbitrary KVM-defined values.
> > > > They do happen to coincide with the GHCB-defined values:
> > > >
> > > > /*
> > > > * The GHCB spec only formally defines INVALID_LEN/BUSY VMM errors, but define
> > > > * a GENERIC error code such that it won't ever conflict with GHCB-defined
> > > > * errors if any get added in the future.
> > > > */
> > > > #define SNP_GUEST_VMM_ERR_INVALID_LEN 1
> > > > #define SNP_GUEST_VMM_ERR_BUSY 2
> > > > #define SNP_GUEST_VMM_ERR_GENERIC BIT(31)
> > > >
> > > > and not totally by accident. But the KVM_EXIT_COCO_REQ_CERTS_ERR_* are
> > > > defined/documented without any reliance on the GHCB spec and are purely
> > > > KVM-defined. I just didn't really see any reason to pick different
> > > > numerical values since it seems like purposely obfuscating things for
> > > For SNP. For other vendors, the numbers look bizarre, e.g. why bit 31? And the
> > > fact that it appears to be a mask is even more odd.
> > That's fair. Values 1 and 2 made sense so just re-use, but that results
> > in a awkward value for _GENERIC that's not really necessary for the KVM
> > side.
> >
> > > > no real reason. But the code itself doesn't rely on them being the same
> > > > as the spec defines, so we are free to define these however we'd like as
> > > > far as the KVM API goes.
> > > > > I forget exactly what we discussed in PUCK, but for the error codes, I think KVM
> > > > > should either define it's own values that are completely disconnected from any
> > > > > "harware" spec, or KVM should very explicitly #define all hardware values and have
> > > > I'd gotten the impression that option 1) is what we were sort of leaning
> > > > toward, and that's the approach taken here.
> > > > And if we expose things selectively to keep the ABI small, it's a bit
> > > > awkward too. For instance, KVM_EXIT_COCO_REQ_CERTS_ERR_* basically needs
> > > > a way to indicate success/fail/ENOMEM. Which we have with
> > > > (assuming 0==success):
> > > >
> > > > #define KVM_EXIT_COCO_REQ_CERTS_ERR_INVALID_LEN 1
> > > > #define KVM_EXIT_COCO_REQ_CERTS_ERR_GENERIC (1 << 31)
> > > >
> > > > But the GHCB also defines other values like:
> > > >
> > > > #define SNP_GUEST_VMM_ERR_BUSY 2
> > > >
> > > > which don't make much sense to handle on the userspace side and doesn't
> > > Why not? If userspace is waiting on a cert update for whatever reason, why can't
> > > it signal "busy" to the guest?
> > My thinking was that userspace is free to take it's time and doesn't need
> > to report delays back to KVM. But it would reduce the potential for
> > soft-lockups in the guest, so it might make sense to work that into the
> > API.
> >
> > But more to original point, there could be something added in the future
> > that really has nothing to do with anything involving KVM<->userspace
> > interaction and so would make no sense to expose to userspace.
> > Unfortunately I picked a bad example. :)
> >
> > > > really have anything to do with the KVM_EXIT_COCO_REQ_CERTS KVM event,
> > > > which is a separate/self-contained thing from the general guest request
> > > > protocol. So would we expose that as ABI or not? If not then we end up
> > > > with this weird splitting of code. And if yes, then we have to sort of
> > > > give userspace a way to discover whenever new error codes are added to
> > > > the GHCB spec, because KVM needs to understand these value too and
> > > Not necessarily. So long as KVM doesn't need to manipulate guest state, e.g. to
> > > set RBX (or whatever reg it is) for ERR_INVALID_LEN, then KVM doesn't need to
> > > care/know about the error codes. E.g. userspace could signal VMM_BUSY and KVM
> > > would happily pass that to the guest.
> > But given we already have an exception to that where KVM does need to
> > intervene for certain errors codes like ERR_INVALID_LEN that require
> > modifying guest state, it doesn't seem like a good starting position
> > to have to hope that it doesn't happen again.
> >
> > It just doesn't seem necessary to put ourselves in a situation where
> > we'd need to be concerned by that at all. If the KVM API is a separate
> > and fairly self-contained thing then these decisions are set in stone
> > until we want to change it and not dictated/modified by changes to
> > anything external without our explicit consideration.
> >
> > I know the certs things is GHCB-specific atm, but when the certs used
> > to live inside the kernel the KVM_EXIT_* wasn't needed at all, so
> > that's why I see this as more of a KVM interface thing rather than
> > a GHCB one. And maybe eventually some other CoCo implementation also
> > needs some interface for fetching certificates/blobs from userspace
> > and is able to re-use it still because it's not too SNP-specific
> > and the behavior isn't dictated by the GHCB spec (e.g.
> > ERR_INVALID_LEN might result in some other state needing to be
> > modified in their case rather than what the GHCB dictates.)
>
> TDX GHCI does have a similar PV interface for TDX guest to get quota, i.e.,
> TDG.VP.VMCALL<GetQuote>. This GetQuote PV interface is designed to invoke
> a request to generate a TD-Quote signing by a service hosting TD-Quoting
> Enclave operating in the host environment for a TD Report passed as a
> parameter by the TD.
> And the request will be forwarded to userspace for handling.
>
> So like GHCB, TDX needs to pass a shared buffer to userspace, which is
> specified by GPA and size (4K aligned) and get the error code from
> userspace and forward the error code to guest.
>
> But there are some differences from GHCB interface.
> 1. TDG.VP.VMCALL<GetQuote> is a a doorbell-like interface used to queue a
> request. I.e., it is an asynchronous request. The error code represents
> the status of request queuing, *not* the status of TD Quote generation..
> 2. Besides the error code returned by userspace for GetQuote interface, the
> GHCI spec defines a "Status Code" field in the header of the shared
> buffer.
> The "Status Code" field is also updated by VMM during the real handling
> of
> getting quote (after TDG.VP.VMCALL<GetQuote> returned to guest).
> After the TDG.VP.VMCALL<GetQuote> returned and back to TD guest, the TD
> guest can poll the "Status Code" field to check if the processing is
> in-flight, succeeded or failed.
> Since the real handling of getting quota is happening in userspace, and
> it will interact directly with guest, for TDX, it has to expose TDX
> specific error code to userspace to update the result of quote
> generation.
>
> Currently, TDX is about to add a new TDX specific KVM exit reason, i.e.,
> KVM_EXIT_TDX_GET_QUOTE and its related data structure based on a previous
> discussion. https://lore.kernel.org/kvm/Zg18ul8Q4PGQMWam@google.com/
> For the error code returned by userspace, KVM simply forward the error code
> to guest without further translation or handling.
>
> I am neutral to have a common KVM exit reason to handle both GHCB for
> REQ_CERTS and GHCI for GET_QUOTE. But for the error code, can we uses
> vendor
> specific error codes if KVM cares about the error code returned by userspace
> in vendor specific complete_userspace_io callback?
A few weeks back we discussed during the PUCK call on whether it makes
sense for use a common exit type for REQ_CERTS and TDX_GET_QUOTE, and
due to the asynchronous/polling nature of TDX_GET_QUOTE, and the
somewhat-particular file-locking requirements that need to be built into
the REQ_CERTS handling, we'd decided that it's probably more trouble
than it's worth to try to merge the 2.
However, I'm still hoping that KVM_EXIT_COCO might still provide some
useful infrastructure for introducing something like
KVM_EXIT_COCO_GET_QUOTE that implements the TDX-specific requirements
more directly.
I've just submitted v2 of KVM_EXIT_COCO where the userspace-provided
error codes are reworked to be less dependent on specific spec-defined
values but instead relies on standard error codes that KVM can provide
special handling for internally when needed:
https://lore.kernel.org/kvm/20241119133513.3612633-1-michael.roth@amd.com/
But I suppose in your case userspace would just return "SUCCESS"/0 and
then all the vendor-specific values are mainly in relation to the
"Status Code" field so it likely doesn't make a huge difference as far
as what userspace passes back to KVM.
Thanks,
Mike
>
> BTW, here is the plan of 4 hypercalls needing to exit to userspace for
> TDX basic support series:
> TDG.VP.VMCALL<SetupEventNotifyInterrupt>
> - Add a new KVM exit reason KVM_EXIT_TDX_SETUP_EVENT_NOTIFY
> TDG.VP.VMCALL<GetQuote>
> - Add a new KVM exit reason KVM_EXIT_TDX_GET_QUOTE
> TDG.VP.VMCALL<MapGPA>
> - Reuse KVM_EXIT_HYPERCALL with KVM_HC_MAP_GPA_RANGE
> TDG.VP.VMCALL<ReportFatalError>
> - Reuse KVM_EXIT_SYSTEM_EVENT but add a new type
> KVM_SYSTEM_EVENT_TDX_FATAL_ERROR
>
>
Powered by blists - more mailing lists