linux-kernel - Re: [PATCH v1 4/5] KVM: Introduce KVM_EXIT

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAAH4kHZ-9ajaLH8C1N2MKzFuBKjx+BVk9-t24xhyEL3AKEeMQQ@mail.gmail.com>
Date: Fri, 13 Sep 2024 09:29:08 -0700
From: Dionna Amalie Glaze <dionnaglaze@...gle.com>
To: Binbin Wu <binbin.wu@...ux.intel.com>
Cc: Michael Roth <michael.roth@....com>, Sean Christopherson <seanjc@...gle.com>, kvm@...r.kernel.org, 
	linux-coco@...ts.linux.dev, linux-kernel@...r.kernel.org, x86@...nel.org, 
	pbonzini@...hat.com, jroedel@...e.de, thomas.lendacky@....com, 
	pgonda@...gle.com, ashish.kalra@....com, bp@...en8.de, pankaj.gupta@....com, 
	liam.merwick@...cle.com, Rick Edgecombe <rick.p.edgecombe@...el.com>, 
	Reinette Chatre <reinette.chatre@...el.com>, "Yamahata, Isaku" <isaku.yamahata@...el.com>, 
	"Peng, Chao P" <chao.p.peng@...el.com>
Subject: Re: [PATCH v1 4/5] KVM: Introduce KVM_EXIT_COCO exit type

On Fri, Jul 26, 2024 at 12:15 AM Binbin Wu <binbin.wu@...ux.intel.com> wrote:
>
>
>
> On 6/29/2024 8:36 AM, Michael Roth wrote:
> > On Fri, Jun 28, 2024 at 01:08:19PM -0700, Sean Christopherson wrote:
> >> On Wed, Jun 26, 2024, Michael Roth wrote:
> >>> On Wed, Jun 26, 2024 at 07:22:43AM -0700, Sean Christopherson wrote:
> >>>> On Fri, Jun 21, 2024, Michael Roth wrote:
> >>>>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> >>>>> index ecfa25b505e7..2eea9828d9aa 100644
> >>>>> --- a/Documentation/virt/kvm/api.rst
> >>>>> +++ b/Documentation/virt/kvm/api.rst
> >>>>> @@ -7122,6 +7122,97 @@ Please note that the kernel is allowed to use the kvm_run structure as the
> >>>>>   primary storage for certain register types. Therefore, the kernel may use the
> >>>>>   values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
> >>>>>
> >>>>> +::
> >>>>> +
> >>>>> +         /* KVM_EXIT_COCO */
> >>>>> +         struct kvm_exit_coco {
> >>>>> +         #define KVM_EXIT_COCO_REQ_CERTS                 0
> >>>>> +         #define KVM_EXIT_COCO_MAX                       1
> >>>>> +                 __u8 nr;
> >>>>> +                 __u8 pad0[7];
> >>>>> +                 union {
> >>>>> +                         struct {
> >>>>> +                                 __u64 gfn;
> >>>>> +                                 __u32 npages;
> >>>>> +         #define KVM_EXIT_COCO_REQ_CERTS_ERR_INVALID_LEN         1
> >>>>> +         #define KVM_EXIT_COCO_REQ_CERTS_ERR_GENERIC             (1 << 31)
> >>>> Unless I'm mistaken, these error codes are defined by the GHCB, which means the
> >>>> values matter, i.e. aren't arbitrary KVM-defined values.
> >>> They do happen to coincide with the GHCB-defined values:
> >>>
> >>>    /*
> >>>     * The GHCB spec only formally defines INVALID_LEN/BUSY VMM errors, but define
> >>>     * a GENERIC error code such that it won't ever conflict with GHCB-defined
> >>>     * errors if any get added in the future.
> >>>     */
> >>>    #define SNP_GUEST_VMM_ERR_INVALID_LEN   1
> >>>    #define SNP_GUEST_VMM_ERR_BUSY          2
> >>>    #define SNP_GUEST_VMM_ERR_GENERIC       BIT(31)
> >>>

VMM_ERR_BUSY means something very specific to the GHCB protocol, which
is that a request that would normally have increased a message
sequence number was not able to be sent, and the exact same message
would need to be sent again, otherwise the cryptographic protocol
breaks down.
In the event of firmware hotloading, SNP_COMMIT, and needing to get
the right version of VCEK certificate to the VM guest, we could detect
a conflict and need to say VMM_ERR_BUSY2 (or something) that says try
again, but the sequence number did go up, we just couldn't coordinate
a non-atomic data transfer afterward to be correct.

There's a different way to solve the data race without a retry, but
I'm not 100% confident that we can really generalize the error space
across TEEs.

To support the coordination of SNP_DOWNLOAD_FIRMWARE_EX, SNP_COMMIT,
and extended guest requests, user space needs to be told which
TCB_VERSION certificate it needs to provide. It can be wrong if it
relies on its own call to SNP_PLATFORM_STATUS.
Given that userspace can't interpret the report (encrypted by VMPCK),
it won't know exactly which VCEK certificate to provide given that
SNP_COMMIT can happen before or after an attestation report is taken
and before KVM exits to userspace for the certificates.

We can extend the ccp driver to, on extended guest request, lock the
command buffer, get the REPORTED_TCB, complete the request, unlock the
command buffer, and return both the response and the REPORTED_TCB at
the time of the request. That will give userspace enough info to give
the right certificate. That would mean a more specific
KVM_EXIT_COCO_REQ_EXIT message than just where to put the certs. A
SEV-SNP TCB_VERSION is also platform-specific, so not particularly
generalizable to "COCO".

We could also say that extended_guest_request is inherently racy and
have AMD extend the GHCB spec with a new request type that doesn't
communicate with the ASP at all and instead just requests certificates
for a given REPORTED_TCB. The guest VM can read an attestation
report's reported_tcb field and craft this request. I don't know if we
want to have an arbitrary message passing interface between guest VM
and user space VMM without very specific restrictions. We ought to
just have paravirtualized I/O devices then.

> >>> and not totally by accident. But the KVM_EXIT_COCO_REQ_CERTS_ERR_* are
> >>> defined/documented without any reliance on the GHCB spec and are purely
> >>> KVM-defined. I just didn't really see any reason to pick different
> >>> numerical values since it seems like purposely obfuscating things for
> >> For SNP.  For other vendors, the numbers look bizarre, e.g. why bit 31?  And the
> >> fact that it appears to be a mask is even more odd.
> > That's fair. Values 1 and 2 made sense so just re-use, but that results
> > in a awkward value for _GENERIC that's not really necessary for the KVM
> > side.
> >
> >>> no real reason. But the code itself doesn't rely on them being the same
> >>> as the spec defines, so we are free to define these however we'd like as
> >>> far as the KVM API goes.
> >>>> I forget exactly what we discussed in PUCK, but for the error codes, I think KVM
> >>>> should either define it's own values that are completely disconnected from any
> >>>> "harware" spec, or KVM should very explicitly #define all hardware values and have
> >>> I'd gotten the impression that option 1) is what we were sort of leaning
> >>> toward, and that's the approach taken here.
> >>> And if we expose things selectively to keep the ABI small, it's a bit
> >>> awkward too. For instance, KVM_EXIT_COCO_REQ_CERTS_ERR_* basically needs
> >>> a way to indicate success/fail/ENOMEM. Which we have with
> >>> (assuming 0==success):
> >>>
> >>>    #define KVM_EXIT_COCO_REQ_CERTS_ERR_INVALID_LEN         1
> >>>    #define KVM_EXIT_COCO_REQ_CERTS_ERR_GENERIC             (1 << 31)
> >>>
> >>> But the GHCB also defines other values like:
> >>>
> >>>    #define SNP_GUEST_VMM_ERR_BUSY          2
> >>>
> >>> which don't make much sense to handle on the userspace side and doesn't
> >> Why not?  If userspace is waiting on a cert update for whatever reason, why can't
> >> it signal "busy" to the guest?
> > My thinking was that userspace is free to take it's time and doesn't need
> > to report delays back to KVM. But it would reduce the potential for
> > soft-lockups in the guest, so it might make sense to work that into the
> > API.
> >
> > But more to original point, there could be something added in the future
> > that really has nothing to do with anything involving KVM<->userspace
> > interaction and so would make no sense to expose to userspace.
> > Unfortunately I picked a bad example. :)
> >
> >>> really have anything to do with the KVM_EXIT_COCO_REQ_CERTS KVM event,
> >>> which is a separate/self-contained thing from the general guest request
> >>> protocol. So would we expose that as ABI or not? If not then we end up
> >>> with this weird splitting of code. And if yes, then we have to sort of
> >>> give userspace a way to discover whenever new error codes are added to
> >>> the GHCB spec, because KVM needs to understand these value too and
> >> Not necessarily.  So long as KVM doesn't need to manipulate guest state, e.g. to
> >> set RBX (or whatever reg it is) for ERR_INVALID_LEN, then KVM doesn't need to
> >> care/know about the error codes.  E.g. userspace could signal VMM_BUSY and KVM
> >> would happily pass that to the guest.
> > But given we already have an exception to that where KVM does need to
> > intervene for certain errors codes like ERR_INVALID_LEN that require
> > modifying guest state, it doesn't seem like a good starting position
> > to have to hope that it doesn't happen again.
> >
> > It just doesn't seem necessary to put ourselves in a situation where
> > we'd need to be concerned by that at all. If the KVM API is a separate
> > and fairly self-contained thing then these decisions are set in stone
> > until we want to change it and not dictated/modified by changes to
> > anything external without our explicit consideration.
> >
> > I know the certs things is GHCB-specific atm, but when the certs used
> > to live inside the kernel the KVM_EXIT_* wasn't needed at all, so
> > that's why I see this as more of a KVM interface thing rather than
> > a GHCB one. And maybe eventually some other CoCo implementation also
> > needs some interface for fetching certificates/blobs from userspace
> > and is able to re-use it still because it's not too SNP-specific
> > and the behavior isn't dictated by the GHCB spec (e.g.
> > ERR_INVALID_LEN might result in some other state needing to be
> > modified in their case rather than what the GHCB dictates.)
>
> TDX GHCI does have a similar PV interface for TDX guest to get quota, i.e.,
> TDG.VP.VMCALL<GetQuote>.  This GetQuote PV interface is designed to invoke
> a request to generate a TD-Quote signing by a service hosting TD-Quoting
> Enclave operating in the host environment for a TD Report passed as a
> parameter by the TD.
> And the request will be forwarded to userspace for handling.
>
> So like GHCB, TDX needs to pass a shared buffer to userspace, which is
> specified by GPA and size (4K aligned) and get the error code from
> userspace and forward the error code to guest.
>
> But there are some differences from GHCB interface.
> 1. TDG.VP.VMCALL<GetQuote> is a a doorbell-like interface used to queue a
>     request. I.e., it is an asynchronous request.  The error code represents
>     the status of request queuing, *not* the status of TD Quote generation..
> 2. Besides the error code returned by userspace for GetQuote interface, the
>     GHCI spec defines a "Status Code" field in the header of the shared
> buffer.
>     The "Status Code" field is also updated by VMM during the real
> handling of
>     getting quote (after TDG.VP.VMCALL<GetQuote> returned to guest).
>     After the TDG.VP.VMCALL<GetQuote> returned and back to TD guest, the TD
>     guest can poll the "Status Code" field to check if the processing is
>     in-flight, succeeded or failed.
>     Since the real handling of getting quota is happening in userspace, and
>     it will interact directly with guest, for TDX, it has to expose TDX
>     specific error code to userspace to update the result of quote
> generation.
>
> Currently, TDX is about to add a new TDX specific KVM exit reason, i.e.,
> KVM_EXIT_TDX_GET_QUOTE and its related data structure based on a previous
> discussion. https://lore.kernel.org/kvm/Zg18ul8Q4PGQMWam@google.com/
> For the error code returned by userspace, KVM simply forward the error code
> to guest without further translation or handling.
>
> I am neutral to have a common KVM exit reason to handle both GHCB for
> REQ_CERTS and GHCI for GET_QUOTE.  But for the error code, can we uses
> vendor
> specific error codes if KVM cares about the error code returned by userspace
> in vendor specific complete_userspace_io callback?
>
> BTW, here is the plan of 4 hypercalls needing to exit to userspace for
> TDX basic support series:
> TDG.VP.VMCALL<SetupEventNotifyInterrupt>
> - Add a new KVM exit reason KVM_EXIT_TDX_SETUP_EVENT_NOTIFY
> TDG.VP.VMCALL<GetQuote>
> - Add a new KVM exit reason KVM_EXIT_TDX_GET_QUOTE
> TDG.VP.VMCALL<MapGPA>
> - Reuse KVM_EXIT_HYPERCALL with KVM_HC_MAP_GPA_RANGE
> TDG.VP.VMCALL<ReportFatalError>
> - Reuse KVM_EXIT_SYSTEM_EVENT but add a new type
>    KVM_SYSTEM_EVENT_TDX_FATAL_ERROR
>
>
>


--
-Dionna Glaze, PhD, CISSP, CCSP (she/her)