lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f8dfeab2-e5f2-4df6-9406-0aff36afc08a@linux.intel.com>
Date: Fri, 26 Jul 2024 15:15:01 +0800
From: Binbin Wu <binbin.wu@...ux.intel.com>
To: Michael Roth <michael.roth@....com>,
 Sean Christopherson <seanjc@...gle.com>
Cc: kvm@...r.kernel.org, linux-coco@...ts.linux.dev,
 linux-kernel@...r.kernel.org, x86@...nel.org, pbonzini@...hat.com,
 jroedel@...e.de, thomas.lendacky@....com, pgonda@...gle.com,
 ashish.kalra@....com, bp@...en8.de, pankaj.gupta@....com,
 liam.merwick@...cle.com, Rick Edgecombe <rick.p.edgecombe@...el.com>,
 Reinette Chatre <reinette.chatre@...el.com>,
 "Yamahata, Isaku" <isaku.yamahata@...el.com>,
 "Peng, Chao P" <chao.p.peng@...el.com>
Subject: Re: [PATCH v1 4/5] KVM: Introduce KVM_EXIT_COCO exit type



On 6/29/2024 8:36 AM, Michael Roth wrote:
> On Fri, Jun 28, 2024 at 01:08:19PM -0700, Sean Christopherson wrote:
>> On Wed, Jun 26, 2024, Michael Roth wrote:
>>> On Wed, Jun 26, 2024 at 07:22:43AM -0700, Sean Christopherson wrote:
>>>> On Fri, Jun 21, 2024, Michael Roth wrote:
>>>>> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
>>>>> index ecfa25b505e7..2eea9828d9aa 100644
>>>>> --- a/Documentation/virt/kvm/api.rst
>>>>> +++ b/Documentation/virt/kvm/api.rst
>>>>> @@ -7122,6 +7122,97 @@ Please note that the kernel is allowed to use the kvm_run structure as the
>>>>>   primary storage for certain register types. Therefore, the kernel may use the
>>>>>   values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
>>>>>   
>>>>> +::
>>>>> +
>>>>> +		/* KVM_EXIT_COCO */
>>>>> +		struct kvm_exit_coco {
>>>>> +		#define KVM_EXIT_COCO_REQ_CERTS			0
>>>>> +		#define KVM_EXIT_COCO_MAX			1
>>>>> +			__u8 nr;
>>>>> +			__u8 pad0[7];
>>>>> +			union {
>>>>> +				struct {
>>>>> +					__u64 gfn;
>>>>> +					__u32 npages;
>>>>> +		#define KVM_EXIT_COCO_REQ_CERTS_ERR_INVALID_LEN		1
>>>>> +		#define KVM_EXIT_COCO_REQ_CERTS_ERR_GENERIC		(1 << 31)
>>>> Unless I'm mistaken, these error codes are defined by the GHCB, which means the
>>>> values matter, i.e. aren't arbitrary KVM-defined values.
>>> They do happen to coincide with the GHCB-defined values:
>>>
>>>    /*
>>>     * The GHCB spec only formally defines INVALID_LEN/BUSY VMM errors, but define
>>>     * a GENERIC error code such that it won't ever conflict with GHCB-defined
>>>     * errors if any get added in the future.
>>>     */
>>>    #define SNP_GUEST_VMM_ERR_INVALID_LEN   1
>>>    #define SNP_GUEST_VMM_ERR_BUSY          2
>>>    #define SNP_GUEST_VMM_ERR_GENERIC       BIT(31)
>>>
>>> and not totally by accident. But the KVM_EXIT_COCO_REQ_CERTS_ERR_* are
>>> defined/documented without any reliance on the GHCB spec and are purely
>>> KVM-defined. I just didn't really see any reason to pick different
>>> numerical values since it seems like purposely obfuscating things for
>> For SNP.  For other vendors, the numbers look bizarre, e.g. why bit 31?  And the
>> fact that it appears to be a mask is even more odd.
> That's fair. Values 1 and 2 made sense so just re-use, but that results
> in a awkward value for _GENERIC that's not really necessary for the KVM
> side.
>
>>> no real reason. But the code itself doesn't rely on them being the same
>>> as the spec defines, so we are free to define these however we'd like as
>>> far as the KVM API goes.
>>>> I forget exactly what we discussed in PUCK, but for the error codes, I think KVM
>>>> should either define it's own values that are completely disconnected from any
>>>> "harware" spec, or KVM should very explicitly #define all hardware values and have
>>> I'd gotten the impression that option 1) is what we were sort of leaning
>>> toward, and that's the approach taken here.
>>> And if we expose things selectively to keep the ABI small, it's a bit
>>> awkward too. For instance, KVM_EXIT_COCO_REQ_CERTS_ERR_* basically needs
>>> a way to indicate success/fail/ENOMEM. Which we have with
>>> (assuming 0==success):
>>>
>>>    #define KVM_EXIT_COCO_REQ_CERTS_ERR_INVALID_LEN         1
>>>    #define KVM_EXIT_COCO_REQ_CERTS_ERR_GENERIC             (1 << 31)
>>>
>>> But the GHCB also defines other values like:
>>>
>>>    #define SNP_GUEST_VMM_ERR_BUSY          2
>>>
>>> which don't make much sense to handle on the userspace side and doesn't
>> Why not?  If userspace is waiting on a cert update for whatever reason, why can't
>> it signal "busy" to the guest?
> My thinking was that userspace is free to take it's time and doesn't need
> to report delays back to KVM. But it would reduce the potential for
> soft-lockups in the guest, so it might make sense to work that into the
> API.
>
> But more to original point, there could be something added in the future
> that really has nothing to do with anything involving KVM<->userspace
> interaction and so would make no sense to expose to userspace.
> Unfortunately I picked a bad example. :)
>
>>> really have anything to do with the KVM_EXIT_COCO_REQ_CERTS KVM event,
>>> which is a separate/self-contained thing from the general guest request
>>> protocol. So would we expose that as ABI or not? If not then we end up
>>> with this weird splitting of code. And if yes, then we have to sort of
>>> give userspace a way to discover whenever new error codes are added to
>>> the GHCB spec, because KVM needs to understand these value too and
>> Not necessarily.  So long as KVM doesn't need to manipulate guest state, e.g. to
>> set RBX (or whatever reg it is) for ERR_INVALID_LEN, then KVM doesn't need to
>> care/know about the error codes.  E.g. userspace could signal VMM_BUSY and KVM
>> would happily pass that to the guest.
> But given we already have an exception to that where KVM does need to
> intervene for certain errors codes like ERR_INVALID_LEN that require
> modifying guest state, it doesn't seem like a good starting position
> to have to hope that it doesn't happen again.
>
> It just doesn't seem necessary to put ourselves in a situation where
> we'd need to be concerned by that at all. If the KVM API is a separate
> and fairly self-contained thing then these decisions are set in stone
> until we want to change it and not dictated/modified by changes to
> anything external without our explicit consideration.
>
> I know the certs things is GHCB-specific atm, but when the certs used
> to live inside the kernel the KVM_EXIT_* wasn't needed at all, so
> that's why I see this as more of a KVM interface thing rather than
> a GHCB one. And maybe eventually some other CoCo implementation also
> needs some interface for fetching certificates/blobs from userspace
> and is able to re-use it still because it's not too SNP-specific
> and the behavior isn't dictated by the GHCB spec (e.g.
> ERR_INVALID_LEN might result in some other state needing to be
> modified in their case rather than what the GHCB dictates.)

TDX GHCI does have a similar PV interface for TDX guest to get quota, i.e.,
TDG.VP.VMCALL<GetQuote>.  This GetQuote PV interface is designed to invoke
a request to generate a TD-Quote signing by a service hosting TD-Quoting
Enclave operating in the host environment for a TD Report passed as a
parameter by the TD.
And the request will be forwarded to userspace for handling.

So like GHCB, TDX needs to pass a shared buffer to userspace, which is
specified by GPA and size (4K aligned) and get the error code from
userspace and forward the error code to guest.

But there are some differences from GHCB interface.
1. TDG.VP.VMCALL<GetQuote> is a a doorbell-like interface used to queue a
    request. I.e., it is an asynchronous request.  The error code represents
    the status of request queuing, *not* the status of TD Quote generation..
2. Besides the error code returned by userspace for GetQuote interface, the
    GHCI spec defines a "Status Code" field in the header of the shared 
buffer.
    The "Status Code" field is also updated by VMM during the real 
handling of
    getting quote (after TDG.VP.VMCALL<GetQuote> returned to guest).
    After the TDG.VP.VMCALL<GetQuote> returned and back to TD guest, the TD
    guest can poll the "Status Code" field to check if the processing is
    in-flight, succeeded or failed.
    Since the real handling of getting quota is happening in userspace, and
    it will interact directly with guest, for TDX, it has to expose TDX
    specific error code to userspace to update the result of quote 
generation.

Currently, TDX is about to add a new TDX specific KVM exit reason, i.e.,
KVM_EXIT_TDX_GET_QUOTE and its related data structure based on a previous
discussion. https://lore.kernel.org/kvm/Zg18ul8Q4PGQMWam@google.com/
For the error code returned by userspace, KVM simply forward the error code
to guest without further translation or handling.

I am neutral to have a common KVM exit reason to handle both GHCB for
REQ_CERTS and GHCI for GET_QUOTE.  But for the error code, can we uses 
vendor
specific error codes if KVM cares about the error code returned by userspace
in vendor specific complete_userspace_io callback?

BTW, here is the plan of 4 hypercalls needing to exit to userspace for
TDX basic support series:
TDG.VP.VMCALL<SetupEventNotifyInterrupt>
- Add a new KVM exit reason KVM_EXIT_TDX_SETUP_EVENT_NOTIFY
TDG.VP.VMCALL<GetQuote>
- Add a new KVM exit reason KVM_EXIT_TDX_GET_QUOTE
TDG.VP.VMCALL<MapGPA>
- Reuse KVM_EXIT_HYPERCALL with KVM_HC_MAP_GPA_RANGE
TDG.VP.VMCALL<ReportFatalError>
- Reuse KVM_EXIT_SYSTEM_EVENT but add a new type
   KVM_SYSTEM_EVENT_TDX_FATAL_ERROR



Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ