Message-ID: <rczrxq3lhqguarwh4cwxwa35j5riiagbilcw32oaxd7aqpyaq7@6bqrqn6ontba>
Date: Tue, 21 May 2024 16:49:52 -0500
From: Michael Roth <michael.roth@....com>
To: Binbin Wu <binbin.wu@...ux.intel.com>
CC: Paolo Bonzini <pbonzini@...hat.com>, <kvm@...r.kernel.org>,
<linux-coco@...ts.linux.dev>, <linux-mm@...ck.org>,
<linux-crypto@...r.kernel.org>, <x86@...nel.org>,
<linux-kernel@...r.kernel.org>, <tglx@...utronix.de>, <mingo@...hat.com>,
<jroedel@...e.de>, <thomas.lendacky@....com>, <hpa@...or.com>,
<ardb@...nel.org>, <seanjc@...gle.com>, <vkuznets@...hat.com>,
<jmattson@...gle.com>, <luto@...nel.org>, <dave.hansen@...ux.intel.com>,
<slp@...hat.com>, <pgonda@...gle.com>, <peterz@...radead.org>,
<srinivas.pandruvada@...ux.intel.com>, <rientjes@...gle.com>,
<dovmurik@...ux.ibm.com>, <tobin@....com>, <bp@...en8.de>, <vbabka@...e.cz>,
<kirill@...temov.name>, <ak@...ux.intel.com>, <tony.luck@...el.com>,
<sathyanarayanan.kuppuswamy@...ux.intel.com>, <alpergun@...gle.com>,
<jarkko@...nel.org>, <ashish.kalra@....com>, <nikunj.dadhania@....com>,
<pankaj.gupta@....com>, <liam.merwick@...cle.com>, Brijesh Singh
<brijesh.singh@....com>, "Yamahata, Isaku" <isaku.yamahata@...el.com>
Subject: Re: [PATCH v15 09/20] KVM: SEV: Add support to handle MSR based Page
State Change VMGEXIT
On Tue, May 21, 2024 at 08:49:59AM +0800, Binbin Wu wrote:
>
>
> On 5/17/2024 1:23 AM, Paolo Bonzini wrote:
> > On Thu, May 16, 2024 at 10:29 AM Binbin Wu <binbin.wu@...ux.intel.com> wrote:
> > >
> > >
> > > On 5/1/2024 4:51 PM, Michael Roth wrote:
> > > > SEV-SNP VMs can ask the hypervisor to change the page state in the RMP
> > > > table to be private or shared using the Page State Change MSR protocol
> > > > as defined in the GHCB specification.
> > > >
> > > > When using gmem, private/shared memory is allocated through separate
> > > > pools, and KVM relies on userspace issuing a KVM_SET_MEMORY_ATTRIBUTES
> > > > KVM ioctl to tell the KVM MMU whether or not a particular GFN should be
> > > > backed by private memory or not.
> > > >
> > > > Forward these page state change requests to userspace so that it can
> > > > issue the expected KVM ioctls. The KVM MMU will handle updating the RMP
> > > > entries when it is ready to map a private page into a guest.
> > > >
> > > > Use the existing KVM_HC_MAP_GPA_RANGE hypercall format to deliver these
> > > > requests to userspace via KVM_EXIT_HYPERCALL.
> > > >
> > > > Signed-off-by: Michael Roth <michael.roth@....com>
> > > > Co-developed-by: Brijesh Singh <brijesh.singh@....com>
> > > > Signed-off-by: Brijesh Singh <brijesh.singh@....com>
> > > > Signed-off-by: Ashish Kalra <ashish.kalra@....com>
> > > > ---
> > > > arch/x86/include/asm/sev-common.h | 6 ++++
> > > > arch/x86/kvm/svm/sev.c | 48 +++++++++++++++++++++++++++++++
> > > > 2 files changed, 54 insertions(+)
> > > >
> > > > diff --git a/arch/x86/include/asm/sev-common.h b/arch/x86/include/asm/sev-common.h
> > > > index 1006bfffe07a..6d68db812de1 100644
> > > > --- a/arch/x86/include/asm/sev-common.h
> > > > +++ b/arch/x86/include/asm/sev-common.h
> > > > @@ -101,11 +101,17 @@ enum psc_op {
> > > > /* GHCBData[11:0] */ \
> > > > GHCB_MSR_PSC_REQ)
> > > >
> > > > +#define GHCB_MSR_PSC_REQ_TO_GFN(msr) (((msr) & GENMASK_ULL(51, 12)) >> 12)
> > > > +#define GHCB_MSR_PSC_REQ_TO_OP(msr) (((msr) & GENMASK_ULL(55, 52)) >> 52)
> > > > +
> > > > #define GHCB_MSR_PSC_RESP 0x015
> > > > #define GHCB_MSR_PSC_RESP_VAL(val) \
> > > > /* GHCBData[63:32] */ \
> > > > (((u64)(val) & GENMASK_ULL(63, 32)) >> 32)
> > > >
> > > > +/* Set highest bit as a generic error response */
> > > > +#define GHCB_MSR_PSC_RESP_ERROR (BIT_ULL(63) | GHCB_MSR_PSC_RESP)
> > > > +
> > > > /* GHCB Hypervisor Feature Request/Response */
> > > > #define GHCB_MSR_HV_FT_REQ 0x080
> > > > #define GHCB_MSR_HV_FT_RESP 0x081
> > > > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > > > index e1ac5af4cb74..720775c9d0b8 100644
> > > > --- a/arch/x86/kvm/svm/sev.c
> > > > +++ b/arch/x86/kvm/svm/sev.c
> > > > @@ -3461,6 +3461,48 @@ static void set_ghcb_msr(struct vcpu_svm *svm, u64 value)
> > > > svm->vmcb->control.ghcb_gpa = value;
> > > > }
> > > >
> > > > +static int snp_complete_psc_msr(struct kvm_vcpu *vcpu)
> > > > +{
> > > > + struct vcpu_svm *svm = to_svm(vcpu);
> > > > +
> > > > + if (vcpu->run->hypercall.ret)
> > > Do we have a definition of ret? I didn't find clear documentation about it.
> > > According to the code, 0 means successful. Are there any other error codes
> > > that need to be, or can be, interpreted?
> > They are defined in include/uapi/linux/kvm_para.h
> >
> > #define KVM_ENOSYS 1000
> > #define KVM_EFAULT EFAULT /* 14 */
> > #define KVM_EINVAL EINVAL /* 22 */
> > #define KVM_E2BIG E2BIG /* 7 */
> > #define KVM_EPERM EPERM /* 1*/
> > #define KVM_EOPNOTSUPP 95
> >
> > Linux however does not expect the hypercall to fail for SEV/SEV-ES; and
> > it will terminate the guest if the PSC operation fails for SEV-SNP. So
> > it's best for userspace if the hypercall always succeeds. :)
> Thanks for the info.
>
> For TDX, it wants to restrict the size of the memory range converted in one
> hypercall to avoid too long a latency.
> Previously, in TDX QEMU patchset v5, the limitation was in userspace: if the
> size was too big, the status_code was set to TDG_VP_VMCALL_RETRY and the
> failed GPA was updated for the guest to retry.
> https://lore.kernel.org/all/20240229063726.610065-51-xiaoyao.li@intel.com/
>
> When TDX converts TDVMCALL_MAP_GPA to KVM_HC_MAP_GPA_RANGE, where do you
> think it is more reasonable to enforce the restriction? In KVM (TDX-specific
> code) or in userspace?
> If userspace is preferred, then the interface needs to be extended to
> support it.
With SNP we might get a batch of requests in a single GHCB request, and
potentially each of those requests needs to be sent out to userspace as
its own KVM_HC_MAP_GPA_RANGE. The subsequent patch here handles that in
a loop by issuing a new KVM_HC_MAP_GPA_RANGE via the completion handler.
So we also need to split large requests into multiple userspace
requests in some cases.
It seems like TDX should be able to do something similar by limiting the
size of each KVM_HC_MAP_GPA_RANGE to TDX_MAP_GPA_MAX_LEN, and then
returning TDG_VP_VMCALL_RETRY to the guest if the original size was greater
than TDX_MAP_GPA_MAX_LEN. But at that point you're effectively done with
the entire request and can return to the guest, so it actually seems a little
more straightforward than the SNP case above. E.g. TDX has a 1:1 mapping
between TDG_VP_VMCALL_MAP_GPA and KVM_HC_MAP_GPA_RANGE events. (And even
similar names :))
So it doesn't seem like there's a good reason to expose any of these
throttling details to userspace, in which case the existing
KVM_HC_MAP_GPA_RANGE interface seems like it should be sufficient.
-Mike
>
>
> >
> > > For TDX, it may also want to use the KVM_HC_MAP_GPA_RANGE hypercall to
> > > exit to userspace via KVM_EXIT_HYPERCALL.
> > Yes, definitely.
> >
> > Paolo
> >
>