[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALMp9eSn36W=YK5XtaVATJis-J--udGK4ZOESDhYyT0zJ4YZ9A@mail.gmail.com>
Date:   Fri, 12 Jun 2020 17:04:18 -0700
From:   Jim Mattson <jmattson@...gle.com>
To:     Babu Moger <babu.moger@....com>
Cc:     Wanpeng Li <wanpengli@...cent.com>, Joerg Roedel <joro@...tes.org>,
        "the arch/x86 maintainers" <x86@...nel.org>,
        Sean Christopherson <sean.j.christopherson@...el.com>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        "H . Peter Anvin" <hpa@...or.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        LKML <linux-kernel@...r.kernel.org>,
        kvm list <kvm@...r.kernel.org>
Subject: Re: [PATCH 3/3] KVM:SVM: Enable INVPCID feature on AMD
On Fri, Jun 12, 2020 at 2:47 PM Babu Moger <babu.moger@....com> wrote:
>
>
>
> On 6/12/20 3:10 PM, Jim Mattson wrote:
> > On Fri, Jun 12, 2020 at 12:35 PM Babu Moger <babu.moger@....com> wrote:
> >>
> >>
> >>
> >>> -----Original Message-----
> >>> From: Jim Mattson <jmattson@...gle.com>
> >>> Sent: Thursday, June 11, 2020 6:51 PM
> >>> To: Moger, Babu <Babu.Moger@....com>
> >>> Cc: Wanpeng Li <wanpengli@...cent.com>; Joerg Roedel <joro@...tes.org>;
> >>> the arch/x86 maintainers <x86@...nel.org>; Sean Christopherson
> >>> <sean.j.christopherson@...el.com>; Ingo Molnar <mingo@...hat.com>;
> >>> Borislav Petkov <bp@...en8.de>; H . Peter Anvin <hpa@...or.com>; Paolo
> >>> Bonzini <pbonzini@...hat.com>; Vitaly Kuznetsov <vkuznets@...hat.com>;
> >>> Thomas Gleixner <tglx@...utronix.de>; LKML <linux-kernel@...r.kernel.org>;
> >>> kvm list <kvm@...r.kernel.org>
> >>> Subject: Re: [PATCH 3/3] KVM:SVM: Enable INVPCID feature on AMD
> >>>
> >>> On Thu, Jun 11, 2020 at 2:48 PM Babu Moger <babu.moger@....com> wrote:
> >>>>
> >>>> The following intercept is added for INVPCID instruction:
> >>>> Code    Name            Cause
> >>>> A2h     VMEXIT_INVPCID  INVPCID instruction
> >>>>
> >>>> The following bit is added to the VMCB layout control area
> >>>> to control intercept of INVPCID:
> >>>> Byte Offset     Bit(s)    Function
> >>>> 14h             2         intercept INVPCID
> >>>>
> >>>> For the guests with nested page table (NPT) support, the INVPCID
> >>>> feature works as running it natively. KVM does not need to do any
> >>>> special handling in this case.
> >>>>
> >>>> Interceptions are required in the following cases.
> >>>> 1. If the guest tries to disable the feature when the underlying
> >>>> hardware supports it. In this case hypervisor needs to report #UD.
> >>>
> >>> Per the AMD documentation, attempts to use INVPCID at CPL>0 will
> >>> result in a #GP, regardless of the intercept bit. If the guest CPUID
> >>> doesn't enumerate the feature, shouldn't the instruction raise #UD
> >>> regardless of CPL? This seems to imply that we should intercept #GP
> >>> and decode the instruction to see if we should synthesize #UD instead.
> >>
> >> Purpose here is to report UD when the guest CPUID doesn't enumerate the
> >> INVPCID feature When Bare-metal supports it. It seems to work fine for
> >> that purpose. You are right. The #GP for CPL>0 takes precedence over
> >> interception. No. I am not planning to intercept GP.
> >
> > WIthout intercepting #GP, you fail to achieve your stated purpose.
>
> I think I have misunderstood this part. I was not inteding to change the
> #GP behaviour. I will remove this part. My intension of these series is to
> handle invpcid in shadow page mode. I have verified that part. Hope I did
> not miss anything else.
You don't really have to intercept INVPCID when tdp is in use, right?
There are certainly plenty of operations for which kvm does not
properly raise #UD when they aren't enumerated in the guest CPUID.
> >> I will change the text. How about this?
> >>
> >> Interceptions are required in the following cases.
> >> 1. If the guest CPUID doesn't enumerate the INVPCID feature when the
> >> underlying hardware supports it,  hypervisor needs to report UD. However,
> >> #GP for CPL>0 takes precedence over interception.
> >
> > This text is not internally consistent. In one sentence, you say that
> > "hypervisor needs to report #UD." In the next sentence, you are
> > essentially saying that the hypervisor doesn't need to report #UD.
> > Which is it?
> >
> >>>> 2. When the guest is running with shadow page table enabled, in
> >>>> this case the hypervisor needs to handle the tlbflush based on the
> >>>> type of invpcid instruction type.
> >>>>
> >>>> AMD documentation for INVPCID feature is available at "AMD64
> >>>> Architecture Programmer’s Manual Volume 2: System Programming,
> >>>> Pub. 24593 Rev. 3.34(or later)"
> >>>>
> >>>> The documentation can be obtained at the links below:
> >>>> Link:
> >>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.a
> >>> md.com%2Fsystem%2Ffiles%2FTechDocs%2F24593.pdf&data=02%7C01%7
> >>> Cbabu.moger%40amd.com%7C36861b25f6d143e3b38e08d80e624472%7C3dd8
> >>> 961fe4884e608e11a82d994e183d%7C0%7C0%7C637275163374103811&s
> >>> data=E%2Fdb6T%2BdO4nrtUoqhKidF6XyorsWrphj6O4WwNZpmYA%3D&res
> >>> erved=0
> >>>> Link:
> >>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.
> >>> kernel.org%2Fshow_bug.cgi%3Fid%3D206537&data=02%7C01%7Cbabu.m
> >>> oger%40amd.com%7C36861b25f6d143e3b38e08d80e624472%7C3dd8961fe488
> >>> 4e608e11a82d994e183d%7C0%7C0%7C637275163374103811&sdata=b81
> >>> 9W%2FhKS93%2BAp3QvcsR0BwTQpUVUFMbIaNaisgWHRY%3D&reserved=
> >>> 0
> >>>>
> >>>> Signed-off-by: Babu Moger <babu.moger@....com>
> >>>> ---
> >>>>  arch/x86/include/asm/svm.h      |    4 ++++
> >>>>  arch/x86/include/uapi/asm/svm.h |    2 ++
> >>>>  arch/x86/kvm/svm/svm.c          |   42
> >>> +++++++++++++++++++++++++++++++++++++++
> >>>>  3 files changed, 48 insertions(+)
> >>>>
> >>>> diff --git a/arch/x86/include/asm/svm.h b/arch/x86/include/asm/svm.h
> >>>> index 62649fba8908..6488094f67fa 100644
> >>>> --- a/arch/x86/include/asm/svm.h
> >>>> +++ b/arch/x86/include/asm/svm.h
> >>>> @@ -55,6 +55,10 @@ enum {
> >>>>         INTERCEPT_RDPRU,
> >>>>  };
> >>>>
> >>>> +/* Extended Intercept bits */
> >>>> +enum {
> >>>> +       INTERCEPT_INVPCID = 2,
> >>>> +};
> >>>>
> >>>>  struct __attribute__ ((__packed__)) vmcb_control_area {
> >>>>         u32 intercept_cr;
> >>>> diff --git a/arch/x86/include/uapi/asm/svm.h
> >>> b/arch/x86/include/uapi/asm/svm.h
> >>>> index 2e8a30f06c74..522d42dfc28c 100644
> >>>> --- a/arch/x86/include/uapi/asm/svm.h
> >>>> +++ b/arch/x86/include/uapi/asm/svm.h
> >>>> @@ -76,6 +76,7 @@
> >>>>  #define SVM_EXIT_MWAIT_COND    0x08c
> >>>>  #define SVM_EXIT_XSETBV        0x08d
> >>>>  #define SVM_EXIT_RDPRU         0x08e
> >>>> +#define SVM_EXIT_INVPCID       0x0a2
> >>>>  #define SVM_EXIT_NPF           0x400
> >>>>  #define SVM_EXIT_AVIC_INCOMPLETE_IPI           0x401
> >>>>  #define SVM_EXIT_AVIC_UNACCELERATED_ACCESS     0x402
> >>>> @@ -171,6 +172,7 @@
> >>>>         { SVM_EXIT_MONITOR,     "monitor" }, \
> >>>>         { SVM_EXIT_MWAIT,       "mwait" }, \
> >>>>         { SVM_EXIT_XSETBV,      "xsetbv" }, \
> >>>> +       { SVM_EXIT_INVPCID,     "invpcid" }, \
> >>>>         { SVM_EXIT_NPF,         "npf" }, \
> >>>>         { SVM_EXIT_AVIC_INCOMPLETE_IPI,         "avic_incomplete_ipi" }, \
> >>>>         { SVM_EXIT_AVIC_UNACCELERATED_ACCESS,
> >>> "avic_unaccelerated_access" }, \
> >>>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> >>>> index 285e5e1ff518..82d974338f68 100644
> >>>> --- a/arch/x86/kvm/svm/svm.c
> >>>> +++ b/arch/x86/kvm/svm/svm.c
> >>>> @@ -813,6 +813,11 @@ static __init void svm_set_cpu_caps(void)
> >>>>         if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) ||
> >>>>             boot_cpu_has(X86_FEATURE_AMD_SSBD))
> >>>>                 kvm_cpu_cap_set(X86_FEATURE_VIRT_SSBD);
> >>>> +
> >>>> +       /* Enable INVPCID if both PCID and INVPCID enabled */
> >>>> +       if (boot_cpu_has(X86_FEATURE_PCID) &&
> >>>> +           boot_cpu_has(X86_FEATURE_INVPCID))
> >>>> +               kvm_cpu_cap_set(X86_FEATURE_INVPCID);
> >>>>  }
> >>>>
> >>>>  static __init int svm_hardware_setup(void)
> >>>> @@ -1099,6 +1104,17 @@ static void init_vmcb(struct vcpu_svm *svm)
> >>>>                 clr_intercept(svm, INTERCEPT_PAUSE);
> >>>>         }
> >>>>
> >>>> +       /*
> >>>> +        * Intercept INVPCID instruction only if shadow page table is
> >>>> +        * enabled. Interception is not required with nested page table.
> >>>> +        */
> >>>> +       if (boot_cpu_has(X86_FEATURE_INVPCID)) {
> >>>> +               if (!npt_enabled)
> >>>> +                       set_extended_intercept(svm, INTERCEPT_INVPCID);
> >>>> +               else
> >>>> +                       clr_extended_intercept(svm, INTERCEPT_INVPCID);
> >>>> +       }
> >>>> +
> >>>>         if (kvm_vcpu_apicv_active(&svm->vcpu))
> >>>>                 avic_init_vmcb(svm);
> >>>>
> >>>> @@ -2715,6 +2731,23 @@ static int mwait_interception(struct vcpu_svm
> >>> *svm)
> >>>>         return nop_interception(svm);
> >>>>  }
> >>>>
> >>>> +static int invpcid_interception(struct vcpu_svm *svm)
> >>>> +{
> >>>> +       struct kvm_vcpu *vcpu = &svm->vcpu;
> >>>> +       unsigned long type;
> >>>> +       gva_t gva;
> >>>> +
> >>>> +       /*
> >>>> +        * For an INVPCID intercept:
> >>>> +        * EXITINFO1 provides the linear address of the memory operand.
> >>>> +        * EXITINFO2 provides the contents of the register operand.
> >>>> +        */
> >>>> +       type = svm->vmcb->control.exit_info_2;
> >>>> +       gva = svm->vmcb->control.exit_info_1;
> >>>> +
> >>>> +       return kvm_handle_invpcid_types(vcpu,  gva, type);
> >>>> +}
> >>>> +
> >>>>  static int (*const svm_exit_handlers[])(struct vcpu_svm *svm) = {
> >>>>         [SVM_EXIT_READ_CR0]                     = cr_interception,
> >>>>         [SVM_EXIT_READ_CR3]                     = cr_interception,
> >>>> @@ -2777,6 +2810,7 @@ static int (*const svm_exit_handlers[])(struct
> >>> vcpu_svm *svm) = {
> >>>>         [SVM_EXIT_MWAIT]                        = mwait_interception,
> >>>>         [SVM_EXIT_XSETBV]                       = xsetbv_interception,
> >>>>         [SVM_EXIT_RDPRU]                        = rdpru_interception,
> >>>> +       [SVM_EXIT_INVPCID]                      = invpcid_interception,
> >>>>         [SVM_EXIT_NPF]                          = npf_interception,
> >>>>         [SVM_EXIT_RSM]                          = rsm_interception,
> >>>>         [SVM_EXIT_AVIC_INCOMPLETE_IPI]          =
> >>> avic_incomplete_ipi_interception,
> >>>> @@ -3562,6 +3596,14 @@ static void svm_cpuid_update(struct kvm_vcpu
> >>> *vcpu)
> >>>>         svm->nrips_enabled = kvm_cpu_cap_has(X86_FEATURE_NRIPS) &&
> >>>>                              guest_cpuid_has(&svm->vcpu, X86_FEATURE_NRIPS);
> >>>>
> >>>> +       /*
> >>>> +        * Intercept INVPCID instruction if the baremetal has the support
> >>>> +        * but the guest doesn't claim the feature.
> >>>> +        */
> >>>> +       if (boot_cpu_has(X86_FEATURE_INVPCID) &&
> >>>> +           !guest_cpuid_has(vcpu, X86_FEATURE_INVPCID))
> >>>> +               set_extended_intercept(svm, INTERCEPT_INVPCID);
> >>>> +
> >>>
> >>> What if INVPCID is enabled in the guest CPUID later? Shouldn't we then
> >>> clear this intercept bit?
> >>
> >> I assume the feature enable comes in the same code path as this. I can add
> >> "if else" check here if that is what you are suggesting.
> >
> > Yes, that's what I'm suggesting.
> >
> >>>
> >>>>         if (!kvm_vcpu_apicv_active(vcpu))
> >>>>                 return;
> >>>>
> >>>>
Powered by blists - more mailing lists
 
