linux-kernel - Re: [PATCHv2 3/7] KVM: SVM: Add VNMI support in get/set_nmi

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <d5df7e9e18528de56c41c24958901ace1e2d0aca.camel@redhat.com>
Date:   Thu, 21 Jul 2022 15:01:24 +0300
From:   Maxim Levitsky <mlevitsk@...hat.com>
To:     "Shukla, Santosh" <santosh.shukla@....com>,
        Paolo Bonzini <pbonzini@...hat.com>
Cc:     Sean Christopherson <seanjc@...gle.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Jim Mattson <jmattson@...gle.com>,
        Joerg Roedel <joro@...tes.org>,
        Tom Lendacky <thomas.lendacky@....com>, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCHv2 3/7] KVM: SVM: Add VNMI support in get/set_nmi_mask

On Thu, 2022-07-21 at 15:04 +0530, Shukla, Santosh wrote:
> 
> On 7/10/2022 9:45 PM, Maxim Levitsky wrote:
> > On Sat, 2022-07-09 at 19:12 +0530, Santosh Shukla wrote:
> > > VMCB intr_ctrl bit12 (V_NMI_MASK) is set by the processor when handling
> > > NMI in guest and is cleared after the NMI is handled. Treat V_NMI_MASK as
> > > read-only in the hypervisor and do not populate set accessors.
> > > 
> > > Adding API(get_vnmi_vmcb) in order to return the correct vmcb for L1 or
> > > L2.
> > > 
> > > Signed-off-by: Santosh Shukla <santosh.shukla@....com>
> > > ---
> > > v2:
> > > - Added get_vnmi_vmcb API to return vmcb for l1 and l2.
> > > - Use get_vnmi_vmcb to get correct vmcb in func -
> > >   is_vnmi_enabled/_mask_set()
> > > - removed vnmi check from is_vnmi_enabled() func.
> > > 
> > >  arch/x86/kvm/svm/svm.c | 12 ++++++++++--
> > >  arch/x86/kvm/svm/svm.h | 32 ++++++++++++++++++++++++++++++++
> > >  2 files changed, 42 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > index baaf35be36e5..3574e804d757 100644
> > > --- a/arch/x86/kvm/svm/svm.c
> > > +++ b/arch/x86/kvm/svm/svm.c
> > > @@ -198,7 +198,7 @@ module_param(dump_invalid_vmcb, bool, 0644);
> > >  bool intercept_smi = true;
> > >  module_param(intercept_smi, bool, 0444);
> > >  
> > > -static bool vnmi;
> > > +bool vnmi = true;
> > >  module_param(vnmi, bool, 0444);
> > >  
> > >  static bool svm_gp_erratum_intercept = true;
> > > @@ -3503,13 +3503,21 @@ static int svm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
> > >  
> > >  static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
> > >  {
> > > -	return !!(vcpu->arch.hflags & HF_NMI_MASK);
> > > +	struct vcpu_svm *svm = to_svm(vcpu);
> > > +
> > > +	if (is_vnmi_enabled(svm))
> > > +		return is_vnmi_mask_set(svm);
> > > +	else
> > > +		return !!(vcpu->arch.hflags & HF_NMI_MASK);
> > >  }
> > >  
> > >  static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
> > >  {
> > >  	struct vcpu_svm *svm = to_svm(vcpu);
> > >  
> > > +	if (is_vnmi_enabled(svm))
> > > +		return;
> > > +
> > >  	if (masked) {
> > >  		vcpu->arch.hflags |= HF_NMI_MASK;
> > >  		if (!sev_es_guest(vcpu->kvm))
> > > diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> > > index 9223ac100ef5..f36e30df6202 100644
> > > --- a/arch/x86/kvm/svm/svm.h
> > > +++ b/arch/x86/kvm/svm/svm.h
> > > @@ -35,6 +35,7 @@ extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
> > >  extern bool npt_enabled;
> > >  extern int vgif;
> > >  extern bool intercept_smi;
> > > +extern bool vnmi;
> > >  
> > >  /*
> > >   * Clean bits in VMCB.
> > > @@ -509,6 +510,37 @@ static inline bool nested_npt_enabled(struct vcpu_svm *svm)
> > >  	return svm->nested.ctl.nested_ctl & SVM_NESTED_CTL_NP_ENABLE;
> > >  }
> > >  
> > > +static inline struct vmcb *get_vnmi_vmcb(struct vcpu_svm *svm)
> > > +{
> > > +	if (!vnmi)
> > > +		return NULL;
> > > +
> > > +	if (is_guest_mode(&svm->vcpu))
> > > +		return svm->nested.vmcb02.ptr;
> > > +	else
> > > +		return svm->vmcb01.ptr;
> > > +}
> > 
> > This is better but still not enough to support nesting:
> > 
> > 
> > Let me explain the cases that we need to cover:
> > 
> > 
> > 1. non nested case, vmcb01 has all the VNMI settings,
> > and I think it should work, but need to review the patches again.
> > 
> > 
> > 
> > 2. L1 uses vNMI, L2 doesn't use vNMI (nested_vnmi_enabled() == false).
> > 
> >   In this case, vNMI settings just need to be copied from vmcb01 to vmcb02
> >   and vise versa during nested entry and exit.
> > 
> > 
> >   This means that nested_vmcb02_prepare_control in this case should copy
> >   all 3 bits from vmcb01 to vmcb02, and vise versa nested_svm_vmexit
> >   should copy them back.
> > 
> >   Currently I see no indication of this being done in this patch series.
> > 
> 
> Yes, Thanks for pointing out, in v3 series.
> 
> >   vmcb02 should indeed be used to read vnmi bits (like done above).
> > 
> > 
> > 3. L1 uses vNMI, L2 uses vNMI:
> > 
> >   - First of all in this case all 3 vNMI bits should be copied from vmcb12
> >     to vmcb02 on nested entry and back on nested VM exit.
> > 
> >     I *think* this is done correctly in the patch 6, but I need to check again.
> > 
> >  
> >   - Second issue, depends on vNMI spec which we still don't have, and it
> >     relates to the fact on what to do if NMIs are not intercepted by
> >     the (nested) hypervisor, and L0 wants to inject an NMI
> > 
> >     (from L1 point of view it means that a 'real' NMI is about to be
> >     received while L2 is running).
> > 
> > 
> >     - If VNMI is not allowed to be enabled when NMIs are not intercepted,
> >       (vast majority of nested hypervisors will want to intercept real NMIs)
> >       then everything is fine -
> > 
> >       this means that if nested vNMI is enabled, then L1 will have
> >       to intercept 'real' NMIs, and thus L0 would be able to always
> >       inject 'real' NMIs while L2 is running by doing a VM exit to L1 without
> >       touching any vNMI state.
> > 
> Yes. Enabling NMI virtualization requires the NMI intercept bit to be set.

Those are very good news. 

What would happen though if the guest doesn't intercept NMI,
and still tries to enable vNMI? 

Failed VM entry or vNMI ignored?

This matters for nested because nested must work the same as real hardware.

In either of the cases some code is needed to emulate this correctly in the nested
virtualization code in KVM, but the patches have none.

Best regards,
	Maxim Levitsky


> 
> >     - If the vNMI spec states that if vNMI is enabled, real NMIs
> >       are not intercepted and a real NMI is arriving, then the CPU
> >       will use vNMI state to handle it (that is it will set the 'pending'
> >       bit, then check if 'masked' bit is set, and if not, move pending to masked
> >       and deliver NMI to L2, in this case, it is indeed right to use vmcb02
> >       and keep on using VNMI for NMIs that are directed to L1,
> >       but I highly doubt that this is the case.
> > 
> > 
> No.
> 
> >     - Most likely case - vNMI is allowed without NMI intercept,
> >       and real NMI does't consult the vNMI bits, but rather uses 'hosts'
> >       NMI masking. IRET doesn't affect host's NMI' masking as well.
> > 
> > 
> 
> No.
> 
> Thanks,
> Santosh
>  
> >       In this case, when L0 wants to inject NMI to a nested guest
> >       that has vNMI enabled, and doesn't intercept NMIs, it
> >       has to:
> > 
> >       - still consult the vNMI pending/masked bits of *vmcb01*,
> >         to know if it can inject a NMI
> > 
> >       - if it can inject it, it should update *manually* the pending/masked bits
> >         of vmcb01 as well, so that L1's vNMI the state remains consistent.
> > 
> >       - inject the NMI to L2, in the old fashioned way with EVENTINJ,
> > 	or open NMI window by intercepting IRET if NMI is masked.
> > 
> > 
> > Best regards,
> > 	Maxim Levitsky
> > 
> > 
> > 
> > 
> > > +
> > > +static inline bool is_vnmi_enabled(struct vcpu_svm *svm)
> > > +{
> > > +	struct vmcb *vmcb = get_vnmi_vmcb(svm);
> > > +
> > > +	if (vmcb)
> > > +		return !!(vmcb->control.int_ctl & V_NMI_ENABLE);
> > > +	else
> > > +		return false;
> > > +}
> > > +
> > > +static inline bool is_vnmi_mask_set(struct vcpu_svm *svm)
> > > +{
> > > +	struct vmcb *vmcb = get_vnmi_vmcb(svm);
> > > +
> > > +	if (vmcb)
> > > +		return !!(vmcb->control.int_ctl & V_NMI_MASK);
> > > +	else
> > > +		return false;
> > > +}
> > > +
> > >  /* svm.c */
> > >  #define MSR_INVALID				0xffffffffU
> > >