linux-kernel - Re: [PATCH v2 09/24] KVM: nVMX: Unconditionally clear mtf

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <261f60dfe9813e612bfb30b6f271a68b9e877408.camel@redhat.com>
Date:   Mon, 18 Jul 2022 16:04:19 +0300
From:   Maxim Levitsky <mlevitsk@...hat.com>
To:     Sean Christopherson <seanjc@...gle.com>,
        Paolo Bonzini <pbonzini@...hat.com>
Cc:     kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
        Jim Mattson <jmattson@...gle.com>,
        Oliver Upton <oupton@...gle.com>,
        Peter Shier <pshier@...gle.com>
Subject: Re: [PATCH v2 09/24] KVM: nVMX: Unconditionally clear mtf_pending
 on nested VM-Exit

On Fri, 2022-07-15 at 20:42 +0000, Sean Christopherson wrote:
> Clear mtf_pending on nested VM-Exit instead of handling the clear on a
> case-by-case basis in vmx_check_nested_events().  The pending MTF should
> never survive nested VM-Exit, as it is a property of KVM's run of the
> current L2, i.e. should never affect the next L2 run by L1.  In practice,
> this is likely a nop as getting to L1 with nested_run_pending is
> impossible, and KVM doesn't correctly handle morphing a pending exception
> that occurs on a prior injected exception (need for re-injected exception
> being the other case where MTF isn't cleared).  However, KVM will
> hopefully soon correctly deal with a pending exception on top of an
> injected exception.
> 
> Add a TODO to document that KVM has an inversion priority bug between
> SMIs and MTF (and trap-like #DBS), and that KVM also doesn't properly
> save/restore MTF across SMI/RSM.
> 
> Signed-off-by: Sean Christopherson <seanjc@...gle.com>
> Reviewed-by: Maxim Levitsky <mlevitsk@...hat.com>
> ---
>  arch/x86/kvm/vmx/nested.c | 21 ++++++++++++---------
>  1 file changed, 12 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 104f233ddd5d..a85f31cee149 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -3898,16 +3898,8 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
>         unsigned long exit_qual;
>         bool block_nested_events =
>             vmx->nested.nested_run_pending || kvm_event_needs_reinjection(vcpu);
> -       bool mtf_pending = vmx->nested.mtf_pending;
>         struct kvm_lapic *apic = vcpu->arch.apic;
>  
> -       /*
> -        * Clear the MTF state. If a higher priority VM-exit is delivered first,
> -        * this state is discarded.
> -        */
> -       if (!block_nested_events)
> -               vmx->nested.mtf_pending = false;
> -
>         if (lapic_in_kernel(vcpu) &&
>                 test_bit(KVM_APIC_INIT, &apic->pending_events)) {
>                 if (block_nested_events)
> @@ -3916,6 +3908,9 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
>                 clear_bit(KVM_APIC_INIT, &apic->pending_events);
>                 if (vcpu->arch.mp_state != KVM_MP_STATE_INIT_RECEIVED)
>                         nested_vmx_vmexit(vcpu, EXIT_REASON_INIT_SIGNAL, 0, 0);
> +
> +               /* MTF is discarded if the vCPU is in WFS. */
> +               vmx->nested.mtf_pending = false;
>                 return 0;
>         }
>  
> @@ -3938,6 +3933,11 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
>          * fault-like exceptions, TSS T flag #DB (not emulated by KVM, but
>          * could theoretically come in from userspace), and ICEBP (INT1).
>          *
> +        * TODO: SMIs have higher priority than MTF and trap-like #DBs (except
> +        * for TSS T flag #DBs).  KVM also doesn't save/restore pending MTF
> +        * across SMI/RSM as it should; that needs to be addressed in order to
> +        * prioritize SMI over MTF and trap-like #DBs.

Thanks, makes sense.

Best regards,
	Maxim Levitsky
> +        *
>          * Note that only a pending nested run can block a pending exception.
>          * Otherwise an injected NMI/interrupt should either be
>          * lost or delivered to the nested hypervisor in the IDT_VECTORING_INFO,
> @@ -3953,7 +3953,7 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
>                 return 0;
>         }
>  
> -       if (mtf_pending) {
> +       if (vmx->nested.mtf_pending) {
>                 if (block_nested_events)
>                         return -EBUSY;
>                 nested_vmx_update_pending_dbg(vcpu);
> @@ -4549,6 +4549,9 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
>         struct vcpu_vmx *vmx = to_vmx(vcpu);
>         struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
>  
> +       /* Pending MTF traps are discarded on VM-Exit. */
> +       vmx->nested.mtf_pending = false;
> +
>         /* trying to cancel vmlaunch/vmresume is a bug */
>         WARN_ON_ONCE(vmx->nested.nested_run_pending);
>