Message-ID: <ZT_HeK7GXdY-6L3t@google.com>
Date: Mon, 30 Oct 2023 15:10:48 +0000
From: Sean Christopherson <seanjc@...gle.com>
To: Prasad Pandit <ppandit@...hat.com>
Cc: kvm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: About patch bdedff263132 - KVM: x86: Route pending NMIs
+KVM and LKML
https://people.kernel.org/tglx/notes-about-netiquette
On Mon, Oct 30, 2023, Prasad Pandit wrote:
> Hello Sean,
>
> Please see:
> -> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bdedff263132c862924f5cad96f0e82eeeb4e2e6
>
> * While testing a real-time host/guest setup, the above patch is
> causing a strange regression wherein guest boot is delayed
> indefinitely. Sometimes the guest boots within a minute, sometimes it
> takes much longer. Maybe the guest VM is waiting for an NMI event.
>
> * Reverting the above patch fixes this issue. I'm wondering whether a
> fix patch like the one below would be acceptable, or whether reverting
> the above patch is more reasonable?
No, a revert would break AMD's vNMI.
> ===
> # cat ~test/rpmbuild/SOURCES/linux-kernel-test.patch
> +++ linux-5.14.0-372.el9/arch/x86/kvm/x86.c 2023-10-30
> 09:05:05.172815973 -0400
> @@ -5277,7 +5277,8 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_e
> if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
> vcpu->arch.nmi_pending = 0;
> atomic_set(&vcpu->arch.nmi_queued, events->nmi.pending);
> - kvm_make_request(KVM_REQ_NMI, vcpu);
> + if (events->nmi.pending)
> + kvm_make_request(KVM_REQ_NMI, vcpu);
This looks sane, but it should be unnecessary, as KVM_REQ_NMI with nmi_queued=0
should be a (costly) nop. Hrm, unless the vCPU is in HLT, in which case KVM
will treat a spurious KVM_REQ_NMI as a wake event. When I made this change, my
assumption
was that userspace would set KVM_VCPUEVENT_VALID_NMI_PENDING iff there was
relevant information to process. But if I'm reading the code correctly, QEMU
invokes KVM_SET_VCPU_EVENTS with KVM_VCPUEVENT_VALID_NMI_PENDING at the end of
machine creation.
Hmm, but even that should be benign unless userspace is stuffing other guest
state. E.g. KVM will spuriously exit to userspace with -EAGAIN while the vCPU
is in KVM_MP_STATE_UNINITIALIZED, and I don't see a way for the vCPU to be put
into a blocking state after transitioning out of UNINITIALIZED via INIT+SIPI without
processing KVM_REQ_NMI.
> }
> static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
> ===
>
> * Could you please have a look and suggest what could be a better fix?
Please provide more information on what is breaking and/or how to reproduce the
issue. E.g. at the very least, a trace of KVM_{G,S}ET_VCPU_EVENTS. There's not
even enough info here to write a changelog.