Message-ID: <ZGebCSwAA4W10atN@google.com>
Date: Fri, 19 May 2023 08:51:37 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: "Maciej S. Szmigiero" <mail@...iej.szmigiero.name>
Cc: Paolo Bonzini <pbonzini@...hat.com>,
Maxim Levitsky <mlevitsk@...hat.com>,
Santosh Shukla <santosh.shukla@....com>, vkuznets@...hat.com,
jmattson@...gle.com, thomas.lendacky@....com, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] KVM: SVM: vNMI pending bit is V_NMI_PENDING_MASK not V_NMI_BLOCKING_MASK
On Fri, May 19, 2023, Maciej S. Szmigiero wrote:
> From: "Maciej S. Szmigiero" <maciej.szmigiero@...cle.com>
>
> While testing Hyper-V enabled Windows Server 2019 guests on Zen4 hardware
> I noticed that with vCPU count large enough (> 16) they sometimes froze at
> boot.
> With vCPU count of 64 they never booted successfully - suggesting some kind
> of a race condition.
>
> Since adding "vnmi=0" module parameter made these guests boot successfully
> it was clear that the problem is most likely (v)NMI-related.
>
> Running kvm-unit-tests quickly showed failing NMI-related test cases, like
> "multiple nmi" and "pending nmi" from apic-split, x2apic and xapic tests
> and the NMI parts of eventinj test.
>
> The issue was that once one NMI was being serviced no other NMI was allowed
> to be set pending (NMI limit = 0), which was traced to
> svm_is_vnmi_pending() wrongly testing for the "NMI blocked" flag rather
> than for the "NMI pending" flag.
>
> Fix this by testing for the right flag in svm_is_vnmi_pending().
> Once this is done, the NMI-related kvm-unit-tests pass successfully and
> the Windows guest no longer freezes at boot.
>
> Fixes: fa4c027a7956 ("KVM: x86: Add support for SVM's Virtual NMI")
> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@...cle.com>
Reviewed-by: Sean Christopherson <seanjc@...gle.com>
> ---
>
> It's a bit sad that no-one apparently tested the vNMI patchset with
> kvm-unit-tests on actual vNMI-enabled hardware...
That's one way to put it.
Santosh, what happened? This goof was present in both v3 and v4, i.e. it wasn't
something that we botched when applying/massaging at the last minute. And the
cover letters for both v3 and v4 state "Series ... tested on AMD EPYC-Genoa".
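
For anyone following along, the bug described in the changelog boils down to
testing the wrong bit. Below is a minimal standalone sketch of that, not the
actual KVM code: the helper names vnmi_pending_buggy() and vnmi_pending_fixed()
are hypothetical, the VMCB int_ctl field is modeled as a bare uint32_t, and the
bit positions are assumed for illustration and should be checked against the
kernel's SVM header.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed bit positions within int_ctl, for illustration only; verify
 * against the real definitions before relying on them.
 */
#define V_NMI_PENDING_SHIFT	11
#define V_NMI_PENDING_MASK	(1u << V_NMI_PENDING_SHIFT)
#define V_NMI_BLOCKING_SHIFT	12
#define V_NMI_BLOCKING_MASK	(1u << V_NMI_BLOCKING_SHIFT)

/*
 * Hypothetical model of the buggy check: reports "pending" whenever NMIs
 * are blocked, i.e. whenever an NMI is currently being serviced.
 */
static bool vnmi_pending_buggy(uint32_t int_ctl)
{
	return !!(int_ctl & V_NMI_BLOCKING_MASK);
}

/*
 * Hypothetical model of the fixed check: reports "pending" only when the
 * pending bit itself is set.
 */
static bool vnmi_pending_fixed(uint32_t int_ctl)
{
	return !!(int_ctl & V_NMI_PENDING_MASK);
}
```

With only the blocking bit set (an NMI in service, nothing queued), the buggy
helper claims a vNMI is already pending while the fixed one does not, which
matches the symptom above of no further NMI being allowed to become pending
while one is being serviced.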