[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <ei6cdmnvhzyavfobamjkcq2ghdrxcv7ruxhcbzzycqlvaty7zr@5cjkfczxiqom>
Date: Fri, 14 Nov 2025 16:52:10 +0000
From: Yosry Ahmed <yosry.ahmed@...ux.dev>
To: Sean Christopherson <seanjc@...gle.com>,
Paolo Bonzini <pbonzini@...hat.com>
Cc: kvm@...r.kernel.org, linux-kernel@...r.kernel.org,
stable@...r.kernel.org
Subject: Re: [PATCH] KVM: SVM: Fix redundant updates of LBR MSR intercepts
On Fri, Nov 14, 2025 at 08:34:54AM -0800, Sean Christopherson wrote:
> On Wed, Nov 12, 2025, Yosry Ahmed wrote:
> > svm_update_lbrv() always updates LBR MSRs intercepts, even when they are
> > already set correctly. This results in force_msr_bitmap_recalc always
> > being set to true on every nested transition,
>
> Nit, it's only on VMRUN, not on every transition (i.e. not on nested #VMEXIT).
How so? svm_update_lbrv() will also be called in nested_svm_vmexit(),
and it will eventually lead to force_msr_bitmap_recalc being set to
true.
I guess what you meant is the "undoing the Hyper-V optimization" part.
That is indeed only affected by the svm_update_lbrv() call in the nested
VMRUN path.
So we do set force_msr_bitmap_recalc on nested #VMEXIT, but it doesn't
really matter because we set it again on nested VMRUN before
nested_svm_merge_msrpm() is called.
>
> > essentially undoing the hyperv optimization in nested_svm_merge_msrpm().
>
> When something fixes a KVM test failures (selftests or KUT), please call that
> out in the changelog. That way when other people encounter the failure, they'll
> get search hits and won't have to waste their time bisecting and/or debugging.
>
> If you hadn't mentioned off-list that this was detected by hyperv_svm_test, I
> wouldn't have had the first clue as to why that test started failing. Even with
> the hint, it still took me a few minutes to connect the dots.
Noted, makes sense. I thought the fix and the original patch are in such
quick succession that hopefully no one will run into it.
>
> In general, be more explicit/detailed, e.g. "undoing the hyperv optimization" is
> unnecessarily vague, as the reader has to go look at the code to understand what
> you're talking about. My philosophy with changelogs is that they are write-once,
> read-many, and so if you can save any time/effort for readers, it's almost always
> worth the extra time/effort on the "write" side.
>
> And a nit: my strong preference is to lead with what is being changed, and then
> dive into the details of why, what's breaking, etc. This is one of the few
> divergences from the tip-tree preferences. From Documentation/process/maintainer-kvm-x86.rst:
>
> Stating what a patch does before diving into details is preferred by KVM x86
> for several reasons. First and foremost, what code is actually being changed
> is arguably the most important information, and so that info should be easy to
> find. Changelogs that bury the "what's actually changing" in a one-liner after
> 3+ paragraphs of background make it very hard to find that information.
Noted.
>
> E.g.
>
> --
> Don't update the LBR MSR intercept bitmaps if they're already up-to-date,
> as unconditionally updating the intercepts forces KVM to recalculate the
> MSR bitmaps for vmcb02 on every nested VMRUN. Functionally, the redundant
> updates are benign, but forcing an update neuters the Hyper-V optimization
> that allows KVM to skip refreshing the vmcb12 MSR bitmap if L1 marked the
> "nested enlightenments" as being clean, i.e. if L1 told KVM that no
> changes were made to the MSR bitmap since the last VMRUN.
>
> Clobbering the Hyper-V optimization manifests as a failure in the
> hyperv_svm_test KVM selftest, which intentionally changes the MSR bitmap
> "without telling KVM about it" to verify that KVM honors the clean hint.
>
> ==== Test Assertion Failure ====
> x86/hyperv_svm_test.c:120: vmcb->control.exit_code == 0x081
> pid=193558 tid=193558 errno=4 - Interrupted system call
> 1 0x0000000000411361: assert_on_unhandled_exception at processor.c:659
> 2 0x0000000000406186: _vcpu_run at kvm_util.c:1699
> 3 (inlined by) vcpu_run at kvm_util.c:1710
> 4 0x0000000000401f2a: main at hyperv_svm_test.c:175
> 5 0x000000000041d0d3: __libc_start_call_main at libc-start.o:?
> 6 0x000000000041f27c: __libc_start_main_impl at ??:?
> 7 0x00000000004021a0: _start at ??:?
> vmcb->control.exit_code == SVM_EXIT_VMMCALL
>
> Avoid using ....
> --
Thanks!
>
> > Fix it by keeping track of whether LBR MSRs are intercepted or not and
> > only doing the update if needed, similar to x2avic_msrs_intercepted.
> >
> > Avoid using svm_test_msr_bitmap_*() to check the status of the
> > intercepts, as an arbitrary MSR will need to be chosen as a
> > representative of all LBR MSRs, and this could theoretically break if
> > some of the MSRs intercepts are handled differently from the rest.
>
> For posterity, Yosry originally proposed (off-list) fixing this by having
> svm_set_intercept_for_msr() check for redundant updates, but I voted against
> that because updating MSR interception _should_ be rare (full CPUID updates and
> explicit MSR filter updates), and I don't want to risk hiding a bug/flaw elsewhere.
> I.e. if something is triggering frequent/unexpected MSR bitmap changes, I want
> that to be surfaced, not squashed/handled by the low level helpers.
Hmm, that was on-list though :P
https://lore.kernel.org/kvm/aRO5ItX_--ZDfnfM@google.com/
>
>
> > Also, using svm_test_msr_bitmap_*() makes backports difficult as it was
> > only recently introduced with no direct alternatives in older kernels.
> >
> > Fixes: fbe5e5f030c2 ("KVM: nSVM: Always recalculate LBR MSR intercepts in svm_update_lbrv()")
> > Cc: stable@...r.kernel.org
> > Signed-off-by: Yosry Ahmed <yosry.ahmed@...ux.dev>
>
> With an updated changelog,
>
> Reviewed-by: Sean Christopherson <seanjc@...gle.com>
> Tested-by: Sean Christopherson <seanjc@...gle.com>
Thanks!
Paolo, do you prefer a updated patch with the updated changelog, or
fixing it up when you apply it?
Powered by blists - more mailing lists