Message-ID: <aTJAVx7C3vuPDgkm@google.com>
Date: Thu, 4 Dec 2025 18:15:51 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: Dongli Zhang <dongli.zhang@...cle.com>
Cc: kvm@...r.kernel.org, x86@...nel.org, linux-kernel@...r.kernel.org,
chao.gao@...el.com, pbonzini@...hat.com, tglx@...utronix.de, mingo@...hat.com,
bp@...en8.de, dave.hansen@...ux.intel.com, hpa@...or.com, joe.jin@...cle.com,
alejandro.j.jimenez@...cle.com
Subject: Re: [PATCH v2 1/1] KVM: VMX: configure SVI during runtime APICv activation

On Mon, Nov 17, 2025, Dongli Zhang wrote:
> > Hmm, what if we go the opposite direction and bundle the vISR update into
> > KVM_REQ_APICV_UPDATE? Then we can drop nested.update_vmcs01_hwapic_isr, and
> > hopefully avoid similar ordering issues in the future.
...
> Thank you very much for the suggestion.

If only it was a good suggestion :-)

> There are still a few issues to fix.
>
> 1. We still need to remove WARN_ON_ONCE() from vmx_hwapic_isr_update().
...
> 2. As you mentioned in a prior email, while this is not a functional issue,
> apic_find_highest_isr() is still invoked unconditionally, as
> kvm_apic_update_hwapic_isr() is always called during KVM_REQ_APICV_UPDATE.
>
>
> 3. The issue that Chao pointed out is still present.
>
> (1) Suppose APICv is activated during L2.
>
> kvm_vcpu_update_apicv()
> -> __kvm_vcpu_update_apicv()
>    -> apic->apicv_active = true
>    -> vmx_refresh_apicv_exec_ctrl()
>       -> vmx->nested.update_vmcs01_apicv_status = true
>       -> return
>
> Then L2 exits to L1:
>
> __nested_vmx_vmexit()
> -> kvm_make_request(KVM_REQ_APICV_UPDATE)
>
> vcpu_enter_guest: KVM_REQ_APICV_UPDATE
> -> kvm_vcpu_update_apicv()
>    -> __kvm_vcpu_update_apicv()
>       -> return because of
>          if (apic->apicv_active == activate)
>
> refresh_apicv_exec_ctrl() is skipped.
>
> 4. It looks more complicated if we set "update_vmcs01_apicv_status = true" in
> both vmx_hwapic_isr_update() and vmx_refresh_apicv_exec_ctrl().
>
>
> Therefore, how about we continue to handle 'update_vmcs01_apicv_status' and
> 'update_vmcs01_hwapic_isr' as independent operations?
>
> 1. Take the approach reviewed by Chao, and ...

Ya. I spent a lot of time today wrapping my head around what all is going on,
and my lightbulb moment came when reading this from the changelog:

  The issue is not applicable to AMD SVM which employs a different LAPIC
  virtualization mechanism.

That's not entirely true. It's definitely true for SVI, but not for the bug that
Chao pointed out. SVM is "immune" to these bugs because KVM simply updates
vmcb01 directly. And looking at everything, there's zero reason we can't do the
same for VMX. Yeah, KVM needs to do a couple of VMPTRLDs to swap between vmcs01
and vmcs02, but those aren't _that_ expensive, and these are slow paths.

And with a guard(), it's pretty trivial to run a section of code with vmcs01
active.

static void vmx_load_vmcs01(struct kvm_vcpu *vcpu)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);

	if (!is_guest_mode(vcpu)) {
		WARN_ON_ONCE(vmx->loaded_vmcs != &vmx->vmcs01);
		return;
	}

	WARN_ON_ONCE(vmx->loaded_vmcs != &vmx->nested.vmcs02);
	vmx_switch_loaded_vmcs(vcpu, &vmx->vmcs01);
}

static void vmx_put_vmcs01(struct kvm_vcpu *vcpu)
{
	if (!is_guest_mode(vcpu))
		return;

	vmx_switch_loaded_vmcs(vcpu, &to_vmx(vcpu)->nested.vmcs02);
}

DEFINE_GUARD(vmx_vmcs01, struct kvm_vcpu *,
	     vmx_load_vmcs01(_T), vmx_put_vmcs01(_T))
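
E.g. converting vmx_hwapic_isr_update() would look something like the below
(just a sketch to illustrate the idea, not the final patch; it assumes the
guard above and drops the update_vmcs01_hwapic_isr deferral):

void vmx_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr)
{
	u16 status;
	u8 old;

	/* Temporarily load vmcs01 if L2 is active; restored on scope exit. */
	guard(vmx_vmcs01)(vcpu);

	if (max_isr == -1)
		max_isr = 0;

	/* SVI lives in bits 15:8 of the Guest Interrupt Status field. */
	status = vmcs_read16(GUEST_INTR_STATUS);
	old = status >> 8;
	if (max_isr != old) {
		status &= 0xff;
		status |= max_isr << 8;
		vmcs_write16(GUEST_INTR_STATUS, status);
	}
}
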
I've got changes to convert everything to guard(vmx_vmcs01); except for
vmx_set_virtual_apic_mode(), they're all quite trivial. I also have a selftest
that hits both this bug and the one Chao pointed out, so I'm reasonably confident
the changes do indeed work.

But they're most definitely NOT stable material. So my plan is to grab this
and the below for 6.19, and then do the cleanup for 6.20 or later.

Oh, almost forgot. We can also sink the hwapic_isr_update() call into
kvm_apic_update_apicv() and drop kvm_apic_update_hwapic_isr() entirely, which is
another argument for your approach. That's actually a really good fit, because
that's where KVM parses the vISR when APICv is being _disabled_.
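
Roughly (again just a sketch of the idea, not a tested patch, with the existing
body of kvm_apic_update_apicv() paraphrased):

static void kvm_apic_update_apicv(struct kvm_vcpu *vcpu)
{
	struct kvm_lapic *apic = vcpu->arch.apic;

	if (apic->apicv_active) {
		/* irr_pending is always true when APICv is activated. */
		apic->irr_pending = true;
		apic->isr_count = 1;
	} else {
		/*
		 * Don't clear irr_pending; searching the IRR can race with
		 * updates from the CPU while APICv is still active from
		 * hardware's perspective.
		 */
		apic->irr_pending = true;
		apic->isr_count = count_vectors(apic->regs + APIC_ISR);
	}

	/*
	 * Propagate the highest in-service vector to hardware (SVI on VMX)
	 * on both activation and deactivation, which makes a separate
	 * kvm_apic_update_hwapic_isr() helper unnecessary.
	 */
	kvm_x86_call(hwapic_isr_update)(vcpu, apic_find_highest_isr(apic));
}
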
I'll post a v3 with everything tomorrow (hopefully) after running the changes
through a more normal test flow.

> 2. Fix the vmx_refresh_apicv_exec_ctrl() issue with an additional patch:
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index bcea087b642f..7d98c11a8920 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -19,6 +19,7 @@
>  #include "trace.h"
>  #include "vmx.h"
>  #include "smm.h"
> +#include "x86_ops.h"
>
>  static bool __read_mostly enable_shadow_vmcs = 1;
>  module_param_named(enable_shadow_vmcs, enable_shadow_vmcs, bool, S_IRUGO);
> @@ -5214,9 +5215,9 @@ void __nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
>  		kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu);
>  	}
>
> -	if (vmx->nested.update_vmcs01_apicv_status) {
> -		vmx->nested.update_vmcs01_apicv_status = false;
> -		kvm_make_request(KVM_REQ_APICV_UPDATE, vcpu);
> +	if (vmx->nested.update_vmcs01_apicv_exec_ctrl) {
> +		vmx->nested.update_vmcs01_apicv_exec_ctrl = false;
> +		vmx_refresh_apicv_exec_ctrl(vcpu);

+1 to the fix, but I'll omit the update_vmcs01_apicv_exec_ctrl rename because I'm
99% certain we can get rid of it entirely.

Oh, and can you give your SoB for this? I'll write the changelog, just need the
SoB for the commit. Thanks!

>  	}
>
>  	if (vmx->nested.update_vmcs01_hwapic_isr) {
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index c3b9eb72b6f3..83705a6d5a8a 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -4415,7 +4415,7 @@ void vmx_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>
>  	if (is_guest_mode(vcpu)) {
> -		vmx->nested.update_vmcs01_apicv_status = true;
> +		vmx->nested.update_vmcs01_apicv_exec_ctrl = true;
>  		return;
>  	}
>
> diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
> index ea93121029f9..f6bee0e132a8 100644
> --- a/arch/x86/kvm/vmx/vmx.h
> +++ b/arch/x86/kvm/vmx/vmx.h
> @@ -134,7 +134,7 @@ struct nested_vmx {
>  	bool change_vmcs01_virtual_apic_mode;
>  	bool reload_vmcs01_apic_access_page;
>  	bool update_vmcs01_cpu_dirty_logging;
> -	bool update_vmcs01_apicv_status;
> +	bool update_vmcs01_apicv_exec_ctrl;
>  	bool update_vmcs01_hwapic_isr;
>
>  	/*
>
>
>
> By the way, while reviewing the source code, I noticed that certain read accesses
> to 'apicv_inhibit_reasons' are not protected by 'apicv_update_lock'.

Those are fine (hopefully; we spent a stupid amount of time sorting out the ordering).
In all cases, a false positive/negative will be remedied before KVM really truly
consumes the result.