lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <72da0532-908b-40c2-a4e4-7ef1895547c7@oracle.com>
Date: Wed, 12 Nov 2025 19:06:24 -0800
From: Dongli Zhang <dongli.zhang@...cle.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: kvm@...r.kernel.org, x86@...nel.org, linux-kernel@...r.kernel.org,
        chao.gao@...el.com, pbonzini@...hat.com, tglx@...utronix.de,
        mingo@...hat.com, bp@...en8.de, dave.hansen@...ux.intel.com,
        hpa@...or.com, joe.jin@...cle.com, alejandro.j.jimenez@...cle.com
Subject: Re: [PATCH v2 1/1] KVM: VMX: configure SVI during runtime APICv
 activation

Hi Sean,

On 11/12/25 6:47 AM, Sean Christopherson wrote:
> On Sun, Nov 09, 2025, Dongli Zhang wrote:
>> ---
>> Changed since v2:
>>   - Add support for guest mode (suggested by Chao Gao).
>>   - Add comments in the code (suggested by Chao Gao).
>>   - Remove WARN_ON_ONCE from vmx_hwapic_isr_update().
>>   - Edit commit message "AMD SVM APICv" to "AMD SVM AVIC"
>>     (suggested by Alejandro Jimenez).
>>
>>  arch/x86/kvm/vmx/vmx.c | 9 ---------
>>  arch/x86/kvm/x86.c     | 7 +++++++
>>  2 files changed, 7 insertions(+), 9 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index f87c216d976d..d263dbf0b917 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -6878,15 +6878,6 @@ void vmx_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr)
>>  	 * VM-Exit, otherwise L1 with run with a stale SVI.
>>  	 */
>>  	if (is_guest_mode(vcpu)) {
>> -		/*
>> -		 * KVM is supposed to forward intercepted L2 EOIs to L1 if VID
>> -		 * is enabled in vmcs12; as above, the EOIs affect L2's vAPIC.
>> -		 * Note, userspace can stuff state while L2 is active; assert
>> -		 * that VID is disabled if and only if the vCPU is in KVM_RUN
>> -		 * to avoid false positives if userspace is setting APIC state.
>> -		 */
>> -		WARN_ON_ONCE(vcpu->wants_to_run &&
>> -			     nested_cpu_has_vid(get_vmcs12(vcpu)));
>>  		to_vmx(vcpu)->nested.update_vmcs01_hwapic_isr = true;
>>  		return;
>>  	}
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index b4b5d2d09634..08b34431c187 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -10878,9 +10878,16 @@ void __kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
>>  	 * pending. At the same time, KVM_REQ_EVENT may not be set as APICv was
>>  	 * still active when the interrupt got accepted. Make sure
>>  	 * kvm_check_and_inject_events() is called to check for that.
>> +	 *
>> +	 * When APICv gets enabled, updating SVI is necessary; otherwise,
>> +	 * SVI won't reflect the highest bit in vISR and the next EOI from
>> +	 * the guest won't be virtualized correctly, as the CPU will clear
>> +	 * the SVI bit from vISR.
>>  	 */
>>  	if (!apic->apicv_active)
>>  		kvm_make_request(KVM_REQ_EVENT, vcpu);
>> +	else
>> +		kvm_apic_update_hwapic_isr(vcpu);
> 
> Rather than trigger the update from x86.c, what if we let VMX make the call?
> Then we don't need to drop the WARN, and in the unlikely scenario L2 is active,
> we'll save a pointless scan of the vISR (VMX will defer the update until L1 is
> active).
> 
> We could even have kvm_apic_update_hwapic_isr() WARN if L2 is active.  E.g. with
> an opportunistic typo fix in vmx_hwapic_isr_update()'s comment (completely untested):
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 0ae7f913d782..786ccfc24252 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -774,7 +774,8 @@ void kvm_apic_update_hwapic_isr(struct kvm_vcpu *vcpu)
>  {
>         struct kvm_lapic *apic = vcpu->arch.apic;
>  
> -       if (WARN_ON_ONCE(!lapic_in_kernel(vcpu)) || !apic->apicv_active)
> +       if (WARN_ON_ONCE(!lapic_in_kernel(vcpu)) || !apic->apicv_active ||
> +                        is_guest_mode(vcpu))
>                 return;
>  
>         kvm_x86_call(hwapic_isr_update)(vcpu, apic_find_highest_isr(apic));
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 91b6f2f3edc2..653b8b713547 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -4430,6 +4430,14 @@ void vmx_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
>                                                  kvm_vcpu_apicv_active(vcpu));
>  
>         vmx_update_msr_bitmap_x2apic(vcpu);
> +
> +       /*
> +        * Refresh SVI if APICv is enabled, as any changes KVM made to vISR
> +        * while APICv was disabled need to be reflected in SVI, e.g. so that
> +        * the next accelerated EOI will clear the correct vector in vISR.
> +        */
> +       if (kvm_vcpu_apicv_active(vcpu))
> +               kvm_apic_update_hwapic_isr(vcpu);
>  }
>  
>  static u32 vmx_exec_control(struct vcpu_vmx *vmx)
> @@ -6880,7 +6888,7 @@ void vmx_hwapic_isr_update(struct kvm_vcpu *vcpu, int max_isr)
>  
>         /*
>          * If L2 is active, defer the SVI update until vmcs01 is loaded, as SVI
> -        * is only relevant for if and only if Virtual Interrupt Delivery is
> +        * is only relevant for L2 if and only if Virtual Interrupt Delivery is
>          * enabled in vmcs12, and if VID is enabled then L2 EOIs affect L2's
>          * vAPIC, not L1's vAPIC.  KVM must update vmcs01 on the next nested
>          * VM-Exit, otherwise L1 with run with a stale SVI.


As a quick reply, the idea is to call kvm_apic_update_hwapic_isr() in
vmx_refresh_apicv_exec_ctrl(), instead of __kvm_vcpu_update_apicv().

I think the below case doesn't work:

1. APICv is activated when vCPU is in L2.

kvm_vcpu_update_apicv()
-> __kvm_vcpu_update_apicv()
   -> vmx_refresh_apicv_exec_ctrl()

vmx_refresh_apicv_exec_ctrl() returns after setting:
vmx->nested.update_vmcs01_apicv_status = true.


2. On exit from L2 to L1, __nested_vmx_vmexit() requests for KVM_REQ_APICV_UPDATE.

__nested_vmx_vmexit()
-> leave_guest_mode(vcpu)
-> kvm_make_request(KVM_REQ_APICV_UPDATE, vcpu)


3. vCPU processes KVM_REQ_APICV_UPDATE again.

This time, __kvm_vcpu_update_apicv() returns without calling
refresh_apicv_exec_ctrl(), because (apic->apicv_active == activate).

vmx_refresh_apicv_exec_ctrl() doesn't get any chance to be called.


In order to call kvm_apic_update_hwapic_isr() in vmx_refresh_apicv_exec_ctrl(),
we may need to resolve the issue mentioned by Chao, for instance, with something
like:

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index bcea087b642f..1725c6a94f99 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -19,6 +19,7 @@
 #include "trace.h"
 #include "vmx.h"
 #include "smm.h"
+#include "x86_ops.h"

 static bool __read_mostly enable_shadow_vmcs = 1;
 module_param_named(enable_shadow_vmcs, enable_shadow_vmcs, bool, S_IRUGO);
@@ -5216,7 +5217,7 @@ void __nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32
vm_exit_reason,

        if (vmx->nested.update_vmcs01_apicv_status) {
                vmx->nested.update_vmcs01_apicv_status = false;
-               kvm_make_request(KVM_REQ_APICV_UPDATE, vcpu);
+               vmx_refresh_apicv_exec_ctrl(vcpu);
        }

        if (vmx->nested.update_vmcs01_hwapic_isr) {

Still validating if it works well.

Thank you very much!

Dongli Zhang

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ