lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aKeDuaW5Df7PgA38@google.com>
Date: Thu, 21 Aug 2025 13:38:17 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: "Maciej S. Szmigiero" <mail@...iej.szmigiero.name>
Cc: Paolo Bonzini <pbonzini@...hat.com>, Maxim Levitsky <mlevitsk@...hat.com>, 
	Suravee Suthikulpanit <Suravee.Suthikulpanit@....com>, 
	Alejandro Jimenez <alejandro.j.jimenez@...cle.com>, kvm@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] KVM: SVM: Sync TPR from LAPIC into VMCB::V_TPR when
 setting LAPIC regs

On Tue, Aug 19, 2025, Maciej S. Szmigiero wrote:
> From: "Maciej S. Szmigiero" <maciej.szmigiero@...cle.com>
> 
> When AVIC is enabled the normal pre-VMRUN sync in sync_lapic_to_cr8() is
> inhibited so any changed TPR in the LAPIC state would not get copied into
> the V_TPR field of VMCB.
> 
> AVIC does sync between these two fields, however it does so only on
> explicit guest writes to one of these fields, not on a bare VMRUN.
> 
> This is especially true when it is the userspace setting LAPIC state via
> KVM_SET_LAPIC ioctl() since userspace does not have access to the guest
> VMCB.
> 
> Practice shows that it is the V_TPR that is actually used by the AVIC to
> decide whether to issue pending interrupts to the CPU (not TPR in TASKPRI),
> so any leftover value in V_TPR will cause serious interrupt delivery issues
> in the guest when AVIC is enabled.
> 
> Fix this issue by explicitly copying LAPIC TPR to VMCB::V_TPR in
> avic_apicv_post_state_restore(), which gets called from KVM_SET_LAPIC and
> similar code paths when AVIC is enabled.
> 
> Fixes: 3bbf3565f48c ("svm: Do not intercept CR8 when enable AVIC")
> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@...cle.com>
> ---
>  arch/x86/kvm/svm/avic.c | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index a34c5c3b164e..877bc3db2c6e 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -725,8 +725,31 @@ int avic_init_vcpu(struct vcpu_svm *svm)
>  
>  void avic_apicv_post_state_restore(struct kvm_vcpu *vcpu)
>  {
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +	u64 cr8;
> +
>  	avic_handle_dfr_update(vcpu);
>  	avic_handle_ldr_update(vcpu);
> +
> +	/* Running nested should have inhibited AVIC. */
> +	if (WARN_ON_ONCE(nested_svm_virtualize_tpr(vcpu)))
> +		return;


> +
> +	/*
> +	 * Sync TPR from LAPIC TASKPRI into V_TPR field of the VMCB.
> +	 *
> +	 * When AVIC is enabled the normal pre-VMRUN sync in sync_lapic_to_cr8()
> +	 * is inhibited so any set TPR LAPIC state would not get reflected
> +	 * in V_TPR.

Hmm, I think that code is straight up wrong.  There's no justification, just a
claim:

  commit 3bbf3565f48ce3999b5a12cde946f81bd4475312
  Author:     Suravee Suthikulpanit <Suravee.Suthikulpanit@....com>
  AuthorDate: Wed May 4 14:09:51 2016 -0500
  Commit:     Paolo Bonzini <pbonzini@...hat.com>
  CommitDate: Wed May 18 18:04:31 2016 +0200

    svm: Do not intercept CR8 when enable AVIC
    
    When enable AVIC:
        * Do not intercept CR8 since this should be handled by AVIC HW.
        * Also, we don't need to sync cr8/V_TPR and APIC backing page.   <======
    
    Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@....com>
    [Rename svm_in_nested_interrupt_shadow to svm_nested_virtualize_tpr. - Paolo]
    Signed-off-by: Paolo Bonzini <pbonzini@...hat.com>

That claim assumes APIC[TPR] will _never_ be modified by anything other than
hardware.  That's obviously false for state restore from userspace, and it's also
technically false at steady state, e.g. if KVM managed to trigger emulation of a
store to the APIC page, then KVM would bypass the automatic harware sync.

There's also the comically ancient KVM_SET_VAPIC_ADDR, which AFAICT appears to
be largely dead code with respect to vTPR (nothing sets KVM_APIC_CHECK_VAPIC
except for the initial ioctl), but could again set APIC[TPR] without updating
V_TPR.

So, rather than manually do the update during state restore, my vote is to restore
the sync logic.  And if we want to optimize that code (seems unnecessary), then
we should hook all TPR writes.

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d9931c6c4bc6..1bfebe40854f 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4046,8 +4046,7 @@ static inline void sync_lapic_to_cr8(struct kvm_vcpu *vcpu)
        struct vcpu_svm *svm = to_svm(vcpu);
        u64 cr8;
 
-       if (nested_svm_virtualize_tpr(vcpu) ||
-           kvm_vcpu_apicv_active(vcpu))
+       if (nested_svm_virtualize_tpr(vcpu))
                return;
 
        cr8 = kvm_get_cr8(vcpu);


> +	 *
> +	 * Practice shows that it is the V_TPR that is actually used by the
> +	 * AVIC to decide whether to issue pending interrupts to the CPU, not
> +	 * TPR in TASKPRI.

FWIW, the APM kinda sorta alludes to this:

  Enabling AVIC also affects CR8 behavior independent of V_INTR_MASKING enable
  (bit 24): writes to CR8 affect the V_TPR and update the backing page and reads
  from CR8 return V_TPR.


> +	 */
> +	cr8 = kvm_get_cr8(vcpu);
> +	svm->vmcb->control.int_ctl &= ~V_TPR_MASK;
> +	svm->vmcb->control.int_ctl |= cr8 & V_TPR_MASK;
> +	WARN_ON_ONCE(!vmcb_is_dirty(svm->vmcb, VMCB_INTR));


>  }
>  
>  static void svm_ir_list_del(struct kvm_kernel_irqfd *irqfd)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ