[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aKeDuaW5Df7PgA38@google.com>
Date: Thu, 21 Aug 2025 13:38:17 -0700
From: Sean Christopherson <seanjc@...gle.com>
To: "Maciej S. Szmigiero" <mail@...iej.szmigiero.name>
Cc: Paolo Bonzini <pbonzini@...hat.com>, Maxim Levitsky <mlevitsk@...hat.com>,
Suravee Suthikulpanit <Suravee.Suthikulpanit@....com>,
Alejandro Jimenez <alejandro.j.jimenez@...cle.com>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] KVM: SVM: Sync TPR from LAPIC into VMCB::V_TPR when
setting LAPIC regs
On Tue, Aug 19, 2025, Maciej S. Szmigiero wrote:
> From: "Maciej S. Szmigiero" <maciej.szmigiero@...cle.com>
>
> When AVIC is enabled the normal pre-VMRUN sync in sync_lapic_to_cr8() is
> inhibited so any changed TPR in the LAPIC state would not get copied into
> the V_TPR field of VMCB.
>
> AVIC does sync between these two fields, however it does so only on
> explicit guest writes to one of these fields, not on a bare VMRUN.
>
> This is especially true when it is the userspace setting LAPIC state via
> KVM_SET_LAPIC ioctl() since userspace does not have access to the guest
> VMCB.
>
> Practice shows that it is the V_TPR that is actually used by the AVIC to
> decide whether to issue pending interrupts to the CPU (not TPR in TASKPRI),
> so any leftover value in V_TPR will cause serious interrupt delivery issues
> in the guest when AVIC is enabled.
>
> Fix this issue by explicitly copying LAPIC TPR to VMCB::V_TPR in
> avic_apicv_post_state_restore(), which gets called from KVM_SET_LAPIC and
> similar code paths when AVIC is enabled.
>
> Fixes: 3bbf3565f48c ("svm: Do not intercept CR8 when enable AVIC")
> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@...cle.com>
> ---
> arch/x86/kvm/svm/avic.c | 23 +++++++++++++++++++++++
> 1 file changed, 23 insertions(+)
>
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index a34c5c3b164e..877bc3db2c6e 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -725,8 +725,31 @@ int avic_init_vcpu(struct vcpu_svm *svm)
>
> void avic_apicv_post_state_restore(struct kvm_vcpu *vcpu)
> {
> + struct vcpu_svm *svm = to_svm(vcpu);
> + u64 cr8;
> +
> avic_handle_dfr_update(vcpu);
> avic_handle_ldr_update(vcpu);
> +
> + /* Running nested should have inhibited AVIC. */
> + if (WARN_ON_ONCE(nested_svm_virtualize_tpr(vcpu)))
> + return;
> +
> + /*
> + * Sync TPR from LAPIC TASKPRI into V_TPR field of the VMCB.
> + *
> + * When AVIC is enabled the normal pre-VMRUN sync in sync_lapic_to_cr8()
> + * is inhibited so any set TPR LAPIC state would not get reflected
> + * in V_TPR.
Hmm, I think that code is straight up wrong. There's no justification, just a
claim:
commit 3bbf3565f48ce3999b5a12cde946f81bd4475312
Author: Suravee Suthikulpanit <Suravee.Suthikulpanit@....com>
AuthorDate: Wed May 4 14:09:51 2016 -0500
Commit: Paolo Bonzini <pbonzini@...hat.com>
CommitDate: Wed May 18 18:04:31 2016 +0200
svm: Do not intercept CR8 when enable AVIC
When enable AVIC:
* Do not intercept CR8 since this should be handled by AVIC HW.
* Also, we don't need to sync cr8/V_TPR and APIC backing page. <======
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@....com>
[Rename svm_in_nested_interrupt_shadow to svm_nested_virtualize_tpr. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@...hat.com>
That claim assumes APIC[TPR] will _never_ be modified by anything other than
hardware. That's obviously false for state restore from userspace, and it's also
technically false at steady state, e.g. if KVM managed to trigger emulation of a
store to the APIC page, then KVM would bypass the automatic harware sync.
There's also the comically ancient KVM_SET_VAPIC_ADDR, which AFAICT appears to
be largely dead code with respect to vTPR (nothing sets KVM_APIC_CHECK_VAPIC
except for the initial ioctl), but could again set APIC[TPR] without updating
V_TPR.
So, rather than manually do the update during state restore, my vote is to restore
the sync logic. And if we want to optimize that code (seems unnecessary), then
we should hook all TPR writes.
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d9931c6c4bc6..1bfebe40854f 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4046,8 +4046,7 @@ static inline void sync_lapic_to_cr8(struct kvm_vcpu *vcpu)
struct vcpu_svm *svm = to_svm(vcpu);
u64 cr8;
- if (nested_svm_virtualize_tpr(vcpu) ||
- kvm_vcpu_apicv_active(vcpu))
+ if (nested_svm_virtualize_tpr(vcpu))
return;
cr8 = kvm_get_cr8(vcpu);
> + *
> + * Practice shows that it is the V_TPR that is actually used by the
> + * AVIC to decide whether to issue pending interrupts to the CPU, not
> + * TPR in TASKPRI.
FWIW, the APM kinda sorta alludes to this:
Enabling AVIC also affects CR8 behavior independent of V_INTR_MASKING enable
(bit 24): writes to CR8 affect the V_TPR and update the backing page and reads
from CR8 return V_TPR.
> + */
> + cr8 = kvm_get_cr8(vcpu);
> + svm->vmcb->control.int_ctl &= ~V_TPR_MASK;
> + svm->vmcb->control.int_ctl |= cr8 & V_TPR_MASK;
> + WARN_ON_ONCE(!vmcb_is_dirty(svm->vmcb, VMCB_INTR));
> }
>
> static void svm_ir_list_del(struct kvm_kernel_irqfd *irqfd)
Powered by blists - more mailing lists