[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <zx4aiu65mmk72mo2kooj52q4k3vsp43znlrdadajivsw6ns7ou@7xtzfms3de66>
Date: Fri, 22 Aug 2025 14:34:08 +0530
From: Naveen N Rao <naveen@...nel.org>
To: Sean Christopherson <seanjc@...gle.com>
Cc: "Maciej S. Szmigiero" <mail@...iej.szmigiero.name>,
Paolo Bonzini <pbonzini@...hat.com>, Maxim Levitsky <mlevitsk@...hat.com>,
Suravee Suthikulpanit <Suravee.Suthikulpanit@....com>, Alejandro Jimenez <alejandro.j.jimenez@...cle.com>,
kvm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] KVM: SVM: Sync TPR from LAPIC into VMCB::V_TPR when
setting LAPIC regs
On Thu, Aug 21, 2025 at 01:38:17PM -0700, Sean Christopherson wrote:
> On Tue, Aug 19, 2025, Maciej S. Szmigiero wrote:
> > From: "Maciej S. Szmigiero" <maciej.szmigiero@...cle.com>
> >
> > When AVIC is enabled the normal pre-VMRUN sync in sync_lapic_to_cr8() is
> > inhibited so any changed TPR in the LAPIC state would not get copied into
> > the V_TPR field of VMCB.
> >
> > AVIC does sync between these two fields, however it does so only on
> > explicit guest writes to one of these fields, not on a bare VMRUN.
> >
> > This is especially true when it is the userspace setting LAPIC state via
> > KVM_SET_LAPIC ioctl() since userspace does not have access to the guest
> > VMCB.
> >
> > Practice shows that it is the V_TPR that is actually used by the AVIC to
> > decide whether to issue pending interrupts to the CPU (not TPR in TASKPRI),
> > so any leftover value in V_TPR will cause serious interrupt delivery issues
> > in the guest when AVIC is enabled.
> >
> > Fix this issue by explicitly copying LAPIC TPR to VMCB::V_TPR in
> > avic_apicv_post_state_restore(), which gets called from KVM_SET_LAPIC and
> > similar code paths when AVIC is enabled.
> >
> > Fixes: 3bbf3565f48c ("svm: Do not intercept CR8 when enable AVIC")
> > Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@...cle.com>
> > ---
> > arch/x86/kvm/svm/avic.c | 23 +++++++++++++++++++++++
> > 1 file changed, 23 insertions(+)
> >
> > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > index a34c5c3b164e..877bc3db2c6e 100644
> > --- a/arch/x86/kvm/svm/avic.c
> > +++ b/arch/x86/kvm/svm/avic.c
> > @@ -725,8 +725,31 @@ int avic_init_vcpu(struct vcpu_svm *svm)
> >
> > void avic_apicv_post_state_restore(struct kvm_vcpu *vcpu)
> > {
> > + struct vcpu_svm *svm = to_svm(vcpu);
> > + u64 cr8;
> > +
> > avic_handle_dfr_update(vcpu);
> > avic_handle_ldr_update(vcpu);
> > +
> > + /* Running nested should have inhibited AVIC. */
> > + if (WARN_ON_ONCE(nested_svm_virtualize_tpr(vcpu)))
> > + return;
>
>
> > +
> > + /*
> > + * Sync TPR from LAPIC TASKPRI into V_TPR field of the VMCB.
> > + *
> > + * When AVIC is enabled the normal pre-VMRUN sync in sync_lapic_to_cr8()
> > + * is inhibited so any set TPR LAPIC state would not get reflected
> > + * in V_TPR.
>
> Hmm, I think that code is straight up wrong. There's no justification, just a
> claim:
>
> commit 3bbf3565f48ce3999b5a12cde946f81bd4475312
> Author: Suravee Suthikulpanit <Suravee.Suthikulpanit@....com>
> AuthorDate: Wed May 4 14:09:51 2016 -0500
> Commit: Paolo Bonzini <pbonzini@...hat.com>
> CommitDate: Wed May 18 18:04:31 2016 +0200
>
> svm: Do not intercept CR8 when enable AVIC
>
> When enable AVIC:
> * Do not intercept CR8 since this should be handled by AVIC HW.
> * Also, we don't need to sync cr8/V_TPR and APIC backing page. <======
>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@....com>
> [Rename svm_in_nested_interrupt_shadow to svm_nested_virtualize_tpr. - Paolo]
> Signed-off-by: Paolo Bonzini <pbonzini@...hat.com>
>
> That claim assumes APIC[TPR] will _never_ be modified by anything other than
> hardware.
It also isn't clear to me why only sync_lapic_to_cr8() was gated when
AVIC was enabled, while sync_cr8_to_lapic() continues to copy V_TRP to
the backing page. If AVIC is enabled, then the AVIC hardware updates
both the backing page and V_TPR on a guest write to TPR.
> That's obviously false for state restore from userspace, and it's also
> technically false at steady state, e.g. if KVM managed to trigger emulation of a
> store to the APIC page, then KVM would bypass the automatic harware sync.
Do you mean emulation due to AVIC being inhibited? I initially thought
this could be a problem, but in this scenario, AVIC would be disabled on
the next VMRUN, so we will end up sync'ing TPR from the lapic to V_TPR.
>
> There's also the comically ancient KVM_SET_VAPIC_ADDR, which AFAICT appears to
> be largely dead code with respect to vTPR (nothing sets KVM_APIC_CHECK_VAPIC
> except for the initial ioctl), but could again set APIC[TPR] without updating
> V_TPR.
>
> So, rather than manually do the update during state restore, my vote
> is to restore the sync logic. And if we want to optimize that code
> (seems unnecessary), then we should hook all TPR writes.
I guess you mean apic_set_tpr()? We will need to hook into that in
addition to updating avic_apicv_post_state_restore() since KVM_SET_LAPIC
just does a memcpy of the register state.
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index d9931c6c4bc6..1bfebe40854f 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4046,8 +4046,7 @@ static inline void sync_lapic_to_cr8(struct kvm_vcpu *vcpu)
> struct vcpu_svm *svm = to_svm(vcpu);
> u64 cr8;
>
> - if (nested_svm_virtualize_tpr(vcpu) ||
> - kvm_vcpu_apicv_active(vcpu))
> + if (nested_svm_virtualize_tpr(vcpu))
> return;
>
> cr8 = kvm_get_cr8(vcpu);
I agree that this is a simpler fix, so would be good to do for backport
ease.
The code in sync_lapic_to_cr8 ends up being a function call to
kvm_get_cr8() and ~6 instructions, which isn't that much. But if we can
gate sync'ing V_TPR to the backing page in sync_cr8_to_lapic() as well,
then it might be good to do so.
- Naveen
Powered by blists - more mailing lists