[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <90f4e95a-14ca-40aa-9233-389974734c3c@maciej.szmigiero.name>
Date: Sat, 23 Aug 2025 01:20:28 +0200
From: "Maciej S. Szmigiero" <mail@...iej.szmigiero.name>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, Maxim Levitsky
<mlevitsk@...hat.com>, Suravee Suthikulpanit
<Suravee.Suthikulpanit@....com>,
Alejandro Jimenez <alejandro.j.jimenez@...cle.com>, kvm@...r.kernel.org,
linux-kernel@...r.kernel.org, Naveen N Rao <naveen@...nel.org>,
Radim Krčmář <rkrcmar@...hat.com>
Subject: Re: [PATCH 1/2] KVM: SVM: Sync TPR from LAPIC into VMCB::V_TPR when
setting LAPIC regs
On 21.08.2025 22:38, Sean Christopherson wrote:
> On Tue, Aug 19, 2025, Maciej S. Szmigiero wrote:
>> From: "Maciej S. Szmigiero" <maciej.szmigiero@...cle.com>
>>
>> When AVIC is enabled the normal pre-VMRUN sync in sync_lapic_to_cr8() is
>> inhibited so any changed TPR in the LAPIC state would not get copied into
>> the V_TPR field of VMCB.
>>
>> AVIC does sync between these two fields, however it does so only on
>> explicit guest writes to one of these fields, not on a bare VMRUN.
>>
>> This is especially true when it is the userspace setting LAPIC state via
>> KVM_SET_LAPIC ioctl() since userspace does not have access to the guest
>> VMCB.
>>
>> Practice shows that it is the V_TPR that is actually used by the AVIC to
>> decide whether to issue pending interrupts to the CPU (not TPR in TASKPRI),
>> so any leftover value in V_TPR will cause serious interrupt delivery issues
>> in the guest when AVIC is enabled.
>>
>> Fix this issue by explicitly copying LAPIC TPR to VMCB::V_TPR in
>> avic_apicv_post_state_restore(), which gets called from KVM_SET_LAPIC and
>> similar code paths when AVIC is enabled.
>>
>> Fixes: 3bbf3565f48c ("svm: Do not intercept CR8 when enable AVIC")
>> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@...cle.com>
>> ---
>> arch/x86/kvm/svm/avic.c | 23 +++++++++++++++++++++++
>> 1 file changed, 23 insertions(+)
>>
>> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
>> index a34c5c3b164e..877bc3db2c6e 100644
>> --- a/arch/x86/kvm/svm/avic.c
>> +++ b/arch/x86/kvm/svm/avic.c
>> @@ -725,8 +725,31 @@ int avic_init_vcpu(struct vcpu_svm *svm)
>>
>> void avic_apicv_post_state_restore(struct kvm_vcpu *vcpu)
>> {
>> + struct vcpu_svm *svm = to_svm(vcpu);
>> + u64 cr8;
>> +
>> avic_handle_dfr_update(vcpu);
>> avic_handle_ldr_update(vcpu);
>> +
>> + /* Running nested should have inhibited AVIC. */
>> + if (WARN_ON_ONCE(nested_svm_virtualize_tpr(vcpu)))
>> + return;
>
>
>> +
>> + /*
>> + * Sync TPR from LAPIC TASKPRI into V_TPR field of the VMCB.
>> + *
>> + * When AVIC is enabled the normal pre-VMRUN sync in sync_lapic_to_cr8()
>> + * is inhibited so any set TPR LAPIC state would not get reflected
>> + * in V_TPR.
>
> Hmm, I think that code is straight up wrong. There's no justification, just a
> claim:
>
> commit 3bbf3565f48ce3999b5a12cde946f81bd4475312
> Author: Suravee Suthikulpanit <Suravee.Suthikulpanit@....com>
> AuthorDate: Wed May 4 14:09:51 2016 -0500
> Commit: Paolo Bonzini <pbonzini@...hat.com>
> CommitDate: Wed May 18 18:04:31 2016 +0200
>
> svm: Do not intercept CR8 when enable AVIC
>
> When enable AVIC:
> * Do not intercept CR8 since this should be handled by AVIC HW.
> * Also, we don't need to sync cr8/V_TPR and APIC backing page. <======
>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@....com>
> [Rename svm_in_nested_interrupt_shadow to svm_nested_virtualize_tpr. - Paolo]
> Signed-off-by: Paolo Bonzini <pbonzini@...hat.com>
>
> That claim assumes APIC[TPR] will _never_ be modified by anything other than
> hardware. That's obviously false for state restore from userspace, and it's also
> technically false at steady state, e.g. if KVM managed to trigger emulation of a
> store to the APIC page, then KVM would bypass the automatic harware sync.
>
> There's also the comically ancient KVM_SET_VAPIC_ADDR, which AFAICT appears to
> be largely dead code with respect to vTPR (nothing sets KVM_APIC_CHECK_VAPIC
> except for the initial ioctl), but could again set APIC[TPR] without updating
> V_TPR.
>
> So, rather than manually do the update during state restore, my vote is to restore
> the sync logic. And if we want to optimize that code (seems unnecessary), then
> we should hook all TPR writes.
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index d9931c6c4bc6..1bfebe40854f 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4046,8 +4046,7 @@ static inline void sync_lapic_to_cr8(struct kvm_vcpu *vcpu)
> struct vcpu_svm *svm = to_svm(vcpu);
> u64 cr8;
>
> - if (nested_svm_virtualize_tpr(vcpu) ||
> - kvm_vcpu_apicv_active(vcpu))
> + if (nested_svm_virtualize_tpr(vcpu))
> return;
>
> cr8 = kvm_get_cr8(vcpu);
>
>
So you want to just do an unconditional LAPIC -> V_TPR sync at each VMRUN
and not try to patch every code flow where these possibly could get de-synced
to do such sync only on demand, correct?
By the way, the original Suravee's submission for the aforementioned patch
did *not* inhibit that sync when AVIC is on [1].
Something similar to this sync inhibit only showed in v4 [2],
probably upon Radim's comment on v3 [3] that:
> I think we can exit early with svm_vcpu_avic_enabled().
But the initial sync inhibit condition in v4 was essentially
nested_svm_virtualize_tpr() && svm_vcpu_avic_enabled(),
which suggests there was some confusion what was exactly meant
by the reviewer comment.
The final sync inhibit condition only showed in v5 [4].
No further discussion happened on that point.
Thanks,
Maciej
[1]: https://lore.kernel.org/kvm/1455285574-27892-9-git-send-email-suravee.suthikulpanit@amd.com/
[2]: https://lore.kernel.org/kvm/1460017232-17429-11-git-send-email-Suravee.Suthikulpanit@amd.com/
[3]: https://lore.kernel.org/kvm/20160318211048.GB26119@potion.brq.redhat.com/
[4]: https://lore.kernel.org/kvm/1462388992-25242-13-git-send-email-Suravee.Suthikulpanit@amd.com/
Powered by blists - more mailing lists