[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1f63036c-a72a-47bf-a75f-23ca7fd3b7cf@oracle.com>
Date: Thu, 21 Aug 2025 10:59:23 -0400
From: Alejandro Jimenez <alejandro.j.jimenez@...cle.com>
To: "Maciej S. Szmigiero" <mail@...iej.szmigiero.name>,
Naveen N Rao <naveen@...nel.org>
Cc: Paolo Bonzini <pbonzini@...hat.com>,
Sean Christopherson <seanjc@...gle.com>,
Maxim Levitsky
<mlevitsk@...hat.com>,
Suravee Suthikulpanit
<Suravee.Suthikulpanit@....com>,
kvm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/2] KVM: SVM: Fix missing LAPIC TPR sync into VMCB::V_TPR
with AVIC on
On 8/21/25 7:42 AM, Maciej S. Szmigiero wrote:
> On 21.08.2025 10:18, Naveen N Rao wrote:
>> On Tue, Aug 19, 2025 at 03:32:13PM +0200, Maciej S. Szmigiero wrote:
>>> From: "Maciej S. Szmigiero" <maciej.szmigiero@...cle.com>
>>>
>>> When AVIC is enabled the normal pre-VMRUN LAPIC TPR to VMCB::V_TPR
>>> sync in
>>> sync_lapic_to_cr8() is inhibited so any changed TPR in the LAPIC
>>> state would
>>> *not* get copied into the V_TPR field of VMCB.
>>>
>>> AVIC does sync between these two fields, however it does so only on
>>> explicit guest writes to one of these fields, not on a bare VMRUN.
>>>
>>> This is especially true when it is the userspace setting LAPIC state via
>>> KVM_SET_LAPIC ioctl() since userspace does not have access to the guest
>>> VMCB.
>>
>> Dumb question: why is the VMM updating TPR? Is this related to live
>> migration or such?
>
> In this case, VMM is resetting LAPIC state on machine reset.
>
>> I think I do see the problem described here, but when AVIC is
>> temporarily inhibited. So, trying to understand if there are other flows
>> involving the VMM where TPR could be updated outside of the guest.
>>
>>>
>>> Practice shows that it is the V_TPR that is actually used by the AVIC to
>>> decide whether to issue pending interrupts to the CPU (not TPR in
>>> TASKPRI),
>>> so any leftover value in V_TPR will cause serious interrupt delivery
>>> issues
>>> in the guest when AVIC is enabled.
>>>
>>> Fix this issue by explicitly copying LAPIC TPR to VMCB::V_TPR in
>>> avic_apicv_post_state_restore(), which gets called from KVM_SET_LAPIC
>>> and
>>> similar code paths when AVIC is enabled.
>>>
>>> Add also a relevant set of tests to xapic_state_test so hopefully
>>> we'll be protected against getting such regressions in the future.
>>
>> Do the new tests reproduce this issue?
>
> Yes, and also check quite a bit more of TPR-related functionality.
>
>>>
>>>
>>> Yes, this breaks real guests when AVIC is enabled.
>>> Specifically, the one OS that sometimes needs different handling and its
>>> name begins with letter 'W'.
>>
>> Indeed, Linux does not use TPR AFAIK.
>>
>>
I believe it does, during the local APIC initialization. When Maciej
determined the root cause of this issue, I was wondering why we have not
seen it earlier in Linux. I found that Linux takes a defensive approach
and drains all pending interrupts during lapic initialization.
Essentially, for each CPU, Linux will:
- temporarily disable the Local APIC (via Spurious Int Vector Reg)
- set the TPR to accept all "regular" interrupts i.e. tpr=0x10
- drain all pending interrupts in ISR and/or IRR
- attempt the above draining step a max of 512 times
- then re-enable APIC and continue initialization
The relevant code is in setup_local_APIC()
https://elixir.bootlin.com/linux/v6.16/source/arch/x86/kernel/apic/apic.c#L1533-L1545
So without Maciej's proposed change, other OSs that are not as resilient
could also be affected by this issue.
Alejandro
>> - Naveen
>>
>
> Thanks,
> Maciej
>
>
Powered by blists - more mailing lists