lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1f63036c-a72a-47bf-a75f-23ca7fd3b7cf@oracle.com>
Date: Thu, 21 Aug 2025 10:59:23 -0400
From: Alejandro Jimenez <alejandro.j.jimenez@...cle.com>
To: "Maciej S. Szmigiero" <mail@...iej.szmigiero.name>,
        Naveen N Rao <naveen@...nel.org>
Cc: Paolo Bonzini <pbonzini@...hat.com>,
        Sean Christopherson <seanjc@...gle.com>,
        Maxim Levitsky
 <mlevitsk@...hat.com>,
        Suravee Suthikulpanit
 <Suravee.Suthikulpanit@....com>,
        kvm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 0/2] KVM: SVM: Fix missing LAPIC TPR sync into VMCB::V_TPR
 with AVIC on



On 8/21/25 7:42 AM, Maciej S. Szmigiero wrote:
> On 21.08.2025 10:18, Naveen N Rao wrote:
>> On Tue, Aug 19, 2025 at 03:32:13PM +0200, Maciej S. Szmigiero wrote:
>>> From: "Maciej S. Szmigiero" <maciej.szmigiero@...cle.com>
>>>
>>> When AVIC is enabled the normal pre-VMRUN LAPIC TPR to VMCB::V_TPR 
>>> sync in
>>> sync_lapic_to_cr8() is inhibited so any changed TPR in the LAPIC 
>>> state would
>>> *not* get copied into the V_TPR field of VMCB.
>>>
>>> AVIC does sync between these two fields, however it does so only on
>>> explicit guest writes to one of these fields, not on a bare VMRUN.
>>>
>>> This is especially true when it is the userspace setting LAPIC state via
>>> KVM_SET_LAPIC ioctl() since userspace does not have access to the guest
>>> VMCB.
>>
>> Dumb question: why is the VMM updating TPR? Is this related to live
>> migration or such?
> 
> In this case, VMM is resetting LAPIC state on machine reset.
> 
>> I think I do see the problem described here, but when AVIC is
>> temporarily inhibited. So, trying to understand if there are other flows
>> involving the VMM where TPR could be updated outside of the guest.
>>
>>>
>>> Practice shows that it is the V_TPR that is actually used by the AVIC to
>>> decide whether to issue pending interrupts to the CPU (not TPR in 
>>> TASKPRI),
>>> so any leftover value in V_TPR will cause serious interrupt delivery 
>>> issues
>>> in the guest when AVIC is enabled.
>>>
>>> Fix this issue by explicitly copying LAPIC TPR to VMCB::V_TPR in
>>> avic_apicv_post_state_restore(), which gets called from KVM_SET_LAPIC 
>>> and
>>> similar code paths when AVIC is enabled.
>>>
>>> Add also a relevant set of tests to xapic_state_test so hopefully
>>> we'll be protected against getting such regressions in the future.
>>
>> Do the new tests reproduce this issue?
> 
> Yes, and also check quite a bit more of TPR-related functionality.
> 
>>>
>>>
>>> Yes, this breaks real guests when AVIC is enabled.
>>> Specifically, the one OS that sometimes needs different handling and its
>>> name begins with letter 'W'.
>>
>> Indeed, Linux does not use TPR AFAIK.
>>
>>

I believe it does, during the local APIC initialization. When Maciej 
determined the root cause of this issue, I was wondering why we have not 
seen it earlier in Linux. I found that Linux takes a defensive approach 
and drains all pending interrupts during lapic initialization. 
Essentially, for each CPU, Linux will:
- temporarily disable the Local APIC (via Spurious Int Vector Reg)
- set the TPR to accept all "regular" interrupts i.e. tpr=0x10
- drain all pending interrupts in ISR and/or IRR
- attempt the above draining step a max of 512 times
- then re-enable APIC and continue initialization

The relevant code is in setup_local_APIC()
https://elixir.bootlin.com/linux/v6.16/source/arch/x86/kernel/apic/apic.c#L1533-L1545

So without Maciej's proposed change, other OSs that are not as resilient 
could also be affected by this issue.

Alejandro

>> - Naveen
>>
> 
> Thanks,
> Maciej
> 
> 


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ