lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <90f4e95a-14ca-40aa-9233-389974734c3c@maciej.szmigiero.name>
Date: Sat, 23 Aug 2025 01:20:28 +0200
From: "Maciej S. Szmigiero" <mail@...iej.szmigiero.name>
To: Sean Christopherson <seanjc@...gle.com>
Cc: Paolo Bonzini <pbonzini@...hat.com>, Maxim Levitsky
 <mlevitsk@...hat.com>, Suravee Suthikulpanit
 <Suravee.Suthikulpanit@....com>,
 Alejandro Jimenez <alejandro.j.jimenez@...cle.com>, kvm@...r.kernel.org,
 linux-kernel@...r.kernel.org, Naveen N Rao <naveen@...nel.org>,
 Radim Krčmář <rkrcmar@...hat.com>
Subject: Re: [PATCH 1/2] KVM: SVM: Sync TPR from LAPIC into VMCB::V_TPR when
 setting LAPIC regs

On 21.08.2025 22:38, Sean Christopherson wrote:
> On Tue, Aug 19, 2025, Maciej S. Szmigiero wrote:
>> From: "Maciej S. Szmigiero" <maciej.szmigiero@...cle.com>
>>
>> When AVIC is enabled the normal pre-VMRUN sync in sync_lapic_to_cr8() is
>> inhibited so any changed TPR in the LAPIC state would not get copied into
>> the V_TPR field of VMCB.
>>
>> AVIC does sync between these two fields, however it does so only on
>> explicit guest writes to one of these fields, not on a bare VMRUN.
>>
>> This is especially true when it is the userspace setting LAPIC state via
>> KVM_SET_LAPIC ioctl() since userspace does not have access to the guest
>> VMCB.
>>
>> Practice shows that it is the V_TPR that is actually used by the AVIC to
>> decide whether to issue pending interrupts to the CPU (not TPR in TASKPRI),
>> so any leftover value in V_TPR will cause serious interrupt delivery issues
>> in the guest when AVIC is enabled.
>>
>> Fix this issue by explicitly copying LAPIC TPR to VMCB::V_TPR in
>> avic_apicv_post_state_restore(), which gets called from KVM_SET_LAPIC and
>> similar code paths when AVIC is enabled.
>>
>> Fixes: 3bbf3565f48c ("svm: Do not intercept CR8 when enable AVIC")
>> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@...cle.com>
>> ---
>>   arch/x86/kvm/svm/avic.c | 23 +++++++++++++++++++++++
>>   1 file changed, 23 insertions(+)
>>
>> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
>> index a34c5c3b164e..877bc3db2c6e 100644
>> --- a/arch/x86/kvm/svm/avic.c
>> +++ b/arch/x86/kvm/svm/avic.c
>> @@ -725,8 +725,31 @@ int avic_init_vcpu(struct vcpu_svm *svm)
>>   
>>   void avic_apicv_post_state_restore(struct kvm_vcpu *vcpu)
>>   {
>> +	struct vcpu_svm *svm = to_svm(vcpu);
>> +	u64 cr8;
>> +
>>   	avic_handle_dfr_update(vcpu);
>>   	avic_handle_ldr_update(vcpu);
>> +
>> +	/* Running nested should have inhibited AVIC. */
>> +	if (WARN_ON_ONCE(nested_svm_virtualize_tpr(vcpu)))
>> +		return;
> 
> 
>> +
>> +	/*
>> +	 * Sync TPR from LAPIC TASKPRI into V_TPR field of the VMCB.
>> +	 *
>> +	 * When AVIC is enabled the normal pre-VMRUN sync in sync_lapic_to_cr8()
>> +	 * is inhibited so any set TPR LAPIC state would not get reflected
>> +	 * in V_TPR.
> 
> Hmm, I think that code is straight up wrong.  There's no justification, just a
> claim:
> 
>    commit 3bbf3565f48ce3999b5a12cde946f81bd4475312
>    Author:     Suravee Suthikulpanit <Suravee.Suthikulpanit@....com>
>    AuthorDate: Wed May 4 14:09:51 2016 -0500
>    Commit:     Paolo Bonzini <pbonzini@...hat.com>
>    CommitDate: Wed May 18 18:04:31 2016 +0200
> 
>      svm: Do not intercept CR8 when enable AVIC
>      
>      When enable AVIC:
>          * Do not intercept CR8 since this should be handled by AVIC HW.
>          * Also, we don't need to sync cr8/V_TPR and APIC backing page.   <======
>      
>      Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@....com>
>      [Rename svm_in_nested_interrupt_shadow to svm_nested_virtualize_tpr. - Paolo]
>      Signed-off-by: Paolo Bonzini <pbonzini@...hat.com>
> 
> That claim assumes APIC[TPR] will _never_ be modified by anything other than
> hardware.  That's obviously false for state restore from userspace, and it's also
> technically false at steady state, e.g. if KVM managed to trigger emulation of a
> store to the APIC page, then KVM would bypass the automatic harware sync.
> 
> There's also the comically ancient KVM_SET_VAPIC_ADDR, which AFAICT appears to
> be largely dead code with respect to vTPR (nothing sets KVM_APIC_CHECK_VAPIC
> except for the initial ioctl), but could again set APIC[TPR] without updating
> V_TPR.
> 
> So, rather than manually do the update during state restore, my vote is to restore
> the sync logic.  And if we want to optimize that code (seems unnecessary), then
> we should hook all TPR writes.
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index d9931c6c4bc6..1bfebe40854f 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4046,8 +4046,7 @@ static inline void sync_lapic_to_cr8(struct kvm_vcpu *vcpu)
>          struct vcpu_svm *svm = to_svm(vcpu);
>          u64 cr8;
>   
> -       if (nested_svm_virtualize_tpr(vcpu) ||
> -           kvm_vcpu_apicv_active(vcpu))
> +       if (nested_svm_virtualize_tpr(vcpu))
>                  return;
>   
>          cr8 = kvm_get_cr8(vcpu);
> 
> 

So you want to just do an unconditional LAPIC -> V_TPR sync at each VMRUN
and not try to patch every code flow where these possibly could get de-synced
to do such sync only on demand, correct?

By the way, the original Suravee's submission for the aforementioned patch
did *not* inhibit that sync when AVIC is on [1].

Something similar to this sync inhibit only showed in v4 [2],
probably upon Radim's comment on v3 [3] that:
> I think we can exit early with svm_vcpu_avic_enabled().

But the initial sync inhibit condition in v4 was essentially
nested_svm_virtualize_tpr() && svm_vcpu_avic_enabled(),
which suggests there was some confusion what was exactly meant
by the reviewer comment.

The final sync inhibit condition only showed in v5 [4].
No further discussion happened on that point.

Thanks,
Maciej

[1]: https://lore.kernel.org/kvm/1455285574-27892-9-git-send-email-suravee.suthikulpanit@amd.com/
[2]: https://lore.kernel.org/kvm/1460017232-17429-11-git-send-email-Suravee.Suthikulpanit@amd.com/
[3]: https://lore.kernel.org/kvm/20160318211048.GB26119@potion.brq.redhat.com/
[4]: https://lore.kernel.org/kvm/1462388992-25242-13-git-send-email-Suravee.Suthikulpanit@amd.com/


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ