lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <zx4aiu65mmk72mo2kooj52q4k3vsp43znlrdadajivsw6ns7ou@7xtzfms3de66>
Date: Fri, 22 Aug 2025 14:34:08 +0530
From: Naveen N Rao <naveen@...nel.org>
To: Sean Christopherson <seanjc@...gle.com>
Cc: "Maciej S. Szmigiero" <mail@...iej.szmigiero.name>, 
	Paolo Bonzini <pbonzini@...hat.com>, Maxim Levitsky <mlevitsk@...hat.com>, 
	Suravee Suthikulpanit <Suravee.Suthikulpanit@....com>, Alejandro Jimenez <alejandro.j.jimenez@...cle.com>, 
	kvm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] KVM: SVM: Sync TPR from LAPIC into VMCB::V_TPR when
 setting LAPIC regs

On Thu, Aug 21, 2025 at 01:38:17PM -0700, Sean Christopherson wrote:
> On Tue, Aug 19, 2025, Maciej S. Szmigiero wrote:
> > From: "Maciej S. Szmigiero" <maciej.szmigiero@...cle.com>
> > 
> > When AVIC is enabled the normal pre-VMRUN sync in sync_lapic_to_cr8() is
> > inhibited so any changed TPR in the LAPIC state would not get copied into
> > the V_TPR field of VMCB.
> > 
> > AVIC does sync between these two fields, however it does so only on
> > explicit guest writes to one of these fields, not on a bare VMRUN.
> > 
> > This is especially true when it is the userspace setting LAPIC state via
> > KVM_SET_LAPIC ioctl() since userspace does not have access to the guest
> > VMCB.
> > 
> > Practice shows that it is the V_TPR that is actually used by the AVIC to
> > decide whether to issue pending interrupts to the CPU (not TPR in TASKPRI),
> > so any leftover value in V_TPR will cause serious interrupt delivery issues
> > in the guest when AVIC is enabled.
> > 
> > Fix this issue by explicitly copying LAPIC TPR to VMCB::V_TPR in
> > avic_apicv_post_state_restore(), which gets called from KVM_SET_LAPIC and
> > similar code paths when AVIC is enabled.
> > 
> > Fixes: 3bbf3565f48c ("svm: Do not intercept CR8 when enable AVIC")
> > Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@...cle.com>
> > ---
> >  arch/x86/kvm/svm/avic.c | 23 +++++++++++++++++++++++
> >  1 file changed, 23 insertions(+)
> > 
> > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > index a34c5c3b164e..877bc3db2c6e 100644
> > --- a/arch/x86/kvm/svm/avic.c
> > +++ b/arch/x86/kvm/svm/avic.c
> > @@ -725,8 +725,31 @@ int avic_init_vcpu(struct vcpu_svm *svm)
> >  
> >  void avic_apicv_post_state_restore(struct kvm_vcpu *vcpu)
> >  {
> > +	struct vcpu_svm *svm = to_svm(vcpu);
> > +	u64 cr8;
> > +
> >  	avic_handle_dfr_update(vcpu);
> >  	avic_handle_ldr_update(vcpu);
> > +
> > +	/* Running nested should have inhibited AVIC. */
> > +	if (WARN_ON_ONCE(nested_svm_virtualize_tpr(vcpu)))
> > +		return;
> 
> 
> > +
> > +	/*
> > +	 * Sync TPR from LAPIC TASKPRI into V_TPR field of the VMCB.
> > +	 *
> > +	 * When AVIC is enabled the normal pre-VMRUN sync in sync_lapic_to_cr8()
> > +	 * is inhibited so any set TPR LAPIC state would not get reflected
> > +	 * in V_TPR.
> 
> Hmm, I think that code is straight up wrong.  There's no justification, just a
> claim:
> 
>   commit 3bbf3565f48ce3999b5a12cde946f81bd4475312
>   Author:     Suravee Suthikulpanit <Suravee.Suthikulpanit@....com>
>   AuthorDate: Wed May 4 14:09:51 2016 -0500
>   Commit:     Paolo Bonzini <pbonzini@...hat.com>
>   CommitDate: Wed May 18 18:04:31 2016 +0200
> 
>     svm: Do not intercept CR8 when enable AVIC
>     
>     When enable AVIC:
>         * Do not intercept CR8 since this should be handled by AVIC HW.
>         * Also, we don't need to sync cr8/V_TPR and APIC backing page.   <======
>     
>     Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@....com>
>     [Rename svm_in_nested_interrupt_shadow to svm_nested_virtualize_tpr. - Paolo]
>     Signed-off-by: Paolo Bonzini <pbonzini@...hat.com>
> 
> That claim assumes APIC[TPR] will _never_ be modified by anything other than
> hardware. 

It also isn't clear to me why only sync_lapic_to_cr8() was gated when 
AVIC was enabled, while sync_cr8_to_lapic() continues to copy V_TRP to 
the backing page. If AVIC is enabled, then the AVIC hardware updates 
both the backing page and V_TPR on a guest write to TPR.

> That's obviously false for state restore from userspace, and it's also
> technically false at steady state, e.g. if KVM managed to trigger emulation of a
> store to the APIC page, then KVM would bypass the automatic harware sync.

Do you mean emulation due to AVIC being inhibited? I initially thought 
this could be a problem, but in this scenario, AVIC would be disabled on 
the next VMRUN, so we will end up sync'ing TPR from the lapic to V_TPR.

> 
> There's also the comically ancient KVM_SET_VAPIC_ADDR, which AFAICT appears to
> be largely dead code with respect to vTPR (nothing sets KVM_APIC_CHECK_VAPIC
> except for the initial ioctl), but could again set APIC[TPR] without updating
> V_TPR.
>
> So, rather than manually do the update during state restore, my vote 
> is to restore the sync logic.  And if we want to optimize that code 
> (seems unnecessary), then we should hook all TPR writes.

I guess you mean apic_set_tpr()? We will need to hook into that in 
addition to updating avic_apicv_post_state_restore() since KVM_SET_LAPIC 
just does a memcpy of the register state.

> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index d9931c6c4bc6..1bfebe40854f 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4046,8 +4046,7 @@ static inline void sync_lapic_to_cr8(struct kvm_vcpu *vcpu)
>         struct vcpu_svm *svm = to_svm(vcpu);
>         u64 cr8;
>  
> -       if (nested_svm_virtualize_tpr(vcpu) ||
> -           kvm_vcpu_apicv_active(vcpu))
> +       if (nested_svm_virtualize_tpr(vcpu))
>                 return;
>  
>         cr8 = kvm_get_cr8(vcpu);

I agree that this is a simpler fix, so would be good to do for backport 
ease.

The code in sync_lapic_to_cr8 ends up being a function call to 
kvm_get_cr8() and ~6 instructions, which isn't that much. But if we can 
gate sync'ing V_TPR to the backing page in sync_cr8_to_lapic() as well, 
then it might be good to do so.


- Naveen


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ