lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <Zymk_EaHkk7FPqru@google.com>
Date: Mon, 4 Nov 2024 20:54:20 -0800
From: Sean Christopherson <seanjc@...gle.com>
To: Paolo Bonzini <pbonzini@...hat.com>, kvm@...r.kernel.org, linux-kernel@...r.kernel.org, 
	Yong He <zhuangel570@...il.com>
Cc: Maxim Levitsky <mlevitsk@...hat.com>
Subject: Re: [PATCH] KVM: x86: Update irr_pending when setting APIC state with
 APICv disabled

+Maxim

On Fri, Nov 01, 2024, Sean Christopherson wrote:
> Explicitly set apic->irr_pending after stuffing the vIRR when userspace
> sets APIC state and APICv is disabled, otherwise KVM will skip scanning
> the vIRR in subsequent calls to apic_find_highest_irr(), and ultimately
> fail to inject the interrupt until another interrupt happens to be added
> to the vIRR.
> 
> Only the APICv-disabled case is flawed, as KVM forces apic->irr_pending to
> be true if APICv is enabled, because not all vIRR updates will be visible
> to KVM.
> 
> Note, irr_pending is intentionally not updated in kvm_apic_update_apicv(),
> because when APICv is being inhibited/disabled, KVM needs to keep the flag
> set until the next emulated EOI so that KVM will correctly handle any
> in-flight updates to the vIRR from hardware.  But when setting APIC state,
> neither the VM nor the VMM can assume specific ordering between an update
> from hardware and overwriting all state in kvm_apic_set_state(), thus KVM
> can safely clear irr_pending if the vIRR is empty.
> 
> Reported-by: Yong He <zhuangel570@...il.com>
> Closes: https://lkml.kernel.org/r/20241023124527.1092810-1-alexyonghe%40tencent.com
> Signed-off-by: Sean Christopherson <seanjc@...gle.com>
> ---
>  arch/x86/kvm/lapic.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 65412640cfc7..deb73aea2c06 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -3086,6 +3086,15 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
>  		kvm_x86_call(hwapic_irr_update)(vcpu,
>  						apic_find_highest_irr(apic));
>  		kvm_x86_call(hwapic_isr_update)(apic_find_highest_isr(apic));
> +	} else {
> +		/*
> +		 * Note, kvm_apic_update_apicv() is responsible for updating
> +		 * isr_count and highest_isr_cache.  irr_pending is somewhat
> +		 * special because it mustn't be cleared when APICv is disabled
> +		 * at runtime, and only state restore can cause an IRR bit to
> +		 * be set without also refreshing irr_pending.
> +		 */
> +		apic->irr_pending = apic_search_irr(apic) != -1;

I did a bit more archaeology in order to give this a Fixes tag (and a Cc: stable),
and found two interesting evolutions of this code.

The bug was introduced by commit 755c2bf87860 ("KVM: x86: lapic: don't touch
irr_pending in kvm_apic_update_apicv when inhibiting it"), which as the shortlog
suggests, deleted code that update irr_pending.

Before that commit, kvm_apic_update_apicv() did more or less what I am proposing
here, with the obvious difference that the proposed fix is specific to
kvm_lapic_reset().

        struct kvm_lapic *apic = vcpu->arch.apic;

        if (vcpu->arch.apicv_active) {
                /* irr_pending is always true when apicv is activated. */
                apic->irr_pending = true;
                apic->isr_count = 1;
        } else {
                apic->irr_pending = (apic_search_irr(apic) != -1);
                apic->isr_count = count_vectors(apic->regs + APIC_ISR);
        }

And _that_ bug (clearing irr_pending) was introduced by commit b26a695a1d78 ("kvm:
lapic: Introduce APICv update helper function").  Prior to 97a71c444a14, KVM
unconditionally set irr_pending to true in kvm_apic_set_state(), i.e. assumed
that the new virtual APIC state could have a pending IRQ (which isn't a terrible
assumption.

Furthermore, in addition to introducing this issue, commit 755c2bf87860 also
papered over the underlying bug: KVM doesn't ensure CPUs and devices see APICv
as disabled prior to searching the IRR.  Waiting until KVM emulates EOI to update
irr_pending works because KVM won't emulate EOI until after refresh_apicv_exec_ctrl(),
and because there are plenty of memory barries in between, but leaving irr_pending
set is basically hacking around bad ordering, which I _think_ can be fixed by:

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 83fe0a78146f..85d330b56c7e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10548,8 +10548,8 @@ void __kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
                goto out;
 
        apic->apicv_active = activate;
-       kvm_apic_update_apicv(vcpu);
        kvm_x86_call(refresh_apicv_exec_ctrl)(vcpu);
+       kvm_apic_update_apicv(vcpu);
 
        /*
         * When APICv gets disabled, we may still have injected interrupts

So, while searching the IRR is technically sufficient to fix the bug, I'm leaning
*very* strongly torward fixing this bug by unconditionally setting irr_pending
to true in kvm_apic_update_apicv(), with a FIXME to call out what KVM should be
doing.  And then address that FIXME in a future series (I have a rather massive
pile of fixes and cleanups that are closely related, so there will be ample
opportunity).

From: Sean Christopherson <seanjc@...gle.com>
Date: Fri, 1 Nov 2024 12:35:32 -0700
Subject: [PATCH] KVM: x86: Unconditionally set irr_pending when updating APICv
 state

TODO: writeme

Fixes: 755c2bf87860 ("KVM: x86: lapic: don't touch irr_pending in kvm_apic_update_apicv when inhibiting it")
Cc: stable@...r.kernel.org
Reported-by: Yong He <zhuangel570@...il.com>
Closes: https://lkml.kernel.org/r/20241023124527.1092810-1-alexyonghe%40tencent.com
Signed-off-by: Sean Christopherson <seanjc@...gle.com>
---
 arch/x86/kvm/lapic.c | 29 ++++++++++++++++++-----------
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 2098dc689088..95c6beb8ce27 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2629,19 +2629,26 @@ void kvm_apic_update_apicv(struct kvm_vcpu *vcpu)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
 
-	if (apic->apicv_active) {
-		/* irr_pending is always true when apicv is activated. */
-		apic->irr_pending = true;
+	/*
+	 * When APICv is enabled, KVM must always search the IRR for a pending
+	 * IRQ, as other vCPUs and devices can set IRR bits even if the vCPU
+	 * isn't running.  If APICv is disabled, KVM _should_ search the IRR
+	 * for a pending IRQ.  But KVM currently doesn't ensure *all* hardware,
+	 * e.g. CPUs and IOMMUs, has seen the change in state, i.e. searching
+	 * the IRR at this time could race with IRQ delivery from hardware that
+	 * still sees APICv as being enabled.
+	 *
+	 * FIXME: Ensure other vCPUs and devices observe the change in APICv
+	 *        state prior to updating KVM's metadata caches, so that KVM
+	 *        can safely search the IRR and set irr_pending accordingly.
+	 */
+	apic->irr_pending = true;
+
+	if (apic->apicv_active)
 		apic->isr_count = 1;
-	} else {
-		/*
-		 * Don't clear irr_pending, searching the IRR can race with
-		 * updates from the CPU as APICv is still active from hardware's
-		 * perspective.  The flag will be cleared as appropriate when
-		 * KVM injects the interrupt.
-		 */
+	else
 		apic->isr_count = count_vectors(apic->regs + APIC_ISR);
-	}
+
 	apic->highest_isr_cache = -1;
 }
 

base-commit: 8fe4fefefa1b9ea01557d454699c20fdf709e890
-- 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ