Message-ID: <20250923050317.205482-8-Neeraj.Upadhyay@amd.com>
Date: Tue, 23 Sep 2025 10:33:07 +0530
From: Neeraj Upadhyay <Neeraj.Upadhyay@....com>
To: <kvm@...r.kernel.org>, <seanjc@...gle.com>, <pbonzini@...hat.com>
CC: <linux-kernel@...r.kernel.org>, <Thomas.Lendacky@....com>,
	<nikunj@....com>, <Santosh.Shukla@....com>, <Vasant.Hegde@....com>,
	<Suravee.Suthikulpanit@....com>, <bp@...en8.de>, <David.Kaplan@....com>,
	<huibo.wang@....com>, <naveen.rao@....com>, <tiala@...rosoft.com>
Subject: [RFC PATCH v2 07/17] KVM: SVM: Add IPI Delivery Support for Secure AVIC

Secure AVIC hardware accelerates only Self-IPI: on a WRMSR to the
APIC_SELF_IPI register, or to APIC_ICR with the destination shorthand set
to "self", hardware takes care of updating APIC_IRR in the guest-owned
APIC backing page of the vCPU. For all other IPI types (cross-vCPU and
broadcast IPIs), software must update the APIC_IRR state in each target
vCPU's APIC backing page and ensure that the target vCPU notices the new
pending interrupt.

To ensure that the remote vCPU notices the new pending interrupt, the guest
sends an APIC_ICR MSR-write GHCB protocol event to the hypervisor.
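
To make the split concrete, below is a hedged, kernel-flavored sketch of
the guest-side dispatch (illustrative only, not the in-tree guest driver;
update_target_irr() and ghcb_wrmsr_event() are hypothetical placeholders
for the backing-page update and the GHCB MSR-write event):

	static void savic_send_ipi(u64 icr)
	{
		if ((icr & APIC_SHORT_MASK) == APIC_DEST_SELF) {
			/* Accelerated path: hardware updates APIC_IRR in
			 * this vCPU's guest-owned backing page. */
			native_wrmsrl(APIC_BASE_MSR + (APIC_ICR >> 4), icr);
		} else {
			/* Software path: set APIC_IRR in the target vCPU's
			 * backing page, then notify the hypervisor so it
			 * can ring a doorbell or wake the target vCPU. */
			update_target_irr(icr);		/* hypothetical */
			ghcb_wrmsr_event(APIC_BASE_MSR + (APIC_ICR >> 4),
					 icr);		/* hypothetical */
		}
	}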

Handle the APIC_ICR write MSR exits for Secure AVIC guests by either
sending an AVIC doorbell (if the target vCPU is running) or waking up the
target vCPU's thread (if it is not running).

To ensure that the target vCPU observes the new IPI request, introduce a
new per-vCPU flag, sev_savic_has_pending_ipi. This flag acts as a reliable
"sticky bit" that signals a pending IPI, ensuring the event is not lost
even if the primary wakeup mechanism is missed. Update
sev_savic_has_pending_interrupt() to return true if
sev_savic_has_pending_ipi is set. This ensures that when a vCPU is about
to block (in kvm_vcpu_block()), it correctly recognizes that it has work
to do and will not go to sleep.
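
Condensed view (hedged; intermediate calls elided) of how the flag feeds
into the blocking decision, based on the existing call chain in virt/kvm
and arch/x86 KVM code:

	kvm_vcpu_block()
	  kvm_vcpu_check_block()
	    kvm_arch_vcpu_runnable()
	      kvm_vcpu_has_events()
	        kvm_cpu_has_interrupt()
	          sev_savic_has_pending_interrupt()
	            <returns true as sev_savic_has_pending_ipi is set>
	  => vCPU thread does not block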

Clear the sev_savic_has_pending_ipi flag in pre_sev_run() just before the
next VM-entry. This resets the one-shot signal, as the pending interrupt
is now about to be processed by the hardware upon VMRUN.

During APIC_ICR write GHCB request handling, unconditionally set
sev_savic_has_pending_ipi for the target vCPU, irrespective of whether the
target vCPU is in guest mode. If the target vCPU takes no other VMEXIT
before its next hlt exit, blocking fails because sev_savic_has_pending_ipi
remains set. The flag is cleared before the next VMRUN, so on a subsequent
hlt exit the vCPU thread blocks as expected.

The following race conditions can occur between the target vCPU executing
hlt and the source vCPU's IPI-request handling.

a. VMEXIT before HLT while RFLAGS.IF = 0 or an interrupt shadow is active.

   #Source-vCPU                          #Target-VCPU

   1. sev_savic_has_pending_ipi = true
   2. smp_mb();
                                         3. Disable interrupts
   4. Target vCPU is in guest mode
   5. Raise AVIC doorbell to target
      vCPU's physical APIC_ID
                                         6. VMEXIT
                                         7. sev_savic_has_pending_ipi =
                                            false
                                         8. VMRUN
                                         9. HLT
                                        10. VMEXIT
                                        11. kvm_arch_vcpu_runnable()
                                            returns false
                                        12. vCPU thread blocks

   In this scenario, the IDLE HLT intercept ensures that the target vCPU
   does not take the hlt exit, as V_INTR is set (the AVIC doorbell from
   the source vCPU triggers re-evaluation of the target vCPU's Secure
   AVIC backing page and sets V_INTR).

b. The target vCPU takes a HLT VMEXIT but has not yet cleared IN_GUEST_MODE
   when the source vCPU issues the doorbell write.

   #Source-vCPU                          #Target-VCPU

   1. sev_savic_has_pending_ipi = true
   2. smp_mb();
   3. Target vCPU is in guest mode
                                         4. HLT
                                         5. VMEXIT
   6. Raise AVIC doorbell to the target
      physical CPU.
                                         7. vcpu->mode =
                                              OUTSIDE_GUEST_MODE
                                         8. kvm_cpu_has_interrupt()
                                             protected_..._interrupt()
                                              smp_mb()
                                              sev_savic_has_pending_ipi is
                                              true

   In this case, the smp_mb() barriers at steps 2 and 8 guarantee that the
   target vCPU's thread observes sev_savic_has_pending_ipi as set and
   returns to guest mode without blocking.
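
   The argument in (b) is the classic store-buffering (SB) pattern with
   full barriers on both sides. As a hedged illustration only (a userspace
   analogue using C11 atomics as stand-ins for the kernel's smp_mb(), not
   the kernel code itself; it can be run in a loop to check the invariant):

	#include <assert.h>
	#include <pthread.h>
	#include <stdatomic.h>

	/* Stand-ins: pending_ipi ~ sev_savic_has_pending_ipi,
	 * mode ~ vcpu->mode (1 = IN_GUEST_MODE, 0 = OUTSIDE_GUEST_MODE). */
	static atomic_int pending_ipi, mode = 1;
	static int r_mode, r_pending;

	static void *source_vcpu(void *unused)
	{
		atomic_store_explicit(&pending_ipi, 1, memory_order_relaxed);
		atomic_thread_fence(memory_order_seq_cst); /* smp_mb(), step 2 */
		r_mode = atomic_load_explicit(&mode, memory_order_relaxed);
		return NULL;
	}

	static void *target_vcpu(void *unused)
	{
		atomic_store_explicit(&mode, 0, memory_order_relaxed); /* step 7 */
		atomic_thread_fence(memory_order_seq_cst); /* smp_mb(), step 8 */
		r_pending = atomic_load_explicit(&pending_ipi, memory_order_relaxed);
		return NULL;
	}

	int main(void)
	{
		pthread_t s, t;

		pthread_create(&s, NULL, source_vcpu, NULL);
		pthread_create(&t, NULL, target_vcpu, NULL);
		pthread_join(s, NULL);
		pthread_join(t, NULL);
		/* Forbidden outcome: source saw IN_GUEST_MODE (so it only
		 * rang the doorbell) while the target missed the flag. */
		assert(!(r_mode == 1 && r_pending == 0));
		return 0;
	}

   With both seq_cst fences in place, at least one of the two loads must
   observe the other thread's store, so the wakeup cannot be lost.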

c. For the remaining cases, where the source vCPU thread observes the
   target vCPU to be outside of guest mode, the memory barriers in
   rcuwait_wake_up() (source vCPU thread) and set_current_state() (target
   vCPU thread) provide the required ordering and ensure that the read of
   sev_savic_has_pending_ipi in kvm_vcpu_check_block() observes the write
   by the source vCPU.

   #Source-vCPU                          #Target-VCPU

   rcuwait_wake_up()
     smp_mb()
     task = rcu_dereference(w->task);
     if (task)
       wake_up_process()
                                        prepare_to_rcuwait()
                                          w->task = current
                                        set_current_state(
                                            TASK_INTERRUPTIBLE)
                                          smp_mb()
                                        kvm_vcpu_check_block()
                                          kvm_cpu_has_interrupt()
                                            <Read sev_savic_has_..._ipi>

Co-developed-by: Kishon Vijay Abraham I <kvijayab@....com>
Signed-off-by: Kishon Vijay Abraham I <kvijayab@....com>
Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@....com>
---
 arch/x86/kvm/svm/sev.c | 218 ++++++++++++++++++++++++++++++++++++++++-
 arch/x86/kvm/svm/svm.h |   2 +
 2 files changed, 219 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 78cefc14a2ee..a64fcc7637c7 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3511,6 +3511,89 @@ int pre_sev_run(struct vcpu_svm *svm, int cpu)
 	if (!cpumask_test_cpu(cpu, to_kvm_sev_info(kvm)->have_run_cpus))
 		cpumask_set_cpu(cpu, to_kvm_sev_info(kvm)->have_run_cpus);
 
+	/*
+	 * It should be safe to clear sev_savic_has_pending_ipi here.
+	 *
+	 * Following are the scenarios possible:
+	 *
+	 * Scenario 1: sev_savic_has_pending_ipi is set before hlt exit of the
+	 * target vCPU.
+	 *
+	 * Source vCPU                     Target vCPU
+	 *
+	 * 1. Set APIC_IRR of target
+	 *    vCPU.
+	 *
+	 * 2. VMGEXIT
+	 *
+	 * 3. Set ...has_pending_ipi
+	 *
+	 * savic_handle_icr_write()
+	 *   ..._has_pending_ipi = true
+	 *
+	 * 4. avic_ring_doorbell()
+	 *                            - VS -
+	 *
+	 *				   4. VMEXIT
+	 *
+	 *                                 5. ..._has_pending_ipi = false
+	 *
+	 *                                 6. VM entry
+	 *
+	 *                                 7. hlt exit
+	 *
+	 * In this case, any VM exit taken by the target vCPU before the hlt
+	 * exit clears sev_savic_has_pending_ipi. On the hlt exit, the idle
+	 * halt intercept finds V_INTR set and skips the hlt exit.
+	 *
+	 * Scenario 2: sev_savic_has_pending_ipi is set when target vCPU
+	 * has taken hlt exit.
+	 *
+	 * Source vCPU                     Target vCPU
+	 *
+	 *                                 1. hlt exit
+	 *
+	 * 2. Set ...has_pending_ipi
+	 *                                 3. kvm_vcpu_has_events() returns true
+	 *                                    and VM is reentered.
+	 *
+	 *                                    vcpu_block()
+	 *                                      kvm_arch_vcpu_runnable()
+	 *                                        kvm_vcpu_has_events()
+	 *                                          <return true as ..._has_pending_ipi
+	 *                                           is set>
+	 *
+	 *                                 4. On VM entry, APIC_IRR state is re-evaluated
+	 *                                    and V_INTR is set and interrupt is delivered
+	 *                                    to vCPU.
+	 *
+	 *
+	 * Scenario 3: sev_savic_has_pending_ipi is set while halt exit is happening:
+	 *
+	 *
+	 * Source vCPU                        Target vCPU
+	 *
+	 *                                  1. hlt
+	 *                                       Hardware checks V_INTR to determine
+	 *                                       if the hlt exit needs to be taken. No
+	 *                                       other exit (such as an intr exit) can
+	 *                                       be taken while this sequence executes.
+	 *
+	 * 2. Set APIC_IRR of target vCPU.
+	 *
+	 * 3. Set ...has_pending_ipi
+	 *                                  4. hlt exit taken.
+	 *
+	 *                                  5. ...has_pending_ipi being set is observed
+	 *                                     by the target vCPU, which is resumed.
+	 *
+	 * In this scenario, hardware ensures that the target vCPU does not take any
+	 * exit between checking the V_INTR state and the halt exit, so
+	 * sev_savic_has_pending_ipi remains set when the vCPU takes the hlt exit.
+	 */
+	if (READ_ONCE(svm->sev_savic_has_pending_ipi))
+		WRITE_ONCE(svm->sev_savic_has_pending_ipi, false);
+
 	/* Assign the asid allocated with this SEV guest */
 	svm->asid = asid;
 
@@ -4281,6 +4364,129 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 	return 0;
 }
 
+static void savic_handle_icr_write(struct kvm_vcpu *kvm_vcpu, u64 icr)
+{
+	struct kvm *kvm = kvm_vcpu->kvm;
+	struct kvm_vcpu *vcpu;
+	u32 icr_low, icr_high;
+	bool in_guest_mode;
+	unsigned long i;
+
+	icr_low = lower_32_bits(icr);
+	icr_high = upper_32_bits(icr);
+
+	/*
+	 * TODO: Instead of scanning all the vCPUS, get fastpath working which should
+	 * look similar to avic_kick_target_vcpus_fast().
+	 */
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (!kvm_apic_match_dest(vcpu, kvm_vcpu->arch.apic, icr_low & APIC_SHORT_MASK,
+					 icr_high, icr_low & APIC_DEST_MASK))
+			continue;
+
+		/*
+		 * Setting sev_savic_has_pending_ipi could result in a spurious
+		 * return from hlt (as kvm_cpu_has_interrupt() would return true)
+		 * if the destination CPU is in guest mode and the guest takes a
+		 * hlt exit after handling the IPI. sev_savic_has_pending_ipi is
+		 * cleared on VM entry, so there is at most one spurious return
+		 * per IPI. For vcpu->mode == IN_GUEST_MODE, the flag still needs
+		 * to be set to handle the case where the destination vCPU has
+		 * taken the hlt exit but the source CPU has not yet observed
+		 * (target)vcpu->mode != IN_GUEST_MODE.
+		 */
+		WRITE_ONCE(to_svm(vcpu)->sev_savic_has_pending_ipi, true);
+		/* Order sev_savic_has_pending_ipi write and vcpu->mode read. */
+		smp_mb();
+		/* Pairs with smp_store_release in vcpu_enter_guest. */
+		in_guest_mode = (smp_load_acquire(&vcpu->mode) == IN_GUEST_MODE);
+		if (in_guest_mode) {
+			/*
+			 * Signal the doorbell to tell hardware to inject the IRQ.
+			 *
+			 * If the vCPU exits the guest before the doorbell chimes,
+			 * the memory ordering below guarantees that the destination vCPU
+			 * observes sev_savic_has_pending_ipi == true before
+			 * blocking.
+			 *
+			 *   Src-CPU                       Dest-CPU
+			 *
+			 *  savic_handle_icr_write()
+			 *    sev_savic_has_pending_ipi = true
+			 *    smp_mb()
+			 *    smp_load_acquire(&vcpu->mode)
+			 *
+			 *                    - VS -
+			 *                              vcpu->mode = OUTSIDE_GUEST_MODE
+			 *                              __kvm_emulate_halt()
+			 *                                kvm_cpu_has_interrupt()
+			 *                                  smp_mb()
+			 *                                  if (sev_savic_has_pending_ipi)
+			 *                                      return true;
+			 *
+			 *   [S1]
+			 *     sev_savic_has_pending_ipi = true
+			 *
+			 *     SMP_MB
+			 *
+			 *   [L1]
+			 *     vcpu->mode
+			 *                                  [S2]
+			 *                                  vcpu->mode = OUTSIDE_GUEST_MODE
+			 *
+			 *
+			 *                                  SMP_MB
+			 *
+			 *                                  [L2] sev_savic_has_pending_ipi == true
+			 *
+			 *   exists (L1=IN_GUEST_MODE /\ L2=false)
+			 *
+			 *   The above forbidden outcome cannot occur. So, if the source CPU
+			 *   observes vcpu->mode == IN_GUEST_MODE (L1), the
+			 *   sev_savic_has_pending_ipi load by the destination CPU (L2) must
+			 *   observe the store (S1) from the source CPU.
+			 */
+			avic_ring_doorbell(vcpu);
+		} else {
+			/*
+			 * Wakeup the vCPU if it was blocking.
+			 *
+			 * Memory ordering is provided by smp_mb() in rcuwait_wake_up() on the
+			 * source CPU and smp_mb() in set_current_state() inside kvm_vcpu_block()
+			 * on the destination CPU.
+			 */
+			kvm_vcpu_kick(vcpu);
+		}
+	}
+}
+
+static bool savic_handle_msr_exit(struct kvm_vcpu *vcpu)
+{
+	u32 msr, reg;
+
+	msr = kvm_rcx_read(vcpu);
+	reg = (msr - APIC_BASE_MSR) << 4;
+
+	switch (reg) {
+	case APIC_ICR:
+		/*
+		 * Only APIC_ICR WRMSR requires special handling for Secure AVIC
+		 * guests to wake up destination vCPUs.
+		 */
+		if (to_svm(vcpu)->vmcb->control.exit_info_1) {
+			u64 data = kvm_read_edx_eax(vcpu);
+
+			savic_handle_icr_write(vcpu, data);
+			return true;
+		}
+		break;
+	default:
+		break;
+	}
+
+	return false;
+}
+
 int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -4419,6 +4625,11 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 			    control->exit_info_1, control->exit_info_2);
 		ret = -EINVAL;
 		break;
+	case SVM_EXIT_MSR:
+		if (sev_savic_active(vcpu->kvm) && savic_handle_msr_exit(vcpu))
+			return 1;
+
+		fallthrough;
 	default:
 		ret = svm_invoke_exit_handler(vcpu, exit_code);
 	}
@@ -5106,5 +5317,10 @@ void sev_savic_set_requested_irr(struct vcpu_svm *svm, bool reinjected)
 
 bool sev_savic_has_pending_interrupt(struct kvm_vcpu *vcpu)
 {
-	return kvm_apic_has_interrupt(vcpu) != -1;
+	/*
+	 * See memory ordering description in savic_handle_icr_write().
+	 */
+	smp_mb();
+	return READ_ONCE(to_svm(vcpu)->sev_savic_has_pending_ipi) ||
+		kvm_apic_has_interrupt(vcpu) != -1;
 }
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 60dc424d62c4..a3edb6e720cd 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -335,6 +335,8 @@ struct vcpu_svm {
 
 	/* Guest GIF value, used when vGIF is not enabled */
 	bool guest_gif;
+
+	bool sev_savic_has_pending_ipi;
 };
 
 struct svm_cpu_data {
-- 
2.34.1

