Message-ID: <20250923050317.205482-1-Neeraj.Upadhyay@amd.com>
Date: Tue, 23 Sep 2025 10:33:00 +0530
From: Neeraj Upadhyay <Neeraj.Upadhyay@....com>
To: <kvm@...r.kernel.org>, <seanjc@...gle.com>, <pbonzini@...hat.com>
CC: <linux-kernel@...r.kernel.org>, <Thomas.Lendacky@....com>,
<nikunj@....com>, <Santosh.Shukla@....com>, <Vasant.Hegde@....com>,
<Suravee.Suthikulpanit@....com>, <bp@...en8.de>, <David.Kaplan@....com>,
<huibo.wang@....com>, <naveen.rao@....com>, <tiala@...rosoft.com>
Subject: [RFC PATCH v2 00/17] AMD: Add Secure AVIC KVM Support
Introduction
------------
Secure AVIC is a new hardware feature in the AMD64 architecture that
allows SEV-SNP guests to prevent the hypervisor from generating
unexpected interrupts to a vCPU or otherwise violating architectural
assumptions around APIC behavior.
One of the significant differences from AVIC or emulated x2APIC is that
Secure AVIC uses a guest-owned and managed APIC backing page. It also
introduces additional fields in both the VMCB and the Secure AVIC backing
page to aid the guest in limiting which interrupt vectors can be injected
into the guest.
Guest APIC Backing Page
-----------------------
Each vCPU has a guest-allocated APIC backing page, which maintains APIC
state for that vCPU. The x2APIC MSRs are mapped at their corresponding
x2APIC MMIO offset within the guest APIC backing page. All x2APIC accesses
by the guest or Secure AVIC hardware operate on this backing page. The
backing page should be pinned, and the NPT entry for it should always be
mapped while the corresponding vCPU is running.
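As an illustration of the MSR-to-offset mapping described above, here is a
minimal sketch. It assumes the standard x2APIC relation (MSR index = 0x800 +
MMIO offset / 16) also applies to the backing page; the helper name and the
macros are hypothetical and are not taken from the series.

```c
#include <stdint.h>

/* x2APIC MSR range (architectural). */
#define X2APIC_MSR_BASE 0x800u
#define X2APIC_MSR_LAST 0x8ffu

/*
 * Hypothetical helper: translate an x2APIC MSR index into a byte offset
 * within the guest APIC backing page. Each register occupies a 16-byte
 * slot, so the offset is the MSR's distance from the base shifted by 4.
 * Returns -1 if the MSR is outside the x2APIC range.
 */
static inline int savic_msr_to_offset(uint32_t msr)
{
	if (msr < X2APIC_MSR_BASE || msr > X2APIC_MSR_LAST)
		return -1;
	return (int)((msr - X2APIC_MSR_BASE) << 4);
}
```

For example, the TPR MSR (0x808) lands at offset 0x80 and SELF_IPI (0x83f)
at offset 0x3f0.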
MSR Accesses
------------
Secure AVIC only supports x2APIC MSR accesses. xAPIC MMIO offset based
accesses are not supported.
Some MSR writes, such as ICR writes (with shorthand equal to self),
SELF_IPI, EOI and TPR writes, are accelerated by Secure AVIC hardware.
Other MSR writes generate a #VC exception (VMEXIT_AVIC_NOACCEL or
VMEXIT_AVIC_INCOMPLETE_IPI). The #VC exception handler reads/writes to
the guest APIC backing page.
As the guest APIC backing page is accessible to the guest, the guest can
optimize APIC register access by directly reading/writing to the
guest APIC backing page (instead of taking the #VC exception route).
APIC MSR reads are accelerated similar to AVIC, as described in
table "15-22. Guest vAPIC Register Access Behavior" of APM.
In addition to the architected MSRs, the following new fields are added
to the guest APIC backing page. They can be modified directly by the
guest:
a. ALLOWED_IRR
ALLOWED_IRR vector indicates the interrupt vectors which the guest
allows the hypervisor to send. The combination of host-controlled
REQUESTED_IRR vectors (part of VMCB) and guest-controlled ALLOWED_IRR
is used by hardware to update the IRR vectors of the Guest APIC
backing page.
#Offset    #bits    Description
204h       31:0     Guest allowed vectors 0-31
214h       31:0     Guest allowed vectors 32-63
...
274h       31:0     Guest allowed vectors 224-255
ALLOWED_IRR is meant to be used specifically for vectors that the
hypervisor emulates and is allowed to inject, such as IOAPIC/MSI
device interrupts. Interrupt vectors used exclusively by the guest
itself (like IPI vectors) should not be allowed to be injected into
the guest for security reasons.
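The table above implies a fixed stride of 10h between ALLOWED_IRR
registers, each covering 32 vectors. A minimal sketch of how a guest could
locate and set the bit for a vector follows. All names are hypothetical,
the backing page is modeled as a plain byte array, and a little-endian
(x86) layout is assumed; a real guest would need an atomic or otherwise
suitably ordered update.

```c
#include <stdint.h>
#include <string.h>

#define SAVIC_ALLOWED_IRR_BASE 0x204u  /* from the offset table above */
#define APIC_REG_STRIDE        0x10u   /* 16 bytes between registers  */

/* Byte offset of the ALLOWED_IRR register that covers 'vector'. */
static inline uint32_t savic_allowed_irr_offset(uint8_t vector)
{
	return SAVIC_ALLOWED_IRR_BASE + (vector / 32u) * APIC_REG_STRIDE;
}

/* Bit mask for 'vector' within its 32-bit ALLOWED_IRR register. */
static inline uint32_t savic_allowed_irr_mask(uint8_t vector)
{
	return (uint32_t)1 << (vector % 32u);
}

/*
 * Hypothetical helper: mark 'vector' as injectable by the hypervisor.
 * Non-atomic read-modify-write, for illustration only.
 */
static inline void savic_allow_vector(uint8_t *backing_page, uint8_t vector)
{
	uint32_t off = savic_allowed_irr_offset(vector);
	uint32_t val;

	memcpy(&val, backing_page + off, sizeof(val));
	val |= savic_allowed_irr_mask(vector);
	memcpy(backing_page + off, &val, sizeof(val));
}
```

A guest following the guidance above would call such a helper only for
device interrupt vectors, never for its own IPI vectors.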
b. NMI Request
#Offset    #bits    Description
278h       0        Set by Guest to request Virtual NMI
Guest can set NMI_REQUEST to trigger APIC_ICR based NMIs.
APIC Registers
--------------
1. APIC ID
The APIC_ID value is set by KVM and, as with x2APIC, it is equal to the
vcpu_id of the vCPU.
2. APIC LVR
APIC Version register is expected to be read from KVM's APIC state using
MSR_PROT RDMSR VMGEXIT and updated in the guest APIC backing page.
3. APIC TPR
TPR writes are accelerated and not communicated to KVM. So, the
hypervisor does not have information about TPR value for a vCPU.
4. APIC PPR
Current state of PPR is not visible to KVM.
5. APIC SPIV
Spurious Interrupt Vector register value is communicated by the guest to
KVM.
6. APIC IRR and APIC ISR
IRR and ISR states are visible only to the guest. So, KVM cannot use these
registers to determine guest interrupts which are pending completion.
7. APIC TMR
Trigger Mode Register state is owned by the guest and not visible to KVM.
However, for IOAPIC external interrupts, KVM's software vAPIC trigger
mode is set from the guest-controlled redirection table. So, the APIC_TMR
values in the software vAPIC state can be used to distinguish between
edge-triggered and level-triggered IOAPIC interrupts.
8. Timer registers - TMICT, TMCCT, TDCR
Timer registers are accessed using MSR_PROT VMGEXIT calls and not from the
guest APIC backing page.
9. LVT* registers
LVT registers state is accessed from KVM vAPIC state for the vCPU.
Idle HLT Intercept
-------------------
As KVM does not have access to the APIC IRR state for a Secure AVIC guest,
the idle HLT intercept feature should always be enabled for a Secure AVIC
guest. Otherwise, any pending interrupts in vAPIC IRR during HLT VMEXIT
would not be serviced and the vCPU could get stuck in HLT until the next
wakeup event (which could arrive after a non-deterministic amount of time).
For idle HLT intercept to work, the vAPIC TPR value should not block the
pending interrupts.
LAPIC Timer Support
-------------------
LAPIC timer is emulated by KVM. So, the APIC_LVTT, APIC_TMICT, APIC_TDCR
and APIC_TMCCT registers are not read from or written to the guest APIC
backing page; accesses to them are communicated to KVM using MSR_PROT
VMGEXIT.
IPI Support
-----------
Only SELF_IPI is accelerated by Secure AVIC hardware. Other IPI
destination shorthands result in VMEXIT_AVIC_INCOMPLETE_IPI #VC exception.
The expected guest handling for VMEXIT_AVIC_INCOMPLETE_IPI is:
- For interrupts, update APIC_IRR in target vCPUs' guest APIC backing
page.
- For NMIs, update NMI_REQUEST in target vCPUs' guest backing page.
- ICR based SMI, INIT, SIPI requests are not supported.
- After updating the target vCPU's guest APIC backing page, the source
vCPU does an MSR_PROT VMGEXIT.
- KVM either wakes up the non-running target vCPU or sends an AVIC doorbell.
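The guest-side steps listed above can be sketched as follows. This is a
simplified model, not code from the guest series: the enum, the helper
names and the byte-array view of the backing page are hypothetical,
APIC_IRR is assumed to sit at its architectural offset 200h with a 10h
stride, and a little-endian layout is assumed. A real handler would use
atomic bit operations, because the target vCPU and the hardware access the
page concurrently, and would follow up with the MSR_PROT VMGEXIT so that
KVM can wake or doorbell the target.

```c
#include <stdint.h>
#include <string.h>

#define APIC_IRR_BASE     0x200u  /* architectural APIC_IRR offset    */
#define SAVIC_NMI_REQ     0x278u  /* NMI Request field, bit 0         */
#define APIC_REG_STRIDE   0x10u

/* Hypothetical decoding of the ICR delivery mode. */
enum ipi_mode { IPI_FIXED, IPI_NMI, IPI_INIT, IPI_SIPI, IPI_SMI };

/* Set one bit in a 32-bit register of the modeled backing page. */
static void page_set_bit(uint8_t *page, uint32_t off, uint32_t bit)
{
	uint32_t val;

	memcpy(&val, page + off, sizeof(val));
	val |= (uint32_t)1 << bit;
	memcpy(page + off, &val, sizeof(val));
}

/*
 * Record an IPI in the target vCPU's backing page.
 * Returns 0 if the request was recorded and the caller should now issue
 * the MSR_PROT VMGEXIT; returns -1 for unsupported SMI/INIT/SIPI requests.
 */
static int savic_record_ipi(uint8_t *target_page, enum ipi_mode mode,
			    uint8_t vector)
{
	switch (mode) {
	case IPI_FIXED:
		page_set_bit(target_page,
			     APIC_IRR_BASE + (vector / 32u) * APIC_REG_STRIDE,
			     vector % 32u);
		return 0;
	case IPI_NMI:
		page_set_bit(target_page, SAVIC_NMI_REQ, 0);
		return 0;
	default:
		return -1;
	}
}
```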
Exceptions Injection
--------------------
Hardware does not support event injection for guests with Secure AVIC
enabled in SEV_FEATURES. So, KVM cannot inject exceptions into Secure AVIC
guests. Hardware takes care of reinjecting an interrupted exception (for
example, one interrupted by an NPF) on the next VMRUN. A #VC exception is
not reinjected. KVM clears all exception intercepts for the Secure AVIC
guest.
Interrupt Injection
-------------------
IOAPIC and MSI based device interrupts can be injected by KVM. The
interrupt flow for this is:
- IOAPIC/MSI interrupts are updated in KVM's APIC_IRR state via
kvm_irq_delivery_to_apic().
- In the ->inject_irq() callback, all interrupts which are set in KVM's
APIC_IRR are copied to the RequestedIRR VMCB field and the UpdateIRR bit
is set.
- VMRUN moves the current value of RequestedIRR to APIC_IRR in the
guest APIC backing page and clears RequestedIRR, UpdateIRR.
Given that hardware clearing of RequestedIRR and UpdateIRR can race with
KVM's writes to these fields, the above interrupt injection flow ensures
that all RequestedIRR and UpdateIRR writes are done from the same CPU
where the vCPU is run.
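The three-step flow above can be modeled in a few lines. This is only a
software sketch under stated assumptions: the struct and function names are
hypothetical, clearing KVM's APIC_IRR after the copy is an assumption, and
the "combination" of RequestedIRR and ALLOWED_IRR is assumed to be a
bitwise AND that hardware ORs into the guest's APIC_IRR.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_IRR_REGS 8  /* 256 vectors / 32 bits per register */

/* Hypothetical model of the state involved in interrupt injection. */
struct savic_model {
	uint32_t kvm_irr[NR_IRR_REGS];       /* KVM software vAPIC APIC_IRR */
	uint32_t requested_irr[NR_IRR_REGS]; /* VMCB RequestedIRR           */
	bool update_irr;                     /* VMCB UpdateIRR              */
	uint32_t allowed_irr[NR_IRR_REGS];   /* guest backing page          */
	uint32_t guest_irr[NR_IRR_REGS];     /* guest backing page APIC_IRR */
};

/* Step 2: copy KVM's pending vectors to RequestedIRR, set UpdateIRR. */
static void model_inject_irq(struct savic_model *m)
{
	for (int i = 0; i < NR_IRR_REGS; i++) {
		if (!m->kvm_irr[i])
			continue;
		m->requested_irr[i] |= m->kvm_irr[i];
		m->kvm_irr[i] = 0;  /* assumption: consumed once copied */
		m->update_irr = true;
	}
}

/* Step 3: what VMRUN is assumed to do with RequestedIRR. */
static void model_vmrun(struct savic_model *m)
{
	if (!m->update_irr)
		return;
	for (int i = 0; i < NR_IRR_REGS; i++) {
		/* assumption: only guest-allowed vectors reach APIC_IRR */
		m->guest_irr[i] |= m->requested_irr[i] & m->allowed_irr[i];
		m->requested_irr[i] = 0;
	}
	m->update_irr = false;
}
```

In this model both functions run on the CPU that runs the vCPU, which is
how the series avoids the race between hardware clearing and KVM writes.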
As interrupt delivery to a vCPU is managed by hardware, interrupt window
is not applicable for Secure AVIC guests and interrupts are always
allowed to be injected.
PIC interrupts
--------------
Legacy PIC interrupts cannot be injected as they require event_inj or
VINTR injection support. Neither of these is available for a Secure
AVIC guest.
PIT
---
PIT Reinject mode is not supported for edge-triggered interrupts, as it
requires IRQ ack notification on EOI. As EOI is accelerated by Secure
AVIC hardware for edge-triggered interrupts, IRQ ack notification is
not called for them.
NMI Injection
-------------
NMI injection requires ALLOWED_NMI to be set in Secure AVIC control MSR
by the guest. Only VNMI injection is allowed.
Design Caveats, Open Points and Improvement Opportunities
---------------------------------------------------------
- Current code uses KVM's vAPIC APIC_IRR for storing the interrupts which
need to be injected to the guest. It then reuses KVM's existing
interrupt injection flow (with some modifications to the injectable
interrupt determination).
While functional, this approach conflates the state of KVM's
software-emulated vAPIC with the state of the hardware-accelerated Secure
AVIC. This can make the code harder to reason about. A cleaner approach
would be to introduce a dedicated struct for holding SAVIC-specific
state, completely decoupling it from the software lapic state and
avoiding this overload of semantics.
In addition, preserving the existing notion of a boolean
guest_apic_protected instead of having to subcategorize it based on the
interrupt injection flow would be desired. Given that KVM cannot use the
TDX's PI (asynchronous interrupt injection) mechanism for SAVIC and must
instead adopt the pre-VMRUN injection model of writing to the
guest-visible backing page, this would require creating a separate flow
for moving the KVM's pending interrupts for the vCPU to the RequestedIRR
field.
- EOI handling for level-triggered interrupts uses KVM's unused vAPIC
APIC_ISR regs for tracking pending level interrupts. KVM uses its
APIC_TMR state to determine level-triggered interrupts. As KVM's
APIC_TMR is updated from IOAPIC redirection tables, the TMR information
should be accurate and match the guest vAPIC state.
This can be cleaned up to not use KVM's vAPIC APIC_ISR state and
maintain the state within sev code.
- RTC_GSI requires pending EOI information to detect coalesced interrupts.
As RTC_GSI is edge triggered, Secure AVIC does not forward the EOI write
to KVM for this interrupt. In addition, APIC_IRR and APIC_ISR states are
not visible to KVM and are part of the guest APIC backing page. The
approach taken in this series is to disable checking of coalesced RTC_GSI
interrupts for Secure AVIC, which could impact userspace code which
relies on detecting RTC_GSI interrupt coalescing.
An alternate approach would be to not support in-kernel IOAPIC emulation
for Secure AVIC guests, similar to TDX.
- As exceptions cannot be injected by KVM, a more detailed examination
of which exception intercepts need to be allowed for Secure AVIC
guests is required.
- As KVM does not have access to the guest's APIC_IRR and APIC_ISR
states, kvm_apic_pending_eoi() does not return correct information.
- External interrupts (PIC) are not supported. This breaks KVM's PIC
emulation.
- PIT reinject mode is not supported.
Changes since v1:
v1: https://lore.kernel.org/lkml/20250228085115.105648-1-Neeraj.Upadhyay@amd.com/
- Rebased and resolved conflicts with the latest kvm next snapshot.
- Replaced enum with a separate lapic struct member to differentiate
protected APIC's interrupt injection mechanism.
- Add a patch to disable KVM_FEATURE_PV_EOI and KVM_FEATURE_PV_SEND_IPI
for protected APIC guests.
- Dropped SPIV hack patch, which always returns true from
kvm_apic_sw_enabled(): 20250228085115.105648-16-Neeraj.Upadhyay@....com
Instead of this, rely on guest propagating APIC_SPIV value to KVM.
- Updated the commit logs and cover letter to provide a more detailed
description.
This series is based on top of commit a6ad54137af9 ("Merge branch
'guest-memfd-mmap' into HEAD") from the next branch of
git.kernel.org/pub/scm/virt/kvm/kvm.git
Git tree is available at:
https://github.com/AMDESE/linux-kvm/tree/savic-host-latest
In addition, below patch from v1 is required, until SAVIC guest is
updated to propagate APIC_SPIV to the hypervisor.
20250228085115.105648-16-Neeraj.Upadhyay@....com
Qemu tree is at:
https://github.com/AMDESE/qemu/tree/secure-avic
QEMU commandline for testing Secure AVIC enabled guest:
qemu-system-x86_64 <...> -object sev-snp-guest,id=sev0,policy=0xb0000,cbitpos=51,reduced-phys-bits=1,allowed-sev-features=true,secure-avic=true
Guest Support is present in tip/tip master branch at the commit snapshot
835794d1ae4c ("Merge branch into tip/master: 'x86/tdx'").
Kishon Vijay Abraham I (2):
KVM: SVM: Do not inject exception for Secure AVIC
KVM: SVM: Set VGIF in VMSA area for Secure AVIC guests
Neeraj Upadhyay (15):
KVM: x86/lapic: Differentiate protected APIC interrupt mechanisms
x86/cpufeatures: Add Secure AVIC CPU feature
KVM: SVM: Add support for Secure AVIC capability in KVM
KVM: SVM: Set guest APIC protection flags for Secure AVIC
KVM: SVM: Do not intercept SECURE_AVIC_CONTROL MSR for SAVIC guests
KVM: SVM: Implement interrupt injection for Secure AVIC
KVM: SVM: Add IPI Delivery Support for Secure AVIC
KVM: SVM: Do not intercept exceptions for Secure AVIC guests
KVM: SVM: Enable NMI support for Secure AVIC guests
KVM: SVM: Add VMGEXIT handler for Secure AVIC backing page
KVM: SVM: Add IOAPIC EOI support for Secure AVIC guests
KVM: x86/ioapic: Disable RTC EOI tracking for protected APIC guests
KVM: SVM: Check injected timers for Secure AVIC guests
KVM: x86/cpuid: Disable paravirt APIC features for protected APIC
KVM: SVM: Advertise Secure AVIC support for SNP guests
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/include/asm/svm.h | 9 +-
arch/x86/include/uapi/asm/svm.h | 3 +
arch/x86/kvm/cpuid.c | 4 +
arch/x86/kvm/ioapic.c | 8 +-
arch/x86/kvm/lapic.c | 17 +-
arch/x86/kvm/lapic.h | 5 +-
arch/x86/kvm/svm/sev.c | 367 ++++++++++++++++++++++++++++-
arch/x86/kvm/svm/svm.c | 80 +++++--
arch/x86/kvm/svm/svm.h | 14 ++
arch/x86/kvm/x86.c | 15 +-
12 files changed, 493 insertions(+), 31 deletions(-)
base-commit: a6ad54137af92535cfe32e19e5f3bc1bb7dbd383
--
2.34.1