Message-ID: <20250228085115.105648-1-Neeraj.Upadhyay@amd.com>
Date: Fri, 28 Feb 2025 14:20:56 +0530
From: Neeraj Upadhyay <Neeraj.Upadhyay@....com>
To: <seanjc@...gle.com>, <pbonzini@...hat.com>, <kvm@...r.kernel.org>
CC: <linux-kernel@...r.kernel.org>, <bp@...en8.de>, <tglx@...utronix.de>,
<mingo@...hat.com>, <dave.hansen@...ux.intel.com>, <Thomas.Lendacky@....com>,
<nikunj@....com>, <Santosh.Shukla@....com>, <Vasant.Hegde@....com>,
<Suravee.Suthikulpanit@....com>, <David.Kaplan@....com>, <x86@...nel.org>,
<hpa@...or.com>, <peterz@...radead.org>, <huibo.wang@....com>,
<naveen.rao@....com>, <binbin.wu@...ux.intel.com>, <isaku.yamahata@...el.com>
Subject: [RFC PATCH 00/19] AMD: Add Secure AVIC KVM Support
Introduction
------------
Secure AVIC is a new hardware feature in the AMD64 architecture that
allows SEV-SNP guests to prevent the hypervisor from generating
unexpected interrupts to a vCPU or otherwise violating architectural
assumptions around APIC behavior.
One of the significant differences from AVIC or emulated x2APIC is that
Secure AVIC uses a guest-owned and managed APIC backing page. It also
introduces additional fields in both the VMCB and the Secure AVIC backing
page to aid the guest in limiting which interrupt vectors can be injected
into the guest.
Guest APIC Backing Page
-----------------------
Each vCPU has a guest-allocated APIC backing page of size 4K, which
maintains APIC state for that vCPU. The x2APIC MSRs are mapped at
their corresponding x2APIC MMIO offsets within the guest APIC backing
page. All x2APIC accesses by the guest or by Secure AVIC hardware operate
on this backing page. The backing page should be pinned, and the NPT entry
for it should always be mapped while the corresponding vCPU is running.
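To illustrate the MSR-to-offset mapping described above, here is a minimal
sketch. It relies on the architectural x2APIC convention that MSR
(0x800 + n) corresponds to xAPIC MMIO offset (n << 4). The helper name
savic_msr_to_offset() and the macro names are hypothetical and are not
taken from this series.

```c
#include <stdint.h>

#define X2APIC_MSR_BASE 0x800u /* first x2APIC MSR */
#define X2APIC_MSR_LAST 0x8ffu /* last MSR of the x2APIC range */

/*
 * Hypothetical helper: translate an x2APIC MSR index into the byte offset
 * of the same register within the 4K guest APIC backing page. APIC
 * registers are spaced 16 bytes apart in the MMIO layout, so
 * MSR (0x800 + n) is located at offset (n << 4).
 *
 * Returns -1 for MSRs outside the x2APIC range.
 */
static inline int32_t savic_msr_to_offset(uint32_t msr)
{
	if (msr < X2APIC_MSR_BASE || msr > X2APIC_MSR_LAST)
		return -1;

	return (int32_t)((msr - X2APIC_MSR_BASE) << 4);
}
```

For example, the TPR MSR (0x808) maps to offset 0x80 and the SELF_IPI MSR
(0x83f) maps to offset 0x3f0.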
MSR Accesses
------------
Secure AVIC only supports x2APIC MSR accesses. xAPIC MMIO offset based
accesses are not supported.
Some MSR writes, such as ICR writes (with shorthand equal to self),
SELF_IPI, EOI and TPR writes, are accelerated by Secure AVIC hardware.
Other MSR writes generate a #VC exception (VMEXIT_AVIC_NOACCEL or
VMEXIT_AVIC_INCOMPLETE_IPI). The #VC exception handler reads from or
writes to the guest APIC backing page.
As the guest APIC backing page is accessible to the guest, the guest can
optimize APIC register accesses by reading/writing the guest APIC
backing page directly (instead of taking the #VC exception route).
APIC MSR reads are accelerated similarly to AVIC, as described in
table "15-22. Guest vAPIC Register Access Behavior" of the APM.
In addition to the architected MSRs, the following new fields are added to
the guest APIC backing page. They can be modified directly by the
guest:
a. ALLOWED_IRR
ALLOWED_IRR vector indicates the interrupt vectors which the guest
allows the hypervisor to send. The combination of host-controlled
REQUESTED_IRR vectors (part of VMCB) and ALLOWED_IRR is used by
hardware to update the IRR vectors of the Guest APIC backing page.
   #Offset   #bits   Description
   204h      31:0    Guest allowed vectors 0-31
   214h      31:0    Guest allowed vectors 32-63
   ...
   274h      31:0    Guest allowed vectors 224-255
ALLOWED_IRR is meant to be used specifically for vectors that the
hypervisor emulates and is allowed to inject, such as IOAPIC/MSI
device interrupts. Interrupt vectors used exclusively by the guest
itself (like IPI vectors) should not be allowed to be injected into
the guest for security reasons.
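The ALLOWED_IRR layout above (base 204h, one 32-bit register per 32
vectors, 10h apart) can be sketched as follows. This is only an
illustration under those assumptions: the function and macro names are
hypothetical, the backing page is modelled as a plain byte buffer, and a
real guest would need to consider concurrent hardware accesses.

```c
#include <stdint.h>
#include <string.h>

#define SAVIC_ALLOWED_IRR_BASE 0x204u /* first ALLOWED_IRR register */
#define APIC_REG_STRIDE        0x10u  /* registers are 16 bytes apart */

/* Hypothetical: byte offset of the ALLOWED_IRR register covering 'vector'. */
static inline uint32_t savic_allowed_irr_offset(uint8_t vector)
{
	return SAVIC_ALLOWED_IRR_BASE + (vector / 32u) * APIC_REG_STRIDE;
}

/* Hypothetical: mark 'vector' as injectable by the hypervisor. */
static inline void savic_allow_vector(uint8_t *page, uint8_t vector)
{
	uint32_t off = savic_allowed_irr_offset(vector);
	uint32_t reg;

	memcpy(&reg, page + off, sizeof(reg));
	reg |= 1u << (vector % 32u);
	memcpy(page + off, &reg, sizeof(reg));
}

/* Hypothetical: return 1 if 'vector' is set in ALLOWED_IRR, else 0. */
static inline int savic_vector_allowed(const uint8_t *page, uint8_t vector)
{
	uint32_t reg;

	memcpy(&reg, page + savic_allowed_irr_offset(vector), sizeof(reg));
	return (int)((reg >> (vector % 32u)) & 1u);
}
```

A guest would call something like savic_allow_vector() only for device
vectors (IOAPIC/MSI) and leave its IPI vectors cleared.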
b. NMI Request
   #Offset   #bits   Description
   278h      0       Set by Guest to request Virtual NMI
The guest can set NMI_REQUEST to trigger APIC_ICR based NMIs.
APIC Registers
--------------
1. APIC ID
The APIC_ID value is set by KVM. As with x2APIC, it is equal to the
vcpu_id of the vCPU.
2. APIC LVR
APIC Version register is expected to be read from KVM's APIC state using
MSR_PROT rdmsr VMGEXIT and updated in guest APIC backing page.
3. APIC TPR
TPR writes are accelerated and not communicated to KVM. So,
hypervisor does not have information about TPR value for a vCPU.
4. APIC PPR
Current state of PPR is not visible to KVM.
5. APIC SPIV
Spurious Interrupt Vector register value is not communicated to KVM.
6. APIC IRR and ISR
IRR and ISR states are visible only to the guest. So, KVM cannot use these
registers to determine which interrupts are pending completion.
7. APIC TMR
Trigger Mode Register state is owned by guest and not visible to
KVM.
8. Timer registers - TMICT, TMCCT, TDCR
Timer registers are accessed using MSR_PROT VMGEXIT calls and not from
the guest APIC backing page.
9. LVT* registers
LVT registers state is accessed from KVM APIC state for the vCPU.
Idle halt Intercept
-------------------
As the hypervisor does not have access to the APIC IRR state of a Secure
AVIC guest, the idle halt intercept feature should always be enabled for
a Secure AVIC guest. Otherwise, any interrupts pending in APIC IRR during a
halt vmexit would not be serviced and the vCPU could get stuck in halt
forever. For idle halt intercept to work, the APIC TPR value should not
block the pending interrupts.
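As a rough illustration of what "TPR blocks a pending interrupt" means,
the sketch below applies the architectural APIC priority rule: a vector is
deliverable only if its priority class (bits 7:4) is greater than the
TPR priority class. The helper name is hypothetical, and the sketch
ignores in-service interrupts, which also contribute to PPR.

```c
#include <stdint.h>

/*
 * Hypothetical helper: return 1 if 'tpr' masks 'vector', 0 otherwise.
 * Only the TPR contribution to the processor priority is modelled here;
 * the in-service (ISR) contribution is deliberately left out.
 */
static inline int savic_tpr_blocks_vector(uint8_t tpr, uint8_t vector)
{
	return (vector >> 4) <= (tpr >> 4);
}
```

With TPR set to 0x30, vector 0x31 stays pending while vector 0x41 can be
delivered.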
LAPIC Timer Support
-------------------
The LAPIC timer is emulated by KVM. So, the APIC_LVTT, APIC_TMICT,
APIC_TDCR and APIC_TMCCT registers are not read from or written to the
guest APIC backing page. Accesses to them are communicated to the
hypervisor using MSR_PROT VMGEXIT.
IPI Support
-----------
Only SELF_IPI is accelerated by Secure AVIC hardware. Other IPI
destination shorthands result in VMEXIT_AVIC_INCOMPLETE_IPI #VC exception.
The expected guest handling for VMEXIT_AVIC_INCOMPLETE_IPI is:
- For interrupts, update APIC_IRR in target vCPUs' guest APIC backing
page.
- For NMIs, update NMI_REQUEST in target vCPUs' guest backing
page.
- ICR based SMI, INIT, SIPI requests are not supported.
- After updating the target vCPU's guest APIC backing page, the source
vCPU does an MSR_PROT VMGEXIT.
- KVM either wakes up the non-running target vCPU or sends an
AVIC doorbell.
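The guest-side steps above can be sketched roughly as below. This is not
the guest implementation: the function names are hypothetical, the
backing page is modelled as a byte buffer, the notify callback merely
stands in for the MSR_PROT VMGEXIT, and a real guest would need atomic
updates since the target vCPU may be running. The sketch assumes the
architectural APIC layout (IRR at offset 200h, delivery mode in ICR bits
10:8) and the NMI_REQUEST offset 278h given earlier.

```c
#include <stdint.h>
#include <string.h>

#define APIC_IRR_BASE          0x200u /* architectural APIC IRR offset */
#define APIC_REG_STRIDE        0x10u
#define SAVIC_NMI_REQ_OFFSET   0x278u /* NMI_REQUEST, bit 0 */
#define ICR_DELIVERY_MODE(icr) (((icr) >> 8) & 0x7u)
#define ICR_DM_FIXED           0x0u
#define ICR_DM_NMI             0x4u

/* Stand-in for the MSR_PROT VMGEXIT issued by the source vCPU. */
typedef void (*savic_notify_fn)(uint32_t icr_low);

static void savic_page_set_bit(uint8_t *page, uint32_t offset, unsigned int bit)
{
	uint32_t reg;

	/* Non-atomic read-modify-write: acceptable only for this sketch. */
	memcpy(&reg, page + offset, sizeof(reg));
	reg |= 1u << bit;
	memcpy(page + offset, &reg, sizeof(reg));
}

/*
 * Hypothetical handler for one target vCPU. Returns 0 on success and -1
 * for delivery modes this sketch does not handle (SMI, INIT, SIPI, ...).
 */
static int savic_send_ipi(uint8_t *target_page, uint32_t icr_low,
			  savic_notify_fn notify)
{
	uint8_t vector = icr_low & 0xffu;

	switch (ICR_DELIVERY_MODE(icr_low)) {
	case ICR_DM_FIXED:
		savic_page_set_bit(target_page,
				   APIC_IRR_BASE + (vector / 32u) * APIC_REG_STRIDE,
				   vector % 32u);
		break;
	case ICR_DM_NMI:
		savic_page_set_bit(target_page, SAVIC_NMI_REQ_OFFSET, 0);
		break;
	default:
		return -1;
	}

	/* Let the hypervisor wake the target or ring the AVIC doorbell. */
	if (notify)
		notify(icr_low);

	return 0;
}
```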
Exceptions Injection
--------------------
Event injection is not supported for guests with Secure AVIC enabled in
SEV_FEATURES. So, KVM cannot inject exceptions into Secure AVIC guests.
Hardware takes care of reinjecting an interrupted exception (for
example, due to an NPF) raised in the guest on the next VMRUN. The #VC
exception is not reinjected. KVM clears all exception intercepts for a
Secure AVIC guest.
Interrupt Injection
-------------------
IOAPIC and MSI based device interrupts can be injected by KVM. The
interrupt flow for this is:
- IOAPIC/MSI interrupts are updated in KVM's APIC_IRR state via
kvm_irq_delivery_to_apic().
- In the ->inject_irq() callback, all interrupts which are set in KVM's
APIC_IRR are copied to the RequestedIRR VMCB field and the UpdateIRR bit
is set.
- VMENTER moves the current value of RequestedIRR to APIC_IRR in
guest APIC backing page and clears UpdateIRR.
Given that hardware clearing of RequestedIRR and UpdateIRR can race
with software writes to these fields, the above interrupt injection
flow ensures that all RequestedIRR and UpdateIRR writes are done
from the same CPU on which the vCPU runs.
As interrupt delivery to the vCPU is managed by hardware, the interrupt
window is not applicable to Secure AVIC guests and interrupts are always
allowed to be injected.
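The injection flow above can be modelled in plain C as a rough sketch.
The struct and function names are hypothetical and do not match the VMCB
definitions in this series. In particular, masking RequestedIRR with
ALLOWED_IRR via a bitwise AND at VMENTER is an assumption made here to
illustrate how the two fields combine; the exact hardware behavior is
defined by the APM.

```c
#include <stdbool.h>
#include <stdint.h>

#define IRR_WORDS 8 /* 256 vectors, 32 per word */

/* Hypothetical model of the host-controlled VMCB fields. */
struct savic_vmcb_model {
	uint32_t requested_irr[IRR_WORDS];
	bool update_irr;
};

/* Hypothetical model of the guest-owned backing page fields. */
struct savic_guest_model {
	uint32_t irr[IRR_WORDS];
	uint32_t allowed_irr[IRR_WORDS];
};

/* KVM side: copy KVM's pending vectors to RequestedIRR, set UpdateIRR. */
static void model_inject_irq(struct savic_vmcb_model *vmcb,
			     const uint32_t kvm_irr[IRR_WORDS])
{
	for (int i = 0; i < IRR_WORDS; i++)
		vmcb->requested_irr[i] |= kvm_irr[i];
	vmcb->update_irr = true;
}

/* Hardware side at VMENTER (assumed behavior, see the note above). */
static void model_vmenter(struct savic_vmcb_model *vmcb,
			  struct savic_guest_model *guest)
{
	if (!vmcb->update_irr)
		return;

	for (int i = 0; i < IRR_WORDS; i++) {
		/* Assumption: only guest-allowed vectors reach APIC_IRR. */
		guest->irr[i] |= vmcb->requested_irr[i] & guest->allowed_irr[i];
		vmcb->requested_irr[i] = 0;
	}
	vmcb->update_irr = false;
}
```

In this model, a vector requested by the host but not set in ALLOWED_IRR
never shows up in the guest's IRR.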
PIC interrupts
--------------
Legacy PIC interrupts cannot be injected as they require event_inj or
VINTR injection support. Neither of these can be used for a Secure
AVIC guest.
PIT
---
PIT Reinject mode is not supported as it requires IRQ ack notification
on EOI. As EOI is accelerated for edge interrupts, IRQ ack notification
is not called for those interrupts.
NMI Injection
-------------
NMI injection requires ALLOWED_NMI to be set in the Secure AVIC control
MSR by the guest. Only VNMI injection is allowed.
Open Points
-----------
- RTC_GSI requires pending EOI information to detect coalesced
interrupts. As RTC_GSI is edge triggered, Secure AVIC does not
forward EOI write to KVM for this interrupt. In addition, APIC_IRR
and APIC_ISR states are not visible to KVM and are part of guest
APIC backing page. Approach taken in this series is to disable
checking of coalesced RTC_GSI interrupts for Secure AVIC, which
could impact userspace.
- EOI handling for level interrupts uses KVM's unused APIC_ISR regs
for tracking pending level interrupts. KVM uses its APIC_TMR state
to determine level-triggered interrupts. As KVM's APIC_TMR is
updated from IOAPIC redirect tables, the TMR information should be
accurate and match guest APIC state. This can be cleaned up later
so that KVM's APIC_ISR state is not used and the tracking is instead
maintained within the SEV code.
- Spurious Interrupt Vector Register writes are not visible to KVM.
So, KVM cannot determine if the SW enabled bit is set.
- As exceptions cannot be injected by KVM, a more detailed examination
of which intercepts need to be allowed for Secure AVIC guests is
required.
- As KVM does not have access to the guest's APIC_IRR and APIC_ISR
states, kvm_apic_pending_eoi() does not return correct information.
- External interrupts (PIC) are not supported. This breaks KVM's PIC
emulation.
- PIT reinject mode is not supported.
- The current code uses KVM's vCPU APIC_IRR for interrupts which
need to be injected into the guest. Another approach could be to
maintain pending interrupts within the SEV code and inject them using
a flow similar to posted interrupts.
This series is based on top of commit f7bafceba76e ("KVM: remove
kvm_arch_post_init_vm") from
git.kernel.org/pub/scm/virt/kvm/kvm.git next
Git tree is available at:
https://github.com/AMDESE/linux-kvm/tree/savic-host-latest
Qemu tree is at:
https://github.com/AMDESE/qemu/tree/secure-avic
Guest Secure AVIC support is available at:
https://lore.kernel.org/lkml/20250226090525.231882-1-Neeraj.Upadhyay@amd.com/
This series depends on below patch series:
1. Idle Halt Intercept
https://lore.kernel.org/all/20250128124812.7324-1-manali.shukla@amd.com/
2. ALLOWED_SEV_FEATURES support
https://lore.kernel.org/kvm/20250207233410.130813-1-kim.phillips@amd.com/
Kishon Vijay Abraham I (5):
KVM: SEV: Do not intercept SECURE_AVIC_CONTROL MSR
KVM: SVM: Secure AVIC: Do not inject "Exceptions" for Secure AVIC
KVM: SVM/SEV: Secure AVIC: Set VGIF in VMSA area
KVM: SVM/SEV: Secure AVIC: Enable NMI support
KVM: x86: Secure AVIC: Indicate APIC is enabled by guest SW _always_
Neeraj Upadhyay (12):
KVM: x86: Convert guest_apic_protected bool to an enum type
x86/cpufeatures: Add Secure AVIC CPU Feature
KVM: SVM: Add support for Secure AVIC capability in KVM
KVM: SVM: Initialize apic protected state for SAVIC guests
KVM: SVM/SEV/X86: Secure AVIC: Add support to inject interrupts
KVM: SVM/SEV/X86: Secure AVIC: Add hypervisor side IPI Delivery Support
KVM: SVM/SEV: Do not intercept exceptions for Secure AVIC guest
KVM: SVM/SEV: Add SVM_VMGEXIT_SECURE_AVIC GHCB protocol event handling
KVM: x86: Secure AVIC: Add IOAPIC EOI support for level interrupts
KVM: x86/ioapic: Disable RTC_GSI EOI tracking for protected APIC
X86: SVM: Check injected vectors before waiting for timer expiry
KVM: SVM/SEV: Allow creating VMs with Secure AVIC enabled
Sean Christopherson (2):
KVM: TDX: Add support for find pending IRQ in a protected local APIC
KVM: x86: Assume timer IRQ was injected if APIC state is protected
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/include/asm/msr-index.h | 2 +
arch/x86/include/asm/svm.h | 9 +-
arch/x86/include/uapi/asm/svm.h | 3 +
arch/x86/kvm/ioapic.c | 8 +-
arch/x86/kvm/irq.c | 6 +
arch/x86/kvm/lapic.c | 23 +-
arch/x86/kvm/lapic.h | 16 ++
arch/x86/kvm/svm/sev.c | 371 +++++++++++++++++++++++++++++
arch/x86/kvm/svm/svm.c | 79 ++++--
arch/x86/kvm/svm/svm.h | 17 +-
arch/x86/kvm/x86.c | 12 +-
14 files changed, 518 insertions(+), 31 deletions(-)
base-commit: f7bafceba76e9ab475b413578c1757ee18c3e44b
--
2.34.1