Message-ID: <20250923050317.205482-1-Neeraj.Upadhyay@amd.com>
Date: Tue, 23 Sep 2025 10:33:00 +0530
From: Neeraj Upadhyay <Neeraj.Upadhyay@....com>
To: <kvm@...r.kernel.org>, <seanjc@...gle.com>, <pbonzini@...hat.com>
CC: <linux-kernel@...r.kernel.org>, <Thomas.Lendacky@....com>,
	<nikunj@....com>, <Santosh.Shukla@....com>, <Vasant.Hegde@....com>,
	<Suravee.Suthikulpanit@....com>, <bp@...en8.de>, <David.Kaplan@....com>,
	<huibo.wang@....com>, <naveen.rao@....com>, <tiala@...rosoft.com>
Subject: [RFC PATCH v2 00/17] AMD: Add Secure AVIC KVM Support

Introduction
------------

Secure AVIC is a new hardware feature in the AMD64 architecture that
allows SEV-SNP guests to prevent the hypervisor from generating
unexpected interrupts to a vCPU or otherwise violating architectural
assumptions around APIC behavior.

One of the significant differences from AVIC or emulated x2APIC is that
Secure AVIC uses a guest-owned and managed APIC backing page. It also
introduces additional fields in both the VMCB and the Secure AVIC backing
page to aid the guest in limiting which interrupt vectors can be injected
into the guest.

Guest APIC Backing Page
-----------------------
Each vCPU has a guest-allocated APIC backing page, which maintains APIC
state for that vCPU. The x2APIC MSRs are mapped at their corresponding
x2APIC MMIO offset within the guest APIC backing page. All x2APIC accesses
by the guest or Secure AVIC hardware operate on this backing page. The
backing page should be pinned, and the NPT entry for it should always be
mapped while the corresponding vCPU is running.
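
As a rough illustration of the MSR-to-offset mapping described above,
the sketch below relies on the architectural x2APIC convention that MSR
0x800 + n corresponds to MMIO offset n << 4. The function and macro
names are hypothetical and are not taken from this series. The sketch
also ignores the fact that the x2APIC ICR is a single 64-bit MSR.

```c
#include <assert.h>
#include <stdint.h>

#define X2APIC_MSR_BASE 0x800u	/* first MSR of the x2APIC range */
#define X2APIC_MSR_LAST 0x8ffu	/* last MSR of the x2APIC range */

/*
 * Hypothetical helper: translate an x2APIC MSR index into the byte
 * offset of the matching register in the guest APIC backing page.
 * Registers sit at a 16-byte stride, mirroring the xAPIC MMIO layout.
 */
static inline uint32_t savic_msr_to_offset(uint32_t msr)
{
	assert(msr >= X2APIC_MSR_BASE && msr <= X2APIC_MSR_LAST);
	return (msr - X2APIC_MSR_BASE) << 4;
}
```

For example, the TPR MSR (0x808) lands at offset 0x80 and the SELF_IPI
MSR (0x83f) at offset 0x3f0.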

MSR Accesses
------------
Secure AVIC only supports x2APIC MSR accesses. xAPIC MMIO offset based
accesses are not supported.

Some MSR writes, such as ICR writes (with shorthand equal to self),
SELF_IPI, EOI and TPR writes, are accelerated by Secure AVIC hardware.
Other MSR writes generate a #VC exception (VMEXIT_AVIC_NOACCEL or
VMEXIT_AVIC_INCOMPLETE_IPI). The #VC exception handler reads/writes to
the guest APIC backing page. As the guest APIC backing page is
accessible to the guest, the guest can optimize APIC register access by
directly reading/writing to the guest APIC backing page (instead of
taking the #VC exception route). APIC MSR reads are accelerated similar
to AVIC, as described in table "15-22. Guest vAPIC Register Access
Behavior" of the APM.

In addition to the architected MSRs, the following new fields are added
to the guest APIC backing page, which can be modified directly by the
guest:

a. ALLOWED_IRR

ALLOWED_IRR vector indicates the interrupt vectors which the guest
allows the hypervisor to send. The combination of host-controlled
REQUESTED_IRR vectors (part of VMCB) and guest-controlled ALLOWED_IRR
is used by hardware to update the IRR vectors of the Guest APIC
backing page.

#Offset        #bits        Description
204h           31:0         Guest allowed vectors 0-31
214h           31:0         Guest allowed vectors 32-63
...
274h           31:0         Guest allowed vectors 224-255

ALLOWED_IRR is meant to be used specifically for vectors that the
hypervisor emulates and is allowed to inject, such as IOAPIC/MSI
device interrupts.  Interrupt vectors used exclusively by the guest
itself (like IPI vectors) should not be allowed to be injected into
the guest for security reasons.
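
A minimal sketch of how a guest might address ALLOWED_IRR bits, using
only the offsets from the table above (204h base, 10h stride, 32
vectors per register). The helper names and the byte-array view of the
backing page are assumptions made for illustration. Real guest code
would use its own APIC accessors.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SAVIC_ALLOWED_IRR_BASE	0x204u	/* vectors 0-31, from the table */
#define SAVIC_REG_STRIDE	0x10u	/* distance between 32-bit regs */

/* Hypothetical helper: offset of the ALLOWED_IRR register for @vector. */
static inline uint32_t savic_allowed_irr_offset(unsigned int vector)
{
	assert(vector < 256);
	return SAVIC_ALLOWED_IRR_BASE + (vector / 32) * SAVIC_REG_STRIDE;
}

/* Mark @vector as injectable by the hypervisor (e.g. an IOAPIC/MSI vector). */
static inline void savic_allow_vector(uint8_t *page, unsigned int vector)
{
	uint32_t off = savic_allowed_irr_offset(vector);
	uint32_t val;

	memcpy(&val, page + off, sizeof(val));
	val |= UINT32_C(1) << (vector % 32);
	memcpy(page + off, &val, sizeof(val));
}

/* Return 1 if the guest has allowed @vector, 0 otherwise. */
static inline int savic_vector_allowed(const uint8_t *page, unsigned int vector)
{
	uint32_t val;

	memcpy(&val, page + savic_allowed_irr_offset(vector), sizeof(val));
	return (val >> (vector % 32)) & 1;
}
```

A guest would call something like savic_allow_vector() only for device
interrupt vectors and leave IPI vectors cleared.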

b. NMI Request
 
#Offset        #bits        Description
278h           0            Set by Guest to request Virtual NMI

Guest can set NMI_REQUEST to trigger APIC_ICR based NMIs.

APIC Registers
--------------

1. APIC ID

The APIC_ID value is set by KVM and, similar to x2APIC, it is equal to
vcpu_id for a vCPU.

2. APIC LVR

APIC Version register is expected to be read from KVM's APIC state using
MSR_PROT RDMSR VMGEXIT and updated in the guest APIC backing page.

3. APIC TPR

TPR writes are accelerated and not communicated to KVM. So, the
hypervisor does not have information about TPR value for a vCPU.

4. APIC PPR

Current state of PPR is not visible to KVM.

5. APIC SPIV

Spurious Interrupt Vector register value is communicated by the guest to
KVM.

6. APIC IRR and APIC ISR

IRR and ISR states are visible only to the guest. So, KVM cannot use these
registers to determine guest interrupts which are pending completion.

7. APIC TMR

Trigger Mode Register state is owned by the guest and not visible to KVM.
However, for IOAPIC external interrupts, KVM's software vAPIC trigger
mode is set from the guest-controlled redirection table. So, the APIC_TMR
values in the software vAPIC state can be used to distinguish between
edge-triggered and level-triggered IOAPIC interrupts.
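
The TMR lookup can be pictured as a plain bit test. The sketch below
assumes the software vAPIC state is a register page using the
architectural layout, with the TMR bank starting at offset 0x180 and
one 32-bit register every 0x10 bytes. The function name is hypothetical
and is not the helper used in KVM.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define APIC_TMR_BASE	0x180u	/* architectural TMR offset */
#define APIC_REG_STRIDE	0x10u	/* distance between 32-bit regs */

/*
 * Hypothetical helper: return 1 if @vector is marked level-triggered
 * in the software vAPIC register page @regs, 0 if edge-triggered.
 */
static inline int vapic_vector_is_level(const uint8_t *regs, unsigned int vector)
{
	uint32_t val;

	assert(vector < 256);
	memcpy(&val, regs + APIC_TMR_BASE + (vector / 32) * APIC_REG_STRIDE,
	       sizeof(val));
	return (val >> (vector % 32)) & 1;
}
```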

8. Timer registers - TMICT, TMCCT, TDCR

Timer registers are accessed using MSR_PROT VMGEXIT calls and not from the
guest APIC backing page.

9. LVT* registers

LVT register state is accessed from KVM's vAPIC state for the vCPU.

Idle HLT Intercept
-------------------

As KVM does not have access to the APIC IRR state for a Secure AVIC guest,
the idle HLT intercept feature should always be enabled for a Secure AVIC
guest. Otherwise, any pending interrupts in vAPIC IRR during HLT VMEXIT
would not be serviced and the vCPU could get stuck in HLT until the next
wakeup event (which could arrive after a non-deterministic amount of
time). For idle HLT intercept to work, the vAPIC TPR value should not
block the pending interrupts.

LAPIC Timer Support
-------------------
LAPIC timer is emulated by KVM. So, the APIC_LVTT, APIC_TMICT, APIC_TDCR
and APIC_TMCCT registers are not read from or written to the guest APIC
backing page; they are communicated to KVM using MSR_PROT VMGEXIT.

IPI Support
-----------
Only SELF_IPI is accelerated by Secure AVIC hardware. Other IPI
destination shorthands result in VMEXIT_AVIC_INCOMPLETE_IPI #VC exception.
The expected guest handling for VMEXIT_AVIC_INCOMPLETE_IPI is:

- For interrupts, update APIC_IRR in target vCPUs' guest APIC backing
  page.

- For NMIs, update NMI_REQUEST in target vCPUs' guest backing page.

- ICR based SMI, INIT, SIPI requests are not supported.

- After updating the target vCPU's guest APIC backing page, source vCPU
  does an MSR_PROT VMGEXIT.

- KVM either wakes up the non-running target vCPU or sends an AVIC doorbell.
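
The guest-side steps above can be sketched roughly as follows. This is
only a model: the backing page is a plain byte array, the MSR_PROT
VMGEXIT is replaced by a stub that counts calls, and all names are
hypothetical. It assumes the architectural APIC_IRR base of 0x200 and
the NMI_REQUEST offset of 278h given earlier. Real guest code would
need atomic bit operations, since the target vCPU and hardware can
access the page concurrently.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define APIC_IRR_BASE		0x200u	/* architectural IRR offset */
#define APIC_REG_STRIDE		0x10u	/* distance between 32-bit regs */
#define SAVIC_NMI_REQ_OFF	0x278u	/* NMI_REQUEST, bit 0 */

enum savic_ipi_kind { SAVIC_IPI_FIXED, SAVIC_IPI_NMI, SAVIC_IPI_INIT };

/* Stand-in for the MSR_PROT VMGEXIT that notifies KVM (assumption). */
static unsigned int savic_notify_count;
static void savic_notify_host_stub(void) { savic_notify_count++; }

/* OR @bits into the 32-bit register at @off (non-atomic in this model). */
static void savic_or32(uint8_t *page, uint32_t off, uint32_t bits)
{
	uint32_t val;

	memcpy(&val, page + off, sizeof(val));
	val |= bits;
	memcpy(page + off, &val, sizeof(val));
}

/* Post an IPI to the vCPU owning @target_page; -1 if unsupported. */
static int savic_send_ipi(uint8_t *target_page, enum savic_ipi_kind kind,
			  unsigned int vector)
{
	assert(target_page);

	switch (kind) {
	case SAVIC_IPI_FIXED:
		assert(vector < 256);
		savic_or32(target_page,
			   APIC_IRR_BASE + (vector / 32) * APIC_REG_STRIDE,
			   UINT32_C(1) << (vector % 32));
		break;
	case SAVIC_IPI_NMI:
		savic_or32(target_page, SAVIC_NMI_REQ_OFF, 1u);
		break;
	default:
		return -1;	/* SMI, INIT, SIPI are not supported */
	}

	/* Let KVM wake the target or ring the AVIC doorbell. */
	savic_notify_host_stub();
	return 0;
}
```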

Exception Injection
-------------------

Secure AVIC does not support event injection for guests with Secure AVIC
enabled in SEV_FEATURES. So, KVM cannot inject exceptions to Secure AVIC
guests. Hardware takes care of reinjecting an interrupted exception (for
example due to NPF) on next VMRUN. #VC exception is not reinjected. KVM
clears all exception intercepts for the Secure AVIC guest.

Interrupt Injection
-------------------

IOAPIC and MSI based device interrupts can be injected by KVM. The
interrupt flow for this is:

- IOAPIC/MSI interrupts are updated in KVM's APIC_IRR state via
  kvm_irq_delivery_to_apic().
- in ->inject_irq() callback, all interrupts which are set in KVM's
  APIC_IRR are copied to RequestedIRR VMCB field and UpdateIRR bit is
  set.
- VMRUN moves the current value of RequestedIRR to APIC_IRR in the
  guest APIC backing page and clears RequestedIRR, UpdateIRR.

Given that hardware clearing of RequestedIRR and UpdateIRR can race with
KVM's writes to these fields, above interrupt injection flow ensures
that all RequestedIRR and UpdateIRR writes are done from the same CPU
where the vCPU is run.

As interrupt delivery to a vCPU is managed by hardware, interrupt window
is not applicable for Secure AVIC guests and interrupts are always
allowed to be injected.
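
The injection flow in this section can be modelled in a few lines. The
sketch below is a user-space model only: the struct layouts and
function names are hypothetical, and masking RequestedIRR with
ALLOWED_IRR as a bitwise AND at VMRUN is an assumption about how
hardware combines the two fields. What happens to KVM's own APIC_IRR
copy after the transfer is outside the scope of this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_IRR_REGS 8	/* 256 vectors / 32 bits per register */

struct savic_vmcb_model {		/* host-controlled (hypothetical) */
	uint32_t requested_irr[NR_IRR_REGS];
	bool update_irr;
};

struct savic_guest_model {		/* guest APIC backing page fields */
	uint32_t irr[NR_IRR_REGS];
	uint32_t allowed_irr[NR_IRR_REGS];
};

/* KVM side: copy pending vectors from KVM's APIC_IRR into RequestedIRR. */
static void model_inject_irq(struct savic_vmcb_model *vmcb,
			     const uint32_t kvm_irr[NR_IRR_REGS])
{
	assert(vmcb && kvm_irr);

	for (int i = 0; i < NR_IRR_REGS; i++) {
		if (!kvm_irr[i])
			continue;
		vmcb->requested_irr[i] |= kvm_irr[i];
		vmcb->update_irr = true;
	}
}

/* Hardware side at VMRUN: merge RequestedIRR into guest IRR, then clear. */
static void model_vmrun(struct savic_vmcb_model *vmcb,
			struct savic_guest_model *guest)
{
	assert(vmcb && guest);

	if (!vmcb->update_irr)
		return;

	for (int i = 0; i < NR_IRR_REGS; i++) {
		/* Assumption: only guest-allowed vectors reach the IRR. */
		guest->irr[i] |= vmcb->requested_irr[i] & guest->allowed_irr[i];
		vmcb->requested_irr[i] = 0;
	}
	vmcb->update_irr = false;
}
```

Because model_vmrun() clears the fields that model_inject_irq() writes,
both must run on the same CPU, which is the race the text above refers
to.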

PIC interrupts
--------------

Legacy PIC interrupts cannot be injected as they require event_inj or
VINTR injection support. Neither of these is available for a Secure
AVIC guest.

PIT
---

PIT reinject mode is not supported for edge-triggered interrupts, as it
requires IRQ ack notification on EOI. As EOI is accelerated by Secure
AVIC hardware for edge-triggered interrupts, IRQ ack notification is
not called for them.

NMI Injection
-------------

NMI injection requires ALLOWED_NMI to be set in Secure AVIC control MSR
by the guest. Only VNMI injection is allowed.

Design Caveats, Open Points and Improvement Opportunities
---------------------------------------------------------

- Current code uses KVM's vAPIC APIC_IRR for storing the interrupts which
  need to be injected to the guest. It then reuses KVM's existing
  interrupt injection flow (with some modifications to the injectable
  interrupt determination).
  
  While functional, this approach conflates the state of KVM's
  software-emulated vAPIC with the state of the hardware-accelerated Secure
  AVIC. This can make the code harder to reason about. A cleaner approach
  would be to introduce a dedicated struct for holding SAVIC-specific
  state, completely decoupling it from the software lapic state and
  avoiding this overload of semantics.
  
  In addition, preserving the existing notion of a boolean
  guest_apic_protected instead of having to subcategorize it based on the
  interrupt injection flow would be desired. Given that KVM cannot use the
  TDX's PI (asynchronous interrupt injection) mechanism for SAVIC and must
  instead adopt the pre-VMRUN injection model of writing to the
  guest-visible backing page, this would require creating a separate flow
  for moving the KVM's pending interrupts for the vCPU to the RequestedIRR
  field.

- EOI handling for level-triggered interrupts uses KVM's unused vAPIC
  APIC_ISR regs for tracking pending level interrupts. KVM uses its
  APIC_TMR state to determine level-triggered interrupts. As KVM's
  APIC_TMR is updated from IOAPIC redirection tables, the TMR information
  should be accurate and match the guest vAPIC state.
  
  This can be cleaned up to not use KVM's vAPIC APIC_ISR state and 
  maintain the state within sev code.

- RTC_GSI requires pending EOI information to detect coalesced interrupts.
  As RTC_GSI is edge triggered, Secure AVIC does not forward EOI write to
  KVM for this interrupt. In addition, APIC_IRR and APIC_ISR states are
  not visible to KVM and are part of the guest APIC backing page. Approach
  taken in this series is to disable checking of coalesced RTC_GSI
  interrupts for Secure AVIC, which could impact userspace code which
  relies on detecting RTC_GSI interrupt coalescing.
  
  An alternative approach would be to not support in-kernel IOAPIC
  emulation for Secure AVIC guests, similar to TDX.

- As exceptions cannot be injected by KVM, a more detailed examination
  of which exception intercepts need to be allowed for Secure AVIC
  guests is required.

- As KVM does not have access to the guest's APIC_IRR and APIC_ISR
  states, kvm_apic_pending_eoi() does not return correct information.

- External interrupts (PIC) are not supported. This breaks KVM's PIC
  emulation.

- PIT reinject mode is not supported.

Changes since v1:

v1: https://lore.kernel.org/lkml/20250228085115.105648-1-Neeraj.Upadhyay@amd.com/

- Rebased and resolved conflicts with the latest kvm next snapshot.
- Replaced enum with a separate lapic struct member to differentiate
  protected APIC's interrupt injection mechanism.
- Added a patch to disable KVM_FEATURE_PV_EOI and KVM_FEATURE_PV_SEND_IPI
  for protected APIC guests.
- Dropped SPIV hack patch, which always returns true from
  kvm_apic_sw_enabled():   20250228085115.105648-16-Neeraj.Upadhyay@....com
  Instead of this, rely on guest propagating APIC_SPIV value to KVM.
- Updated the commit logs and cover letter to provide more
  description.

This series is based on top of commit a6ad54137af9 ("Merge branch
'guest-memfd-mmap' into HEAD") and is based on

  git.kernel.org/pub/scm/virt/kvm/kvm.git next

Git tree is available at:

  https://github.com/AMDESE/linux-kvm/tree/savic-host-latest

In addition, the patch below from v1 is required until the SAVIC guest
is updated to propagate APIC_SPIV to the hypervisor.

  20250228085115.105648-16-Neeraj.Upadhyay@....com

QEMU tree is at:
  https://github.com/AMDESE/qemu/tree/secure-avic
  
QEMU commandline for testing Secure AVIC enabled guest:

qemu-system-x86_64 <...> -object sev-snp-guest,id=sev0,policy=0xb0000,cbitpos=51,reduced-phys-bits=1,allowed-sev-features=true,secure-avic=true

Guest Support is present in tip/tip master branch at the commit snapshot
835794d1ae4c ("Merge branch into tip/master: 'x86/tdx'").

Kishon Vijay Abraham I (2):
  KVM: SVM: Do not inject exception for Secure AVIC
  KVM: SVM: Set VGIF in VMSA area for Secure AVIC guests

Neeraj Upadhyay (15):
  KVM: x86/lapic: Differentiate protected APIC interrupt mechanisms
  x86/cpufeatures: Add Secure AVIC CPU feature
  KVM: SVM: Add support for Secure AVIC capability in KVM
  KVM: SVM: Set guest APIC protection flags for Secure AVIC
  KVM: SVM: Do not intercept SECURE_AVIC_CONTROL MSR for SAVIC guests
  KVM: SVM: Implement interrupt injection for Secure AVIC
  KVM: SVM: Add IPI Delivery Support for Secure AVIC
  KVM: SVM: Do not intercept exceptions for Secure AVIC guests
  KVM: SVM: Enable NMI support for Secure AVIC guests
  KVM: SVM: Add VMGEXIT handler for Secure AVIC backing page
  KVM: SVM: Add IOAPIC EOI support for Secure AVIC guests
  KVM: x86/ioapic: Disable RTC EOI tracking for protected APIC guests
  KVM: SVM: Check injected timers for Secure AVIC guests
  KVM: x86/cpuid: Disable paravirt APIC features for protected APIC
  KVM: SVM: Advertise Secure AVIC support for SNP guests

 arch/x86/include/asm/cpufeatures.h |   1 +
 arch/x86/include/asm/msr-index.h   |   1 +
 arch/x86/include/asm/svm.h         |   9 +-
 arch/x86/include/uapi/asm/svm.h    |   3 +
 arch/x86/kvm/cpuid.c               |   4 +
 arch/x86/kvm/ioapic.c              |   8 +-
 arch/x86/kvm/lapic.c               |  17 +-
 arch/x86/kvm/lapic.h               |   5 +-
 arch/x86/kvm/svm/sev.c             | 367 ++++++++++++++++++++++++++++-
 arch/x86/kvm/svm/svm.c             |  80 +++++--
 arch/x86/kvm/svm/svm.h             |  14 ++
 arch/x86/kvm/x86.c                 |  15 +-
 12 files changed, 493 insertions(+), 31 deletions(-)


base-commit: a6ad54137af92535cfe32e19e5f3bc1bb7dbd383
-- 
2.34.1

