Message-ID: <20250928190624.3735830-1-skhawaja@google.com>
Date: Sun, 28 Sep 2025 19:06:08 +0000
From: Samiullah Khawaja <skhawaja@...gle.com>
To: David Woodhouse <dwmw2@...radead.org>, Lu Baolu <baolu.lu@...ux.intel.com>,
Joerg Roedel <joro@...tes.org>, Will Deacon <will@...nel.org>,
Pasha Tatashin <pasha.tatashin@...een.com>, Jason Gunthorpe <jgg@...pe.ca>, iommu@...ts.linux.dev
Cc: Samiullah Khawaja <skhawaja@...gle.com>, Robin Murphy <robin.murphy@....com>,
Pratyush Yadav <pratyush@...nel.org>, Kevin Tian <kevin.tian@...el.com>, linux-kernel@...r.kernel.org,
Saeed Mahameed <saeedm@...dia.com>, Adithya Jayachandran <ajayachandra@...dia.com>,
Parav Pandit <parav@...dia.com>, Leon Romanovsky <leonro@...dia.com>, William Tu <witu@...dia.com>,
Vipin Sharma <vipinsh@...gle.com>, dmatlack@...gle.com, zhuyifei@...gle.com,
Chris Li <chrisl@...nel.org>, praan@...gle.com
Subject: [RFC PATCH 00/15] iommu: Add live update state preservation
Hi,
This RFC patch series introduces a mechanism for IOMMU state
preservation across live update, using the Intel VT-d driver as the
initial example implementation and demonstration platform.
Please take a look at the following LWN article to learn about KHO
(Kexec HandOver) and the Live Update Orchestrator:
https://lwn.net/Articles/1033364/
This work is based on the LUOv3 patch series listed below. Please refer
to the LUOv3 series for details of the various live update states, the
file descriptor and subsystem preservation callbacks, and the memory
preservation mechanisms.
https://lore.kernel.org/all/20250807014442.3829950-1-pasha.tatashin@soleen.com/
The kernel tree with all dependencies is uploaded to the following
GitHub location:
https://github.com/googleprodkernel/linux-liveupdate/tree/iommu/rfc-v1
Overall Goals:
The goal of this effort is to preserve the iommufd-managed IOMMU domains
of devices that are marked for preservation. This allows the DMA mappings
and IOMMU context of a device assigned to a VM to be maintained across
a live update.
This will ultimately be achieved by preserving the IOMMU page tables, the
IOMMU root table and the relevant context entries across the live update.
Current Implementation, Scope and Limitations:
This RFC provides foundational mechanisms and demonstrates the
end-to-end workflow. It only implements the preservation of the minimum
IOMMU state, which includes the root table and context tables.
Specifically, it includes:
- Registration of the Intel VT-d IOMMU driver with the Live Update
Orchestrator.
- Registration of iommufd as a file handler with the Live Update
Orchestrator.
- A subsystem-wide rw_semaphore to protect live update state and
operations.
- An iommu_domain_preserve() API for preserving IOMMU domains.
Currently it only marks a domain as preserved.
- Implementation for preserving and restoring the Intel IOMMU root and
context tables.
- A selftest to validate the end-to-end preservation and restoration of
an iommufd file descriptor.
This version does not yet preserve the DMA mappings (page tables)
themselves. This means that ongoing DMA from a device will not continue
to work across the live update. This is a known limitation that will be
addressed in future work.
It is important to note that the preservation of the device state itself
is outside the scope of this series.
The series also does not yet include a versioning scheme for the
persisted state; this will be added later.
Target Architectural Overview:
The target architecture for IOMMU state preservation across a live
update involves coordination between the Live Update Orchestrator,
iommufd, and the IOMMU drivers.
The core design uses the Live Update Orchestrator's file descriptor
preservation mechanism to preserve iommufd file descriptors. During
preservation, the LUO prepare callback for an iommufd walks through the
IOMMU domains it manages to identify the ones associated with devices
marked for preservation. Once identified, Generic Page Table support
will be used to preserve the page tables of these domains. The domains
are then marked as preserved.
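A minimal sketch of the prepare-time walk described above, using hypothetical, heavily simplified structures. A domain is selected only if at least one attached device is marked for preservation. The page_tables_saved flag stands in for the Generic Page Table preservation step, which is not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified models of objects tracked by iommufd. */
struct dev {
	bool marked_for_preservation;
};

struct domain {
	struct dev *devs[4];		/* devices attached to this domain */
	size_t ndevs;
	bool page_tables_saved;		/* stand-in for page table preservation */
	bool preserved;
};

static bool domain_has_preserved_dev(const struct domain *d)
{
	for (size_t i = 0; i < d->ndevs; i++)
		if (d->devs[i]->marked_for_preservation)
			return true;
	return false;
}

/*
 * Walk all domains; only those backing a device marked for preservation
 * get their page tables saved and are marked preserved.
 * Returns the number of preserved domains.
 */
static size_t prepare_walk(struct domain *domains, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		struct domain *d = &domains[i];

		if (!domain_has_preserved_dev(d))
			continue;
		d->page_tables_saved = true;
		d->preserved = true;
		count++;
	}
	return count;
}
```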
The Live Update Orchestrator's subsystem mechanism will be used to
preserve the IOMMU context entries and the associated root table.
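To illustrate which driver-owned structures the subsystem side has to keep alive, the following sketch models the legacy-mode VT-d layout: a 256-entry root table indexed by PCI bus, each entry pointing to a 256-entry context table indexed by devfn. Hardware descriptors are reduced to flags. The selection policy shown here, keeping only the context tables referenced by preserved devices, is an assumption for illustration; the series may preserve the tables differently, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified legacy-mode VT-d translation structures. Real root and
 * context entries are 128-bit descriptors; here they are flags/pointers.
 */
#define NR_BUS		256
#define NR_DEVFN	256

struct context_table {
	bool present[NR_DEVFN];
};

struct root_table {
	struct context_table *ctx[NR_BUS];
};

/* A preserved device identified by its PCI bus and devfn. */
struct bdf {
	uint8_t bus;
	uint8_t devfn;
};

/*
 * Collect each context table that holds a present entry for at least one
 * preserved device, without duplicates. The root table page itself is
 * always kept. Returns the number of tables written to out[].
 */
static size_t tables_to_preserve(const struct root_table *rt,
				 const struct bdf *devs, size_t ndevs,
				 const struct context_table **out)
{
	bool seen[NR_BUS] = { false };
	size_t n = 0;

	for (size_t i = 0; i < ndevs; i++) {
		const struct context_table *ct = rt->ctx[devs[i].bus];

		if (!ct || !ct->present[devs[i].devfn] || seen[devs[i].bus])
			continue;
		seen[devs[i].bus] = true;
		out[n++] = ct;
	}
	return n;
}
```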
It is important to note that the preservation of the device state is
outside the scope of this patch series. This series focuses solely on
the IOMMU subsystem's role in supporting live update for such preserved
devices.
Critical Design Considerations:
After a live update, we can restore the IOMMU domain using one of two
approaches:
1. Reuse the preserved page tables:
During boot, the next kernel can prepare the new domain by reusing the
existing preserved page tables and reattach the devices to it. The
restored domain can be retrieved and reclaimed when the iommufd file
descriptor is restored.
2. Hotswap a new domain on finish:
During boot, the next kernel can set up domains for all the preserved
devices without updating their context entries, so these devices keep
using the old preserved page tables. The userspace VMM can then restore
the iommufd, create an IOAS/HWPT, attach the devices to it and set up the
DMA mappings. Once the Live Update Orchestrator moves to the finish
state, the context entries of the preserved devices can be updated to
point to the new IOMMU domains and page tables built by the new kernel.
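The Hotswap sequence can be modeled as a context entry with an active page table (what the hardware translates through) and a pending one (what the new kernel built). This is a sketch under that assumption: struct pgtable, struct ctx_entry and the hotswap_* helpers are hypothetical names, and a real driver must also invalidate the IOMMU caches before the old page tables are released.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page table handle; 'gen' tells old and new tables apart. */
struct pgtable {
	int gen;
};

/* A device's context entry: what the IOMMU uses to translate its DMA. */
struct ctx_entry {
	struct pgtable *active;		/* installed in hardware */
	struct pgtable *pending;	/* built by the new kernel, not yet live */
};

/* Boot of the next kernel: keep the preserved tables live. */
static void hotswap_boot(struct ctx_entry *ce, struct pgtable *preserved)
{
	ce->active = preserved;
	ce->pending = NULL;
}

/* The VMM attaches a freshly built domain: record it, leave hardware alone. */
static void hotswap_attach(struct ctx_entry *ce, struct pgtable *fresh)
{
	ce->pending = fresh;
}

/*
 * FINISH: switch the context entry to the new tables and return the old
 * ones so the caller can release them (after cache invalidation).
 */
static struct pgtable *hotswap_finish(struct ctx_entry *ce)
{
	struct pgtable *old = ce->active;

	if (!ce->pending)
		return NULL;	/* nothing to swap to; keep the old tables */
	ce->active = ce->pending;
	ce->pending = NULL;
	return old;
}
```

Between boot and FINISH the device keeps translating through the preserved tables, which is what lets DMA continue while userspace rebuilds its mappings.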
I am inclined towards the "Hotswap" approach, as it involves restoring
the minimum state from the previous kernel and lets user space
regenerate the mappings. This provides a clean way of discarding the old
kernel state and using the new kernel data structures. I will share more
details on the specifics of this approach in future versions of this
series.
High-Level Sequence Flow:
The following diagrams illustrate the high-level interactions during the
preservation phase. The diagrams also contain parts that are not
implemented in this series.
Prepare:
Before the live update, the PREPARE event of the Live Update Orchestrator
invokes the callbacks of the registered file and subsystem handlers.
Userspace (VMM)  |   LUO   |     iommufd     |   IOMMU Core    | Driver
-----------------|---------|-----------------|-----------------|--------
                 |         |                 |                 |
Preserve iommufd |         |                 |                 |
----------------->         |                 |                 |
                 | register|                 |                 |
<-----------------         |                 |                 |
                 |         |                 |                 |
                 |         |                 |                 |
PREPARE          |         |                 |                 |
----------------->         |                 |                 |
                 |         |                 |                 |
                 | Call FS |                 |                 |
                 | handle  |                 |                 |
                 |--------->                 |                 |
                 |         | Preserve Domain |                 |
                 |         |----------------->                 |
                 |         |                 | Preserve using  |
                 |         |                 | Generic-Page    |
                 |         |                 | Tables          |
                 |         |                 |----------------->
                 |         |                 |                 | Preserve
                 |         |                 |                 | Domain
                 |         |                 <------------------
                 |         <------------------                 |
                 |         | Return phys     |                 |
                 | save    | Address of      |                 |
                 <---------- state           |                 |
                 |         |                 |                 |
                 |         |                 |                 |
                 | subsys  |                 |                 |
                 | handle  |                 |                 |
                 |--------------------------------------------->
                 |         |                 |                 | Save iommu
                 |         |                 |                 | state
                 |         |                 |                 |
                 |         |                 |                 | Return phys
                 |         |                 |                 | Address of
                 |         |                 |                 | state
                 |         <------------------------------------
                 | save    |                 |                 |
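The ordering in the PREPARE flow above (file handlers such as iommufd first, then subsystem handlers such as the IOMMU driver, with LUO saving the physical address each one returns) can be sketched as follows. The handler structure, the unwind-on-failure policy and the example handlers are assumptions for illustration, not the LUOv3 interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical handler interface: prepare() serializes state and reports
 * the physical address of that state; cancel() undoes a prior prepare().
 */
struct handler {
	const char *name;
	int (*prepare)(uint64_t *phys);
	void (*cancel)(void);
};

/*
 * Run handlers in order (file handlers first, then subsystems) and save
 * each returned address in saved[]. If one fails, the handlers already
 * prepared are cancelled in reverse order.
 */
static int luo_prepare(const struct handler *h, size_t n, uint64_t *saved)
{
	for (size_t i = 0; i < n; i++) {
		int ret = h[i].prepare(&saved[i]);

		if (ret) {
			while (i--)
				h[i].cancel();
			return ret;
		}
	}
	return 0;
}

/* Example handlers used to exercise luo_prepare(). */
static int cancelled;
static int prep_ok(uint64_t *phys) { *phys = 0x1000; return 0; }
static int prep_fail(uint64_t *phys) { (void)phys; return -1; }
static void cancel_count(void) { cancelled++; }
```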
Restore:
After a live update, the preserved state is restored during boot and/or
when userspace retrieves the preserved FDs.
Userspace (VMM)  |   LUO   |     iommufd     |   IOMMU Core    | Driver
-----------------|---------|-----------------|-----------------|--------
                 |         |                 |                 | Init
                 |         |                 |                 |
                 |         |                 |                 | get phys
                 |         |                 |                 | address
                 |         <------------------------------------
                 | Return  |                 |                 |
                 | addr    |                 |                 |
                 |         ------------------------------------>
                 |         |                 |                 | Restore root
                 |         |                 |                 | table
                 |         |                 |                 |
Retrieve iommufd |         |                 |                 |
-----------------> Call FS |                 |                 |
                 | handle  |                 |                 |
                 |--------->                 |                 |
                 |         | Restore         |                 |
                 <----------                 |                 |
                 |         |                 |                 |
Attach IOAS      |         |                 |                 |
---------------------------->                 |                 |
                 |         | Attach          |                 |
                 |         ------------------>                 |
                 |         |                 | attach          |
                 |         |                 ------------------> Attach domain
                 |         |                 |                 | w/o context
                 |         |                 |                 | update
                 |         |                 <------------------
<---------------------------                 |                 |
                 |         |                 |                 |
                 |         |                 |                 |
FINISH           |         |                 |                 |
----------------->         |                 |                 |
                 |FS handle|                 |                 |
                 ---------->                 |                 |
                 |         | Hotswap context |                 |
                 |         ------------------>                 |
                 |         |                 | Update Context  |
                 |         |                 |----------------->
                 |         |                 |                 | Update
                 |         |                 |                 | Context
                 |         | Release old     <------------------
                 |         | page tables     |                 |
                 |         <------------------                 |
                 |         |                 |                 |
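The first step of the restore flow, where the driver gets the address saved before kexec and restores its root table, might look like the following sketch. The sanity checks (non-zero and 4 KiB aligned, since the VT-d root table address is page aligned) and the fallback to a fresh table are assumptions. init_root_table() and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Where the root table in use after init came from (hypothetical). */
enum root_source { ROOT_FRESH, ROOT_RESTORED };

/*
 * If the previous kernel handed over a plausible root table address,
 * reuse it so preserved devices keep their context entries; otherwise
 * (cold boot, nothing preserved, or a bad address) use a fresh table.
 */
static enum root_source init_root_table(bool have_handover,
					uint64_t handover_phys,
					uint64_t fresh_phys,
					uint64_t *root_phys)
{
	if (have_handover && handover_phys != 0 &&
	    (handover_phys % PAGE_SIZE) == 0) {
		*root_phys = handover_phys;
		return ROOT_RESTORED;
	}
	*root_phys = fresh_phys;
	return ROOT_FRESH;
}
```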
Tested:
This series was tested using QEMU with virtual IOMMU (VT-d) support. The
workflow was validated using a guest with a virtio-net device bound to
the vfio-pci driver.
The new iommufd_liveupdate selftest was used to verify the end-to-end
preservation logic:
1. The selftest is run for the first time. It opens the VFIO device,
attaches it to an iommufd instance, and then uses the
LIVEUPDATE_IOCTL_FD_PRESERVE ioctl to mark the iommufd file descriptor
for preservation.
2. The test then triggers the LIVEUPDATE_PREPARE event, which in turn
triggers the preservation of the iommufd instance and the IOMMU
state.
3. The guest is rebooted using kexec.
4. After reboot, the selftest is run a second time. It detects the
LIVEUPDATE_STATE_UPDATED state and restores the iommufd file
descriptor via the LIVEUPDATE_IOCTL_FD_RESTORE ioctl.
Future Work:
This RFC is the foundation for a more complete solution. The planned
next steps are:
- Implement the chosen page table preservation and restoration strategy
(Hotswap or Reuse).
- Keep the IOMMU translation enabled during shutdown.
- Add support for preserving PASID tables for devices that use them.
- Implement a versioning scheme for serialized data to ensure
compatibility across kernel versions.
- Extend support to other IOMMU architectures (e.g., AMD-Vi, Arm SMMUv3).
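The versioning scheme is not defined yet. One common shape for it is a small header in front of each serialized blob, as in this sketch. The magic value, field layout, names and the rule of rejecting versions newer than the running kernel understands are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header prepended to every serialized IOMMU state blob. */
#define IOMMU_LU_MAGIC		0x494f4d55u
#define IOMMU_LU_VERSION	1u

struct iommu_lu_hdr {
	uint32_t magic;
	uint32_t version;
	uint64_t payload_len;
};

static void lu_hdr_init(struct iommu_lu_hdr *h, uint64_t len)
{
	h->magic = IOMMU_LU_MAGIC;
	h->version = IOMMU_LU_VERSION;
	h->payload_len = len;
}

/*
 * The next kernel refuses state it cannot interpret: a short buffer or a
 * wrong magic means "not ours", and a newer version means the previous
 * kernel used a layout this kernel does not understand.
 */
static int lu_hdr_check(const void *buf, size_t buflen, uint64_t *len)
{
	struct iommu_lu_hdr h;

	if (buflen < sizeof(h))
		return -1;
	memcpy(&h, buf, sizeof(h));
	if (h.magic != IOMMU_LU_MAGIC || h.version > IOMMU_LU_VERSION)
		return -1;
	*len = h.payload_len;
	return 0;
}
```

Failing closed here means a mismatched kernel pair falls back to a normal, non-preserving boot path instead of misreading IOMMU state.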
I am looking forward to feedback on this initial approach and the target
architecture.
Samiullah Khawaja (12):
iommu/vt-d: Register with Live Update Orchestrator
iommu: Add rw_semaphore to serialize live update state
iommu/vt-d: Prevent hotplugs when live update state is not normal
iommu: Add preserve iommu_domain op
iommu: Introduce API to preserve iommu domain
iommu/vt-d: Add stub intel iommu domain preserve op
iommu/vt-d: Add implementation of live update prepare callback
iommu/vt-d: Implement live update preserve_iommu_context
iommu/vt-d: Add live update freeze callback
iommu/vt-d: Restore iommu root_table and context on live update
iommu/vt-d: sanitize restored root table and iommu contexts
iommufd/selftest: Add test to verify iommufd preservation
YiFei Zhu (3):
iommufd: Add basic skeleton based on liveupdate_file_handle
iommufd-luo: Implement basic prepare/cancel/finish/retrieve using
folios
iommufd: Persist iommu domains for live update
MAINTAINERS | 2 +
drivers/iommu/intel/Makefile | 1 +
drivers/iommu/intel/dmar.c | 9 +
drivers/iommu/intel/iommu.c | 15 +-
drivers/iommu/intel/iommu.h | 9 +
drivers/iommu/intel/liveupdate.c | 401 ++++++++++++++++++
drivers/iommu/iommu.c | 24 ++
drivers/iommu/iommufd/Makefile | 1 +
drivers/iommu/iommufd/iommufd_private.h | 27 ++
drivers/iommu/iommufd/liveupdate.c | 236 +++++++++++
drivers/iommu/iommufd/main.c | 16 +-
include/linux/iommu.h | 22 +
tools/testing/selftests/iommu/Makefile | 1 +
.../selftests/iommu/iommufd_liveupdate.c | 196 +++++++++
14 files changed, 956 insertions(+), 4 deletions(-)
create mode 100644 drivers/iommu/intel/liveupdate.c
create mode 100644 drivers/iommu/iommufd/liveupdate.c
create mode 100644 tools/testing/selftests/iommu/iommufd_liveupdate.c
base-commit: 454219033bd8093293af8fbd4de47142530bdedc
--
2.51.0.536.g15c5d4f767-goog