Message-ID: <20250928190624.3735830-1-skhawaja@google.com>
Date: Sun, 28 Sep 2025 19:06:08 +0000
From: Samiullah Khawaja <skhawaja@...gle.com>
To: David Woodhouse <dwmw2@...radead.org>, Lu Baolu <baolu.lu@...ux.intel.com>, 
	Joerg Roedel <joro@...tes.org>, Will Deacon <will@...nel.org>, 
	Pasha Tatashin <pasha.tatashin@...een.com>, Jason Gunthorpe <jgg@...pe.ca>, iommu@...ts.linux.dev
Cc: Samiullah Khawaja <skhawaja@...gle.com>, Robin Murphy <robin.murphy@....com>, 
	Pratyush Yadav <pratyush@...nel.org>, Kevin Tian <kevin.tian@...el.com>, linux-kernel@...r.kernel.org, 
	Saeed Mahameed <saeedm@...dia.com>, Adithya Jayachandran <ajayachandra@...dia.com>, 
	Parav Pandit <parav@...dia.com>, Leon Romanovsky <leonro@...dia.com>, William Tu <witu@...dia.com>, 
	Vipin Sharma <vipinsh@...gle.com>, dmatlack@...gle.com, zhuyifei@...gle.com, 
	Chris Li <chrisl@...nel.org>, praan@...gle.com
Subject: [RFC PATCH 00/15] iommu: Add live update state preservation

Hi,

This RFC patch series introduces a mechanism for IOMMU state
preservation across live update, using the Intel VT-d driver as the
initial example implementation and demonstration platform.

Please take a look at the following LWN article to learn about KHO and
Live Update Orchestrator:

https://lwn.net/Articles/1033364/

This work is based on the LUOv3 patch series listed below. Please find
the details of various live update states, file descriptor and subsystem
preservation callbacks, and memory preservation mechanisms in the LUOv3
series.

https://lore.kernel.org/all/20250807014442.3829950-1-pasha.tatashin@soleen.com/

The kernel tree with all dependencies is available at the following
GitHub location:

https://github.com/googleprodkernel/linux-liveupdate/tree/iommu/rfc-v1

Overall Goals:

The goal of this effort is to preserve the iommufd-managed IOMMU domains
of devices that are marked for preservation. This allows the DMA
mappings and IOMMU context of a device assigned to a VM to be maintained
across a live update.

This will ultimately be achieved by preserving the IOMMU page tables,
the IOMMU root table, and the relevant context entries across live
update.

Current Implementation, Scope and Limitations:

This RFC provides foundational mechanisms and demonstrates the
end-to-end workflow. It only implements the preservation of the minimum
IOMMU state, which includes the root table and context tables.

Specifically, it includes:

 - Registration of the Intel VT-d IOMMU driver with the Live Update
   Orchestrator.
 - Registration of iommufd as a file handler with Live Update
   Orchestrator.
 - A subsystem-wide rw_semaphore to protect live update state and
   operations.
 - An API, iommu_domain_preserve, to preserve IOMMU domains. Currently it
   only marks them as preserved.
 - Implementation for preserving and restoring the Intel IOMMU root and
   context tables.
 - A selftest to validate the end-to-end preservation and restoration of
   an iommufd file descriptor.

This version does not yet preserve the DMA mappings (page tables)
themselves. This means that ongoing DMA from a device will not continue
to work across the live update. This is a known limitation that will be
addressed in future work.

It is important to note that the preservation of the device state itself
is outside the scope of this series.

The series also does not yet include a versioning scheme for the
persisted state; this will be added later.

Target Architectural Overview:

The target architecture for IOMMU state preservation across a live
update involves coordination between the Live Update Orchestrator,
iommufd, and the IOMMU drivers.

The core design uses the Live Update Orchestrator's file descriptor
preservation mechanism to preserve iommufd file descriptors. During
preservation, the LUO prepare callback for an iommufd walks through the
IOMMU domains it manages to identify the ones associated with devices
marked for preservation. Once identified, Generic Page Table support
will be used to preserve the page tables of these domains. The domains
are then marked as preserved.
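
To make the prepare walk described above concrete, here is a minimal
userspace sketch of the idea. It is a model under stated assumptions,
not code from this series: the sketch_* types and functions are
hypothetical stand-ins, and page table preservation is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the kernel's real types. */
struct sketch_device {
	bool marked_for_preservation;
};

struct sketch_domain {
	const struct sketch_device *devs; /* devices attached to this domain */
	size_t ndevs;
	bool preserved;
};

/* Return true if any device attached to the domain is marked. */
bool sketch_domain_has_marked_device(const struct sketch_domain *dom)
{
	for (size_t i = 0; i < dom->ndevs; i++) {
		if (dom->devs[i].marked_for_preservation)
			return true;
	}
	return false;
}

/*
 * Model of the prepare walk: visit every domain owned by one iommufd,
 * mark the ones serving a preserved device, and return how many were
 * marked. Preserving the page tables themselves is not modelled here.
 */
size_t sketch_prepare_walk(struct sketch_domain *doms, size_t ndoms)
{
	size_t marked = 0;

	assert(doms != NULL || ndoms == 0);
	for (size_t i = 0; i < ndoms; i++) {
		doms[i].preserved = sketch_domain_has_marked_device(&doms[i]);
		if (doms[i].preserved)
			marked++;
	}
	return marked;
}
```

In this model a domain with one marked device out of two is marked as
preserved, while a domain with no marked devices is left alone.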

The Live Update Orchestrator's subsystem mechanism will be used to
preserve the IOMMU context entries and the associated root table.

It is important to note that the preservation of the device state is
outside the scope of this patch series. This series focuses solely on
the IOMMU subsystem's role in supporting live update for such preserved
devices.

Critical Design Considerations:

After a live update, we can restore the IOMMU domain using one of two
approaches:

1. Reuse the preserved page tables:

During boot the next kernel can prepare the new domain reusing the
existing preserved page tables and reattach the devices to it. The
restored domain can be retrieved and reclaimed when the iommufd file
descriptor is restored.

2. Hotswap a new domain on finish:

During boot the next kernel can set up domains for all the preserved
devices without updating context entries, so these devices can keep on
using the old preserved page tables. The userspace VMM can restore the
iommufd, create IOAS/HWPT, attach devices to it and set up DMA mappings.
Once Live Update Orchestrator moves to the finish state, the context
entries of the preserved devices can be updated and replaced with the
new IOMMU domains and page tables that are built in the new kernel.

I am inclined towards the "Hotswap" approach, as it involves restoring
the minimum state from the previous kernel and lets user space
regenerate the mappings. This provides a clean way of discarding the old
kernel state and using the new kernel data structures. I will share more
details on the specifics of this approach in future versions of this
series.
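
As an illustration only, the "Hotswap" idea can be modelled in a few
lines of userspace C. The details of the approach are still to be
worked out in this series, so everything below is an assumption: the
sketch_* names are hypothetical, a context entry is reduced to a single
page table root, and no hardware programming is shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the live update state seen after kexec. */
enum sketch_lu_state {
	SKETCH_LU_UPDATED, /* new kernel booted, finish not yet reached */
	SKETCH_LU_NORMAL,
};

/* Hypothetical, heavily simplified context entry. */
struct sketch_ctx_entry {
	uint64_t active_root;  /* page table root the hardware walks */
	uint64_t pending_root; /* root built by the new kernel, 0 if none */
	bool preserved;        /* entry was carried over from the old kernel */
};

/*
 * Attach a new domain. For a preserved entry before finish, only record
 * the new root so the device keeps using the old preserved page tables.
 */
void sketch_attach(struct sketch_ctx_entry *ce, uint64_t new_root,
		   enum sketch_lu_state state)
{
	assert(new_root != 0);
	if (ce->preserved && state == SKETCH_LU_UPDATED) {
		ce->pending_root = new_root;
		return;
	}
	ce->active_root = new_root;
}

/*
 * Finish: swap in the pending root and return the old root so that the
 * caller can release the old page tables. Returns 0 if nothing changed.
 */
uint64_t sketch_finish(struct sketch_ctx_entry *ce)
{
	uint64_t old;

	if (!ce->preserved || ce->pending_root == 0)
		return 0;
	old = ce->active_root;
	ce->active_root = ce->pending_root;
	ce->pending_root = 0;
	ce->preserved = false;
	return old;
}
```

The point of the model is that the attach path never touches the active
root of a preserved entry; the only place where the old state is
discarded is the finish step.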

High-Level Sequence Flow:

The following diagrams illustrate the high-level interactions during the
preservation phase. The diagrams also contain parts that are not
implemented in this series.

Prepare:

Before live update, the PREPARE event of Live Update Orchestrator
invokes callbacks of the registered file and subsystem handlers.

 Userspace (VMM) |   LUO   |     iommufd     |   IOMMU Core    | Driver
-----------------|---------|-----------------|-----------------|--------
                 |         |                 |                 |
Preserve iommufd |         |                 |                 |
----------------->         |                 |                 |
                 | register|                 |                 |
<-----------------         |                 |                 |
                 |         |                 |                 |
                 |         |                 |                 |
  PREPARE        |         |                 |                 |
----------------->         |                 |                 |
                 |         |                 |                 |
                 | Call FS |                 |                 |
                 | handle  |                 |                 |
                 |--------->                 |                 |
                 |         | Preserve Domain |                 |
                 |         |----------------->                 |
                 |         |                 | Preserve using  |
                 |         |                 | Generic-Page    |
                 |         |                 |    Tables       |
                 |         |                 |----------------->
                 |         |                 |                 | Preserve
                 |         |                 |                 | Domain
                 |         |                 <------------------
                 |         <------------------                 |
                 |         | Return phys     |                 |
                 | save    | Address of      |                 |
                 <---------- state           |                 |
                 |         |                 |                 |
                 |         |                 |                 |
                 | subsys  |                 |                 |
                 | handle  |                 |                 |
                 |--------------------------------------------->
                 |         |                 |                 | Save iommu
                 |         |                 |                 | state
                 |         |                 |                 |
                 |         |                 |                 | Return phys
                 |         |                 |                 | Address of
                 |         |                 |                 | state
                 |         <------------------------------------
                 | save    |                 |                 |

Restore:

After a live update, the preserved state is restored during boot and/or
when userspace retrieves the preserved FDs.

 Userspace (VMM) |   LUO   |     iommufd     |   IOMMU Core    | Driver
-----------------|---------|-----------------|-----------------|--------
                 |         |                 |                 | Init
                 |         |                 |                 |
                 |         |                 |                 | get phys
                 |         |                 |                 | address
                 |         <------------------------------------
                 | Return  |                 |                 |
                 | addr    |                 |                 |
                 |         ------------------------------------>
                 |         |                 |                 | Restore root
                 |         |                 |                 | table
                 |         |                 |                 |
Retrieve iommufd |         |                 |                 |
-----------------> Call FS |                 |                 |
                 | handle  |                 |                 |
                 |--------->                 |                 |
                 |         | Restore         |                 |
                 <----------                 |                 |
                 |         |                 |                 |
Attach IOAS      |         |                 |                 |
--------------------------->                 |                 |
                 |         | Attach          |                 |
                 |         ------------------>                 |
                 |         |                 | attach          |
                 |         |                 ------------------> Attach domain
                 |         |                 |                 | w/o context
                 |         |                 |                 | update
                 |         |                 <------------------
                 <----------------------------                 |
                 |         |                 |                 |
                 |         |                 |                 |
FINISH           |         |                 |                 |
----------------->         |                 |                 |
                 |FS handle|                 |                 |
                 ---------->                 |                 |
                 |         | Hotswap context |                 |
                 |         ------------------>                 |
                 |         |                 | Update Context  |
                 |         |                 |----------------->
                 |         |                 |                 | Update
                 |         |                 |                 | Context
                 |         | Release old     <------------------
                 |         | page tables     |                 |
                 |         <------------------                 |
                 |         |                 |                 |

Tested:

This series was tested using QEMU with virtual IOMMU (VT-d) support. The
workflow was validated using a guest with virtio-net device bound to the
vfio-pci driver.

The new iommufd_liveupdate selftest was used to verify the end-to-end
preservation logic:

1. The selftest is run for the first time. It opens the VFIO device,
   attaches it to an iommufd instance, and then uses the
   LIVEUPDATE_IOCTL_FD_PRESERVE ioctl to mark the iommufd file descriptor
   for preservation.

2. The test then triggers the LIVEUPDATE_PREPARE event, which in turn
   triggers the preservation of the iommufd instance and the IOMMU
   state.

3. The guest is rebooted using kexec.

4. After reboot, the selftest is run a second time. It detects the
   LIVEUPDATE_STATE_UPDATED state and restores the iommufd file
   descriptor via the LIVEUPDATE_IOCTL_FD_RESTORE ioctl.
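
The four steps above can be summarized as a small state model. This is a
userspace sketch and not the selftest itself: it issues no ioctls, and
the sketch_flow_* names and states are hypothetical placeholders for the
LUO states and calls named in the steps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the states the selftest passes through. */
enum sketch_flow_state {
	SKETCH_FLOW_NORMAL,
	SKETCH_FLOW_PREPARED,
	SKETCH_FLOW_UPDATED,
};

struct sketch_flow {
	enum sketch_flow_state state;
	bool fd_marked; /* step 1: fd marked for preservation */
	bool fd_saved;  /* step 2: state saved, survives the kexec */
};

/* Step 1: mark the iommufd file descriptor for preservation. */
int sketch_flow_fd_preserve(struct sketch_flow *f)
{
	assert(f != NULL);
	if (f->state != SKETCH_FLOW_NORMAL)
		return -1;
	f->fd_marked = true;
	return 0;
}

/* Step 2: the prepare event saves the state of every marked fd. */
int sketch_flow_prepare(struct sketch_flow *f)
{
	assert(f != NULL);
	if (f->state != SKETCH_FLOW_NORMAL)
		return -1;
	f->fd_saved = f->fd_marked;
	f->state = SKETCH_FLOW_PREPARED;
	return 0;
}

/* Step 3: kexec drops everything except the saved state. */
int sketch_flow_kexec(struct sketch_flow *f)
{
	assert(f != NULL);
	if (f->state != SKETCH_FLOW_PREPARED)
		return -1;
	f->fd_marked = false;
	f->state = SKETCH_FLOW_UPDATED;
	return 0;
}

/* Step 4: restore works only after the update, and only once. */
int sketch_flow_fd_restore(struct sketch_flow *f)
{
	assert(f != NULL);
	if (f->state != SKETCH_FLOW_UPDATED || !f->fd_saved)
		return -1;
	f->fd_saved = false;
	return 0;
}
```

In this model a restore attempted before the kexec fails, and a second
restore after a successful one fails as well.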

Future Work:

This RFC is the foundation for a more complete solution. The planned
next steps are:

- Implement the chosen page table preservation and restoration strategy
  (Hotswap or Reuse).
- Keep the IOMMU translation enabled during shutdown.
- Add support for preserving PASID tables for devices that use them.
- Implement a versioning scheme for serialized data to ensure
  compatibility across kernel versions.
- Extend support to other IOMMU architectures (e.g., AMD-Vi, Arm SMMUv3).

I am looking forward to feedback on this initial approach and the target
architecture.

Samiullah Khawaja (12):
  iommu/vt-d: Register with Live Update Orchestrator
  iommu: Add rw_semaphore to serialize live update state
  iommu/vt-d: Prevent hotplugs when live update state is not normal
  iommu: Add preserve iommu_domain op
  iommu: Introduce API to preserve iommu domain
  iommu/vt-d: Add stub intel iommu domain preserve op
  iommu/vt-d: Add implementation of live update prepare callback
  iommu/vt-d: Implement live update preserve_iommu_context
  iommu/vt-d: Add live update freeze callback
  iommu/vt-d: Restore iommu root_table and context on live update
  iommu/vt-d: sanitize restored root table and iommu contexts
  iommufd/selftest: Add test to verify iommufd preservation

YiFei Zhu (3):
  iommufd: Add basic skeleton based on liveupdate_file_handle
  iommufd-luo: Implement basic prepare/cancel/finish/retrieve using
    folios
  iommufd: Persist iommu domains for live update

 MAINTAINERS                                   |   2 +
 drivers/iommu/intel/Makefile                  |   1 +
 drivers/iommu/intel/dmar.c                    |   9 +
 drivers/iommu/intel/iommu.c                   |  15 +-
 drivers/iommu/intel/iommu.h                   |   9 +
 drivers/iommu/intel/liveupdate.c              | 401 ++++++++++++++++++
 drivers/iommu/iommu.c                         |  24 ++
 drivers/iommu/iommufd/Makefile                |   1 +
 drivers/iommu/iommufd/iommufd_private.h       |  27 ++
 drivers/iommu/iommufd/liveupdate.c            | 236 +++++++++++
 drivers/iommu/iommufd/main.c                  |  16 +-
 include/linux/iommu.h                         |  22 +
 tools/testing/selftests/iommu/Makefile        |   1 +
 .../selftests/iommu/iommufd_liveupdate.c      | 196 +++++++++
 14 files changed, 956 insertions(+), 4 deletions(-)
 create mode 100644 drivers/iommu/intel/liveupdate.c
 create mode 100644 drivers/iommu/iommufd/liveupdate.c
 create mode 100644 tools/testing/selftests/iommu/iommufd_liveupdate.c


base-commit: 454219033bd8093293af8fbd4de47142530bdedc
-- 
2.51.0.536.g15c5d4f767-goog

