[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <cover.1746139811.git.nicolinc@nvidia.com>
Date: Thu, 1 May 2025 16:01:06 -0700
From: Nicolin Chen <nicolinc@...dia.com>
To: <jgg@...dia.com>, <kevin.tian@...el.com>, <corbet@....net>,
<will@...nel.org>
CC: <bagasdotme@...il.com>, <robin.murphy@....com>, <joro@...tes.org>,
<thierry.reding@...il.com>, <vdumpa@...dia.com>, <jonathanh@...dia.com>,
<shuah@...nel.org>, <jsnitsel@...hat.com>, <nathan@...nel.org>,
<peterz@...radead.org>, <yi.l.liu@...el.com>, <mshavit@...gle.com>,
<praan@...gle.com>, <zhangzekun11@...wei.com>, <iommu@...ts.linux.dev>,
<linux-doc@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<linux-arm-kernel@...ts.infradead.org>, <linux-tegra@...r.kernel.org>,
<linux-kselftest@...r.kernel.org>, <patches@...ts.linux.dev>,
<mochs@...dia.com>, <alok.a.tiwari@...cle.com>, <vasant.hegde@....com>
Subject: [PATCH v3 00/23] iommufd: Add vIOMMU infrastructure (Part-4 vQUEUE)
The vIOMMU object is designed to represent a slice of an IOMMU HW for its
virtualization features shared with or passed to user space (a VM mostly)
in a way of HW acceleration. This extended the HWPT-based design for more
advanced virtualization feature.
A vQUEUE introduced by this series as a part of the vIOMMU infrastructure
represents a HW accelerated queue/buffer for VM to use exclusively, e.g.
- NVIDIA's Virtual Command Queue
- AMD vIOMMU's Command Buffer, Event Log Buffer, and PPR Log Buffer
each of which is an IOMMU HW feature to directly access the virtual queue
in the guest address space, to avoid VM Exits to improve the performance.
As an initial use case, it adds support for guest-owned HW virtual queues
that VMM can allocate per request from a guest OS writing the VM register.
Introduce IOMMUFD_OBJ_VQUEUE and its allocator IOMMUFD_CMD_VQUEUE_ALLOC,
allowing VMM to forward the IOMMU-specific queue info, such as queue base
address, size, and etc.
Meanwhile, a guest-owned virtual queue needs the kernel (a virtual queue
driver) to control the queue by reading/writing its consumer and producer
indexes, which means the virtual queue HW allows the guest kernel to get
a direct R/W access to those registers. Introduce an mmap infrastructure
to the iommufd core so as to support pass through a piece of MMIO region
from the host physical address space to the guest physical address space.
The VMA info (vm_pgoff/size) used by an mmap must be pre-allocated during
the IOMMUFD_CMD_VQUEUE_ALLOC and returned to the user space as an output
driver-data carried via the IOMMUFD_CMD_VQUEUE_ALLOC. So, this requires a
driver-specific user data support in the vIOMMU allocation flow.
As a real-world use case, this series implements a vQUEUE support to the
tegra241-cmdqv driver for VCMDQs on NVIDIA Grace CPU. In another word, it
is also the Tegra CMDQV series Part-2 (user-space support), reworked from
Previous RFCv1:
https://lore.kernel.org/all/cover.1712978212.git.nicolinc@nvidia.com/
This enables the HW accelerated feature for NVIDIA Grace CPU. Compared to
the standard SMMUv3 operating in the nested translation mode trapping CMDQ
for TLBI and ATC_INV commands, this gives a huge performance improvement:
70% to 90% reductions of invalidation time were measured by various DMA
unmap tests running in a guest OS.
// Unmap latencies from "dma_map_benchmark -g @granule -t @threads",
// by toggling "/sys/kernel/debug/iommu/tegra241_cmdqv/bypass_vcmdq"
@granule | @threads | bypass_vcmdq=1 | bypass_vcmdq=0
4KB 1 35.7 us 5.3 us
16KB 1 41.8 us 6.8 us
64KB 1 68.9 us 9.9 us
128KB 1 109.0 us 12.6 us
256KB 1 187.1 us 18.0 us
4KB 2 96.9 us 6.8 us
16KB 2 97.8 us 7.5 us
64KB 2 151.5 us 10.7 us
128KB 2 257.8 us 12.7 us
256KB 2 443.0 us 17.9 us
This is on Github:
https://github.com/nicolinc/iommufd/commits/iommufd_vqueue-v3
Paring QEMU branch for testing:
https://github.com/nicolinc/qemu/commits/wip/for_iommufd_vqueue-v3
Changelog
v3
* Add Reviewed-by from Baolu, Pranjal, and Alok
* Revise kdocs, uAPI docs, and commit logs
* Rename "vCMDQ" back to "vQUEUE" for AMD cases
* [tegra] Add tegra241_vcmdq_hw_flush_timeout()
* [tegra] Rename vsmmu_alloc to alloc_vintf_user
* [tegra] Use writel for SID replacement registers
* [tegra] Move mmap removal call to vsmmu_destroy op
* [tegra] Fix revert in tegra241_vintf_alloc_lvcmdq_user()
* [iommufd] Replace "& ~PAGE_MASK" with PAGE_ALIGNED()
* [iommufd] Add an object-type "owner" to immap structure
* [iommufd] Drop the ictx input in the new for-driver APIs
* [iommufd] Add iommufd_vma_ops to keep track of mmap lifecycle
* [iommufd] Add viommu-based iommufd_viommu_alloc/destroy_mmap helpers
* [iommufd] Rename iommufd_ctx_alloc/free_mmap to
_iommufd_alloc/destroy_mmap
v2
https://lore.kernel.org/all/cover.1745646960.git.nicolinc@nvidia.com/
* Add Reviewed-by from Jason
* [smmu] Fix vsmmu initial value
* [smmu] Support impl for hw_info
* [tegra] Rename "slot" to "vsid"
* [tegra] Update kdocs and commit logs
* [tegra] Map/unmap LVCMDQ dynamically
* [tegra] Refcount the previous LVCMDQ
* [tegra] Return -EEXIST if LVCMDQ exists
* [tegra] Simplify VINTF cleanup routine
* [tegra] Use vmid and s2_domain in vsmmu
* [tegra] Rename "mmap_pgoff" to "immap_id"
* [tegra] Add more addr and length validation
* [iommufd] Add more narrative to mmap's kdoc
* [iommufd] Add iommufd_struct_depend/undepend()
* [iommufd] Rename vcmdq_free op to vcmdq_destroy
* [iommufd] Fix bug in iommu_copy_struct_to_user()
* [iommufd] Drop is_io from iommufd_ctx_alloc_mmap()
* [iommufd] Test the queue memory for its contiguity
* [iommufd] Return -ENXIO if address or length fails
* [iommufd] Do not change @min_last in mock_viommu_alloc()
* [iommufd] Generalize TEGRA241_VCMDQ data in core structure
* [iommufd] Add selftest coverage for IOMMUFD_CMD_VCMDQ_ALLOC
* [iommufd] Add iopt_pin_pages() to prevent queue memory from unmapping
v1
https://lore.kernel.org/all/cover.1744353300.git.nicolinc@nvidia.com/
Thanks
Nicolin
Nicolin Chen (23):
iommufd/viommu: Add driver-allocated vDEVICE support
iommu: Pass in a driver-level user data structure to viommu_alloc op
iommufd/viommu: Allow driver-specific user data for a vIOMMU object
iommu: Add iommu_copy_struct_to_user helper
iommufd/driver: Let iommufd_viommu_alloc helper save ictx to
viommu->ictx
iommufd/driver: Add iommufd_struct_destroy to revert
iommufd_viommu_alloc
iommufd/selftest: Support user_data in mock_viommu_alloc
iommufd/selftest: Add covearge for viommu data
iommufd: Abstract iopt_pin_pages and iopt_unpin_pages helpers
iommufd/viommu: Introduce IOMMUFD_OBJ_VQUEUE and its related struct
iommufd/viommu: Add IOMMUFD_CMD_VQUEUE_ALLOC ioctl
iommufd/driver: Add iommufd_vqueue_depend/undepend() helpers
iommufd/selftest: Add coverage for IOMMUFD_CMD_VQUEUE_ALLOC
iommufd: Add mmap interface
iommufd/selftest: Add coverage for the new mmap interface
Documentation: userspace-api: iommufd: Update vQUEUE
iommu/arm-smmu-v3-iommufd: Add vsmmu_alloc impl op
iommu/arm-smmu-v3-iommufd: Support implementation-defined hw_info
iommu/tegra241-cmdqv: Use request_threaded_irq
iommu/tegra241-cmdqv: Simplify deinit flow in
tegra241_cmdqv_remove_vintf()
iommu/tegra241-cmdqv: Do not statically map LVCMDQs
iommu/tegra241-cmdqv: Add user-space use support
iommu/tegra241-cmdqv: Add IOMMU_VEVENTQ_TYPE_TEGRA241_CMDQV support
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 25 +-
drivers/iommu/iommufd/io_pagetable.h | 8 +
drivers/iommu/iommufd/iommufd_private.h | 28 +-
drivers/iommu/iommufd/iommufd_test.h | 20 +
include/linux/iommu.h | 43 +-
include/linux/iommufd.h | 184 ++++++-
include/uapi/linux/iommufd.h | 117 ++++-
tools/testing/selftests/iommu/iommufd_utils.h | 52 +-
.../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 42 +-
.../iommu/arm/arm-smmu-v3/tegra241-cmdqv.c | 481 +++++++++++++++++-
drivers/iommu/iommufd/device.c | 117 +----
drivers/iommu/iommufd/driver.c | 88 ++++
drivers/iommu/iommufd/io_pagetable.c | 95 ++++
drivers/iommu/iommufd/main.c | 84 ++-
drivers/iommu/iommufd/selftest.c | 126 ++++-
drivers/iommu/iommufd/viommu.c | 116 ++++-
tools/testing/selftests/iommu/iommufd.c | 96 +++-
.../selftests/iommu/iommufd_fail_nth.c | 11 +-
Documentation/userspace-api/iommufd.rst | 15 +
19 files changed, 1555 insertions(+), 193 deletions(-)
--
2.43.0
Powered by blists - more mailing lists