Message-ID: <BN9PR11MB52769BCEED1DC36DBCA75AF98C4F2@BN9PR11MB5276.namprd11.prod.outlook.com>
Date: Fri, 25 Oct 2024 08:34:05 +0000
From: "Tian, Kevin" <kevin.tian@...el.com>
To: Nicolin Chen <nicolinc@...dia.com>, "jgg@...dia.com" <jgg@...dia.com>,
"will@...nel.org" <will@...nel.org>
CC: "joro@...tes.org" <joro@...tes.org>, "suravee.suthikulpanit@....com"
<suravee.suthikulpanit@....com>, "robin.murphy@....com"
<robin.murphy@....com>, "dwmw2@...radead.org" <dwmw2@...radead.org>,
"baolu.lu@...ux.intel.com" <baolu.lu@...ux.intel.com>, "shuah@...nel.org"
<shuah@...nel.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "iommu@...ts.linux.dev"
<iommu@...ts.linux.dev>, "linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>, "linux-kselftest@...r.kernel.org"
<linux-kselftest@...r.kernel.org>, "eric.auger@...hat.com"
<eric.auger@...hat.com>, "jean-philippe@...aro.org"
<jean-philippe@...aro.org>, "mdf@...nel.org" <mdf@...nel.org>,
"mshavit@...gle.com" <mshavit@...gle.com>,
"shameerali.kolothum.thodi@...wei.com"
<shameerali.kolothum.thodi@...wei.com>, "smostafa@...gle.com"
<smostafa@...gle.com>, "Liu, Yi L" <yi.l.liu@...el.com>, "aik@....com"
<aik@....com>, "zhangfei.gao@...aro.org" <zhangfei.gao@...aro.org>,
"patches@...ts.linux.dev" <patches@...ts.linux.dev>
Subject: RE: [PATCH v4 00/11] iommufd: Add vIOMMU infrastructure (Part-1)
> From: Nicolin Chen <nicolinc@...dia.com>
> Sent: Tuesday, October 22, 2024 8:19 AM
>
> This series introduces a new vIOMMU infrastructure and related ioctls.
>
> IOMMUFD has been using the HWPT infrastructure for all cases, including
> nested IO page table support. Yet there are limitations for an HWPT-based
> structure to support some advanced HW-accelerated features, such as CMDQV
> on NVIDIA Grace and HW-accelerated vIOMMU on AMD. Even in a multi-IOMMU
> environment, it is not straightforward for nested HWPTs to share the same
> parent HWPT (stage-2 IO pagetable) with the HWPT infrastructure alone: a
> parent HWPT typically holds one stage-2 IO pagetable and tags it with only
> one ID in the cache entries. When sharing one large stage-2 IO pagetable
> across physical IOMMU instances, that one ID may not always be available
> across all the IOMMU instances. In other words, it's ideal for SW to have
> a different container for the stage-2 IO pagetable so it can hold another
> ID that's available.
Just holding multiple IDs doesn't require a different container. This is
just a side effect once vIOMMU is required for the other reasons mentioned.
If we have to add more words here, I'd prefer adding a bit more about
CMDQV, which is more compelling. Not a big deal though. 😊
>
> For this "different container", add vIOMMU, an additional layer to hold
> extra virtualization information:
>
>  _______________________________________________________________________
> |                      iommufd (with vIOMMU)                            |
> |                                                                       |
> |                             [5]                                       |
> |                        _____________                                  |
> |                       |             |                                 |
> |      |----------------|    vIOMMU   |                                 |
> |      |                |             |                                 |
> |      |                |             |                                 |
> |      |      [1]       |             |          [4]             [2]    |
> |      |     ______     |             |     _____________     ________  |
> |      |    |      |    |     [3]     |    |             |   |        | |
> |      |    | IOAS |<---|(HWPT_PAGING)|<---| HWPT_NESTED |<--| DEVICE | |
> |      |    |______|    |_____________|    |_____________|   |________| |
> |      |        |              |                  |               |     |
> |______|________|______________|__________________|_______________|_____|
>        |        |              |                  |               |
>  ______v_____   |        ______v_____       ______v_____       ___v__
> |   struct   |  |  PFN  |  (paging)  |     |  (nested)  |     |struct|
> |iommu_device|  |------>|iommu_domain|<----|iommu_domain|<----|device|
> |____________|   storage|____________|     |____________|     |______|
>
nit - [1] ... [5] can be removed.
> The vIOMMU object should be seen as a slice of a physical IOMMU instance
> that is passed to or shared with a VM. That can be some HW/SW resources:
> - Security namespace for guest-owned IDs, e.g. guest-controlled cache tags
> - Access to a sharable nesting parent pagetable across physical IOMMUs
> - Virtualization of various platform IDs, e.g. RIDs and others
> - Delivery of paravirtualized invalidation
> - Direct assigned invalidation queues
> - Direct assigned interrupts
> - Non-affiliated event reporting
Sorry, I have no idea what a 'non-affiliated event' is. Can you elaborate?
>
> On a multi-IOMMU system, the vIOMMU object must be instanced to the number
> of the physical IOMMUs that are passed to (via devices) a guest VM, while
"to the number of the physical IOMMUs that have a slice passed to ..."
> being able to hold the shareable parent HWPT. Each vIOMMU then just needs
> to allocate its own individual ID to tag its own cache:
>                     ----------------------------
> ----------------    |         |  paging_hwpt0  |
> | hwpt_nested0 |--->| viommu0 ------------------
> ----------------    |         |      IDx       |
>                     ----------------------------
>                     ----------------------------
> ----------------    |         |  paging_hwpt0  |
> | hwpt_nested1 |--->| viommu1 ------------------
> ----------------    |         |      IDy       |
>                     ----------------------------
>
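To make the sharing model in the two boxes above concrete, here is a minimal
sketch in plain C. It is only an illustration under stated assumptions:
struct s2_parent, struct phys_iommu, struct viommu, MAX_IDS and both helper
functions are hypothetical names, not the iommufd or arm-smmu-v3 definitions.
It shows each vIOMMU pointing at the same stage-2 parent while taking its
cache-tag ID from its own physical IOMMU instance.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_IDS 8	/* hypothetical size of each instance's ID space */

/* Hypothetical stand-in for the shared stage-2 IO pagetable (paging_hwpt0). */
struct s2_parent {
	uint64_t pgd;	/* placeholder for the stage-2 root pointer */
};

/* Hypothetical physical IOMMU instance with its own cache-tag ID space. */
struct phys_iommu {
	bool id_in_use[MAX_IDS];
};

/* One vIOMMU per physical IOMMU: same parent, instance-local ID. */
struct viommu {
	struct phys_iommu *iommu;
	struct s2_parent *parent;
	int id;
};

/* Pick the lowest free ID on this instance; return -1 if none are left. */
static int phys_iommu_alloc_id(struct phys_iommu *iommu)
{
	for (int i = 0; i < MAX_IDS; i++) {
		if (!iommu->id_in_use[i]) {
			iommu->id_in_use[i] = true;
			return i;
		}
	}
	return -1;
}

/*
 * Bind a vIOMMU to one physical instance and to the shared parent.
 * Returns 0 on success, -1 if the instance has no free ID.
 */
static int viommu_init(struct viommu *v, struct phys_iommu *iommu,
		       struct s2_parent *parent)
{
	int id;

	assert(v && iommu && parent);
	id = phys_iommu_alloc_id(iommu);
	if (id < 0)
		return -1;
	v->iommu = iommu;
	v->parent = parent;
	v->id = id;
	return 0;
}
```

With two physical instances where ID 0 is already taken on the first, two
vIOMMUs initialized against the same parent end up with the same parent
pointer but different IDs, which is the IDx/IDy case in the picture.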
> As an initial part-1, add the IOMMUFD_CMD_VIOMMU_ALLOC ioctl for allocation
> only, and implement it in the arm-smmu-v3 driver as a real-world use case.
>
> More vIOMMU-based structs and ioctls will be introduced in the follow-up
> series to support vDEVICE, vIRQ (vEVENT) and vQUEUE objects. Note that we
> repurposed the vIOMMU object from an earlier RFC; just for reference:
> https://lore.kernel.org/all/cover.1712978212.git.nicolinc@nvidia.com/
>
> This series is on Github:
> https://github.com/nicolinc/iommufd/commits/iommufd_viommu_p1-v4
> (pairing QEMU branch for testing will be provided with the part2 series)
>
> Changelog
> v4
> * Added "Reviewed-by" from Jason
> * Dropped IOMMU_VIOMMU_TYPE_DEFAULT support
> * Dropped iommufd_object_alloc_elm renamings
> * Renamed iommufd's viommu_api.c to driver.c
> * Reworked iommufd_viommu_alloc helper
> * Added a separate iommufd_hwpt_nested_alloc_for_viommu function for
> hwpt_nested allocations on a vIOMMU, and added comparison between
> viommu->iommu_dev->ops and dev_iommu_ops(idev->dev)
> * Replaced s2_parent with vsmmu in arm_smmu_nested_domain
> * Replaced domain_alloc_user in iommu_ops with domain_alloc_nested in
> viommu_ops
> * Replaced wait_queue_head_t with a completion, to delay the unplug of
> mock_iommu_dev
> * Corrected documentation graph that was missing struct iommu_device
> * Added an iommufd_verify_unfinalized_object helper to verify
> driver-allocated vIOMMU/vDEVICE objects
> * Added missing test cases for TEST_LENGTH and fail_nth
> v3
> https://lore.kernel.org/all/cover.1728491453.git.nicolinc@nvidia.com/
> * Rebased on top of Jason's nesting v3 series
> https://lore.kernel.org/all/0-v3-e2e16cd7467f+2a6a1-smmuv3_nesting_jgg@...dia.com/
> * Split the series into smaller parts
> * Added Jason's Reviewed-by
> * Added back viommu->iommu_dev
> * Added support for driver-allocated vIOMMU vs. core-allocated
> * Dropped arm_smmu_cache_invalidate_user
> * Added an iommufd_test_wait_for_users() in selftest
> * Reworked test code to make viommu an individual FIXTURE
> * Added missing TEST_LENGTH case for the new ioctl command
> v2
> https://lore.kernel.org/all/cover.1724776335.git.nicolinc@nvidia.com/
> * Limited vdev_id to one per idev
> * Added a rw_sem to protect the vdev_id list
> * Reworked driver-level APIs with proper locking
> * Added a new viommu_api file for IOMMUFD_DRIVER config
> * Dropped useless iommu_dev pointer from the viommu structure
> * Added missing index numbers to new types in the uAPI header
> * Dropped IOMMU_VIOMMU_INVALIDATE uAPI; instead, reuse the HWPT one
> * Reworked mock_viommu_cache_invalidate() using the new iommu helper
> * Reordered details of set/unset_vdev_id handlers for proper locking
> v1
> https://lore.kernel.org/all/cover.1723061377.git.nicolinc@nvidia.com/
>
> Thanks!
> Nicolin
>
> Nicolin Chen (11):
> iommufd: Move struct iommufd_object to public iommufd header
> iommufd: Introduce IOMMUFD_OBJ_VIOMMU and its related struct
> iommufd: Add iommufd_verify_unfinalized_object
> iommufd/viommu: Add IOMMU_VIOMMU_ALLOC ioctl
> iommufd: Add domain_alloc_nested op to iommufd_viommu_ops
> iommufd: Allow pt_id to carry viommu_id for IOMMU_HWPT_ALLOC
> iommufd/selftest: Add refcount to mock_iommu_device
> iommufd/selftest: Add IOMMU_VIOMMU_TYPE_SELFTEST
> iommufd/selftest: Add IOMMU_VIOMMU_ALLOC test coverage
> Documentation: userspace-api: iommufd: Update vIOMMU
> iommu/arm-smmu-v3: Add IOMMU_VIOMMU_TYPE_ARM_SMMUV3 support
>
> drivers/iommu/iommufd/Makefile | 5 +-
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 26 +++---
> drivers/iommu/iommufd/iommufd_private.h | 36 ++------
> drivers/iommu/iommufd/iommufd_test.h | 2 +
> include/linux/iommu.h | 14 +++
> include/linux/iommufd.h | 89 +++++++++++++++++++
> include/uapi/linux/iommufd.h | 56 ++++++++++--
> tools/testing/selftests/iommu/iommufd_utils.h | 28 ++++++
> .../arm/arm-smmu-v3/arm-smmu-v3-iommufd.c | 79 ++++++++++------
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 9 +-
> drivers/iommu/iommufd/driver.c | 38 ++++++++
> drivers/iommu/iommufd/hw_pagetable.c | 69 +++++++++++++-
> drivers/iommu/iommufd/main.c | 58 ++++++------
> drivers/iommu/iommufd/selftest.c | 73 +++++++++++++--
> drivers/iommu/iommufd/viommu.c | 85 ++++++++++++++++++
> tools/testing/selftests/iommu/iommufd.c | 78 ++++++++++++++++
> .../selftests/iommu/iommufd_fail_nth.c | 11 +++
> Documentation/userspace-api/iommufd.rst | 69 +++++++++++++-
> 18 files changed, 701 insertions(+), 124 deletions(-)
> create mode 100644 drivers/iommu/iommufd/driver.c
> create mode 100644 drivers/iommu/iommufd/viommu.c
>
> --
> 2.43.0
>