Message-ID: <20260127191643.GQ1134360@nvidia.com>
Date: Tue, 27 Jan 2026 15:16:43 -0400
From: Jason Gunthorpe <jgg@...dia.com>
To: Shivaprasad G Bhat <sbhat@...ux.ibm.com>
Cc: linux-kernel@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org,
kvm@...r.kernel.org, iommu@...ts.linux.dev, chleroy@...nel.org,
mpe@...erman.id.au, maddy@...ux.ibm.com, npiggin@...il.com,
alex@...zbot.org, joerg.roedel@....com, kevin.tian@...el.com,
gbatra@...ux.ibm.com, clg@...d.org, vaibhav@...ux.ibm.com,
brking@...ux.vnet.ibm.com, nnmlinux@...ux.ibm.com,
amachhiw@...ux.ibm.com, tpearson@...torengineering.com
Subject: Re: [RFC PATCH] powerpc: iommu: Initial IOMMUFD support for PPC64
On Tue, Jan 27, 2026 at 06:35:56PM +0000, Shivaprasad G Bhat wrote:
> The RFC attempts to implement the IOMMUFD support on PPC64 by
> adding new iommu_ops for paging domain. The existing platform
> domain continues to be the default domain for in-kernel use.
It would be nice to see the platform domain go away and ppc use the
normal dma-iommu.c stuff, but I don't think it is critical to making
it work with iommufd.
> On PPC64, IOVA ranges are based on the type of the DMA window
> and their properties. Currently, there is no way to expose the
> attributes of the non-default 64-bit DMA window, which the platform
> supports. The platform allows the operating system to select the
> starting offset (at 4GiB or 512PiB default offset), page size and
> window size for the non-default 64-bit DMA window. For example,
> with VFIO, this is handled via VFIO_IOMMU_SPAPR_TCE_GET_INFO
> and VFIO_IOMMU_SPAPR_TCE_CREATE|REMOVE ioctls. While I am exploring
> the ways to expose and configure these DMA window attributes as
> per user input, any suggestions in this regard will be very helpful.
You can pass in driver specific information during HWPT creation, so
any properties you need can be specified there.
Then you'd want to introduce a new domain op to get the apertures
instead of the single range hard coded into the domain struct. The new
op would be able to return a list. We can use this op to return
apertures for sign extension page tables too.
Update iommufd to calculate the reserved regions by evaluating the
whole list.
I think you'll find this pretty straightforward; I'd do it as a
follow-up patch to this one.
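As a rough illustration of "evaluating the whole list", here is a
userspace sketch, not kernel code. struct aperture and
reserved_from_apertures() are hypothetical names that exist nowhere in
the tree, and the sketch assumes the op hands back apertures sorted by
start address and non-overlapping. Every gap between apertures, plus
whatever lies above the last one, becomes a reserved range:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type: one usable IOVA range, both ends inclusive. */
struct aperture {
	uint64_t start;
	uint64_t last;
};

/*
 * Walk apertures (assumed sorted by start, non-overlapping) and emit the
 * holes between them as reserved ranges. Returns how many reserved
 * ranges exist; only the first max_out of them are stored in out[].
 */
static size_t reserved_from_apertures(const struct aperture *ap, size_t n,
				      struct aperture *out, size_t max_out)
{
	uint64_t next = 0;	/* lowest IOVA not yet accounted for */
	int wrapped = 0;	/* an aperture already reached UINT64_MAX */
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		assert(ap[i].start <= ap[i].last);
		assert(!wrapped && ap[i].start >= next);

		/* Hole below this aperture. */
		if (ap[i].start > next) {
			if (count < max_out)
				out[count] = (struct aperture){ next, ap[i].start - 1 };
			count++;
		}

		if (ap[i].last == UINT64_MAX)
			wrapped = 1;	/* avoid overflowing last + 1 */
		else
			next = ap[i].last + 1;
	}

	/* Everything above the final aperture is reserved too. */
	if (!wrapped) {
		if (count < max_out)
			out[count] = (struct aperture){ next, UINT64_MAX };
		count++;
	}
	return count;
}
```

The same walk covers the sign extension case: a low aperture ending at
2^47 - 1 and a high one starting at 0xFFFF800000000000 yield a single
reserved hole in the middle.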
> Currently existing vfio type1 specific vfio-compat driver even
> with this patch will not work for PPC64. I believe we need to have
> a separate "vfio-spapr-compat" driver to make it work.
Yes, vfio-compat doesn't support the special spapr ioctls.
I don't think you need a new driver, just implement whatever they do
with the existing interfaces, probably in its own .c file though.
However, I have no idea what is required to implement those ops, or if
it is even possible. It may be easier to just leave the old vfio
stuff around instead of trying to compat it. The purpose of compat was
to be able to build kernels without type1 at all. Compat isn't necessary
to start using iommufd in new apps with the new interfaces.
Given you are mainly looking at a VMM that already will have iommufd
support, it may not be worthwhile.
> @@ -1201,7 +1201,15 @@ spapr_tce_blocked_iommu_attach_dev(struct iommu_domain *platform_domain,
> * also sets the dma_api ops
> */
> table_group = iommu_group_get_iommudata(grp);
> +
> + if (old && old->type == IOMMU_DOMAIN_DMA) {
I'm trying to delete IOMMU_DOMAIN_DMA, so please don't use it in
drivers.
> static const struct iommu_ops spapr_tce_iommu_ops = {
> .default_domain = &spapr_tce_platform_domain,
> .blocked_domain = &spapr_tce_blocked_domain,
> @@ -1267,6 +1436,14 @@ static const struct iommu_ops spapr_tce_iommu_ops = {
> .probe_device = spapr_tce_iommu_probe_device,
> .release_device = spapr_tce_iommu_release_device,
> .device_group = spapr_tce_iommu_device_group,
> + .domain_alloc_paging = spapr_tce_domain_alloc_paging,
> + .default_domain_ops = &(const struct iommu_domain_ops) {
> + .attach_dev = spapr_tce_iommu_attach_device,
> + .map_pages = spapr_tce_iommu_map_pages,
> + .unmap_pages = spapr_tce_iommu_unmap_pages,
> + .iova_to_phys = spapr_tce_iommu_iova_to_phys,
> + .free = spapr_tce_domain_free,
> + }
Please don't use default_domain_ops in a driver that supports
multiple domain types as well as a platform domain; it becomes
confusing to guess which domain type those ops are linked to.
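To show the shape being asked for, here is a userspace mock rather than
driver code. All the mock_* types and the functions below are stand-ins
invented for this sketch, reduced to a single callback. The point is
only that each domain names its own ops table, so nothing has to fall
back to a shared default:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the kernel types, invented for this sketch. */
enum mock_domain_type { MOCK_PLATFORM, MOCK_BLOCKED, MOCK_PAGING };

struct mock_domain;

struct mock_domain_ops {
	int (*attach_dev)(struct mock_domain *domain);
};

struct mock_domain {
	enum mock_domain_type type;
	const struct mock_domain_ops *ops;
};

/* Records which callback ran last, so the dispatch is observable. */
static enum mock_domain_type last_attach;

static int platform_attach(struct mock_domain *d)
{
	(void)d;
	last_attach = MOCK_PLATFORM;
	return 0;
}

static int blocked_attach(struct mock_domain *d)
{
	(void)d;
	last_attach = MOCK_BLOCKED;
	return 0;
}

static int paging_attach(struct mock_domain *d)
{
	(void)d;
	last_attach = MOCK_PAGING;
	return 0;
}

/* One ops table per domain type; no shared default table. */
static const struct mock_domain_ops platform_ops = { .attach_dev = platform_attach };
static const struct mock_domain_ops blocked_ops = { .attach_dev = blocked_attach };
static const struct mock_domain_ops paging_ops = { .attach_dev = paging_attach };

/* The static domains point at their ops directly. */
static struct mock_domain platform_domain = { .type = MOCK_PLATFORM, .ops = &platform_ops };
static struct mock_domain blocked_domain = { .type = MOCK_BLOCKED, .ops = &blocked_ops };

/* Stand-in for the paging allocation path: it sets ops explicitly. */
static void paging_domain_init(struct mock_domain *d)
{
	assert(d != NULL);
	d->type = MOCK_PAGING;
	d->ops = &paging_ops;
}
```

With this layout, a reader can tell which callbacks belong to which
domain type from the ops table name alone.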
You should also implement the BLOCKING domain type to make VFIO work
better.
I wouldn't try to guess if this is right or not, but it looks pretty
reasonable as a start.
Jason