[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20210921180203.GY327412@nvidia.com>
Date: Tue, 21 Sep 2021 15:02:03 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Liu Yi L <yi.l.liu@...el.com>
Cc: alex.williamson@...hat.com, hch@....de, jasowang@...hat.com,
joro@...tes.org, jean-philippe@...aro.org, kevin.tian@...el.com,
parav@...lanox.com, lkml@...ux.net, pbonzini@...hat.com,
lushenming@...wei.com, eric.auger@...hat.com, corbet@....net,
ashok.raj@...el.com, yi.l.liu@...ux.intel.com,
jun.j.tian@...el.com, hao.wu@...el.com, dave.jiang@...el.com,
jacob.jun.pan@...ux.intel.com, kwankhede@...dia.com,
robin.murphy@....com, kvm@...r.kernel.org,
iommu@...ts.linux-foundation.org, dwmw2@...radead.org,
linux-kernel@...r.kernel.org, baolu.lu@...ux.intel.com,
david@...son.dropbear.id.au, nicolinc@...dia.com
Subject: Re: [RFC 14/20] iommu/iommufd: Add iommufd_device_[de]attach_ioasid()
On Sun, Sep 19, 2021 at 02:38:42PM +0800, Liu Yi L wrote:
> An I/O address space takes effect in the iommu only after it's attached
> by a device. This patch provides iommufd_device_[de/at]tach_ioasid()
> helpers for this purpose. One device can be only attached to one ioasid
> at this point, but one ioasid can be attached by multiple devices.
>
> The caller specifies the iommufd_device (returned at binding time) and
> the target ioasid when calling the helper function. Upon request, iommufd
> installs the specified I/O page table to the correct place in the IOMMU,
> according to the routing information (struct device* which represents
> RID) recorded in iommufd_device. Future variants could allow the caller
> to specify additional routing information (e.g. pasid/ssid) when multiple
> I/O address spaces are supported per device.
>
> Open:
> Per Jason's comment in below link, bus-specific wrappers are recommended.
> This RFC implements one wrapper for pci device. But it looks that struct
> pci_device is not used at all since iommufd_ device already carries all
> necessary info. So want to have another discussion on its necessity, e.g.
> whether making more sense to have bus-specific wrappers for binding, while
> leaving a common attaching helper per iommufd_device.
> https://lore.kernel.org/linux-iommu/20210528233649.GB3816344@nvidia.com/
>
> TODO:
> When multiple devices are attached to a same ioasid, the permitted iova
> ranges and supported pgsize bitmap on this ioasid should be a common
> subset of all attached devices. iommufd needs to track such info per
> ioasid and update it every time when a new device is attached to the
> ioasid. This has not been done in this version yet, due to the temporary
> hack adopted in patch 16-18. The hack reuses vfio type1 driver which
> already includes the necessary logic for iova ranges and pgsize bitmap.
> Once we get a clear direction for those patches, that logic will be moved
> to this patch.
>
> Signed-off-by: Liu Yi L <yi.l.liu@...el.com>
> drivers/iommu/iommufd/iommufd.c | 226 ++++++++++++++++++++++++++++++++
> include/linux/iommufd.h | 29 ++++
> 2 files changed, 255 insertions(+)
>
> diff --git a/drivers/iommu/iommufd/iommufd.c b/drivers/iommu/iommufd/iommufd.c
> index e45d76359e34..25373a0e037a 100644
> +++ b/drivers/iommu/iommufd/iommufd.c
> @@ -51,6 +51,19 @@ struct iommufd_ioas {
> bool enforce_snoop;
> struct iommufd_ctx *ictx;
> refcount_t refs;
> + struct mutex lock;
> + struct list_head device_list;
> + struct iommu_domain *domain;
This should just be another xarray indexed by the device id
> +/* Caller should hold ioas->lock */
> +static struct ioas_device_info *ioas_find_device(struct iommufd_ioas *ioas,
> + struct iommufd_device *idev)
> +{
> + struct ioas_device_info *ioas_dev;
> +
> + list_for_each_entry(ioas_dev, &ioas->device_list, next) {
> + if (ioas_dev->idev == idev)
> + return ioas_dev;
> + }
Which eliminates this search. xarray with tightly packed indexes is
generally more efficient than linked lists..
> +static int ioas_check_device_compatibility(struct iommufd_ioas *ioas,
> + struct device *dev)
> +{
> + bool snoop = false;
> + u32 addr_width;
> + int ret;
> +
> + /*
> + * currently we only support I/O page table with iommu enforce-snoop
> + * format. Attaching a device which doesn't support this format in its
> + * upstreaming iommu is rejected.
> + */
> + ret = iommu_device_get_info(dev, IOMMU_DEV_INFO_FORCE_SNOOP, &snoop);
> + if (ret || !snoop)
> + return -EINVAL;
> +
> + ret = iommu_device_get_info(dev, IOMMU_DEV_INFO_ADDR_WIDTH, &addr_width);
> + if (ret || addr_width < ioas->addr_width)
> + return -EINVAL;
> +
> + /* TODO: also need to check permitted iova ranges and pgsize bitmap */
> +
> + return 0;
> +}
This seems kind of weird..
I expect the iommufd to hold a SW copy of the IO page table and each
time a new domain is to be created it should push the SW copy into the
domain. If the domain cannot support it then the domain driver should
naturally fail a request.
When the user changes the IO page table the SW copy is updated then
all of the domains are updated too - again if any domain cannot
support the change then it fails and the change is rolled back.
It seems like this is a side effect of roughly hacking in the vfio
code?
> +
> +/**
> + * iommufd_device_attach_ioasid - attach device to an ioasid
> + * @idev: [in] Pointer to struct iommufd_device.
> + * @ioasid: [in] ioasid points to an I/O address space.
> + *
> + * Returns 0 for successful attach, otherwise returns error.
> + *
> + */
> +int iommufd_device_attach_ioasid(struct iommufd_device *idev, int ioasid)
Types for the ioas_id again..
> +{
> + struct iommufd_ioas *ioas;
> + struct ioas_device_info *ioas_dev;
> + struct iommu_domain *domain;
> + int ret;
> +
> + ioas = ioasid_get_ioas(idev->ictx, ioasid);
> + if (!ioas) {
> + pr_err_ratelimited("Trying to attach illegal or unkonwn IOASID %u\n", ioasid);
> + return -EINVAL;
No prints triggered by bad userspace
> + }
> +
> + mutex_lock(&ioas->lock);
> +
> + /* Check for duplicates */
> + if (ioas_find_device(ioas, idev)) {
> + ret = -EINVAL;
> + goto out_unlock;
> + }
just xa_cmpxchg NULL, XA_ZERO_ENTRY
> + /*
> + * Each ioas is backed by an iommu domain, which is allocated
> + * when the ioas is attached for the first time and then shared
> + * by following devices.
> + */
> + if (list_empty(&ioas->device_list)) {
Seems strange, what if the devices are forced to have different
domains? We don't want to model that in the SW layer..
> + /* Install the I/O page table to the iommu for this device */
> + ret = iommu_attach_device(domain, idev->dev);
> + if (ret)
> + goto out_domain;
This is where things start to get confusing when you talk about PASID
as the above call needs to be some PASID centric API.
> @@ -27,6 +28,16 @@ struct iommufd_device *
> iommufd_bind_device(int fd, struct device *dev, u64 dev_cookie);
> void iommufd_unbind_device(struct iommufd_device *idev);
>
> +int iommufd_device_attach_ioasid(struct iommufd_device *idev, int ioasid);
> +void iommufd_device_detach_ioasid(struct iommufd_device *idev, int ioasid);
> +
> +static inline int
> +__pci_iommufd_device_attach_ioasid(struct pci_dev *pdev,
> + struct iommufd_device *idev, int ioasid)
> +{
> + return iommufd_device_attach_ioasid(idev, ioasid);
> +}
If think sis taking in the iommfd_device then there isn't a logical
place to signal the PCIness
But, I think the API should at least have a kdoc that this is
capturing the entire device and specify that for PCI this means all
TLPs with the RID.
Jason
Powered by blists - more mailing lists