Message-ID: <ZyE3uGyVx9ivJeHI@Asurada-Nvidia>
Date: Tue, 29 Oct 2024 12:30:00 -0700
From: Nicolin Chen <nicolinc@...dia.com>
To: Jason Gunthorpe <jgg@...dia.com>
CC: <kevin.tian@...el.com>, <will@...nel.org>, <joro@...tes.org>,
<suravee.suthikulpanit@....com>, <robin.murphy@....com>,
<dwmw2@...radead.org>, <baolu.lu@...ux.intel.com>, <shuah@...nel.org>,
<linux-kernel@...r.kernel.org>, <iommu@...ts.linux.dev>,
<linux-arm-kernel@...ts.infradead.org>, <linux-kselftest@...r.kernel.org>,
<eric.auger@...hat.com>, <jean-philippe@...aro.org>, <mdf@...nel.org>,
<mshavit@...gle.com>, <shameerali.kolothum.thodi@...wei.com>,
<smostafa@...gle.com>, <yi.l.liu@...el.com>, <aik@....com>,
<zhangfei.gao@...aro.org>, <patches@...ts.linux.dev>
Subject: Re: [PATCH v5 01/13] iommufd/viommu: Add IOMMUFD_OBJ_VDEVICE and
IOMMU_VDEVICE_ALLOC ioctl
On Tue, Oct 29, 2024 at 03:48:01PM -0300, Jason Gunthorpe wrote:
> On Tue, Oct 29, 2024 at 10:29:56AM -0700, Nicolin Chen wrote:
> > On Tue, Oct 29, 2024 at 12:58:24PM -0300, Jason Gunthorpe wrote:
> > > On Fri, Oct 25, 2024 at 04:50:30PM -0700, Nicolin Chen wrote:
> > > > diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
> > > > index 5fd3dd420290..e50113305a9c 100644
> > > > --- a/drivers/iommu/iommufd/device.c
> > > > +++ b/drivers/iommu/iommufd/device.c
> > > > @@ -277,6 +277,17 @@ EXPORT_SYMBOL_NS_GPL(iommufd_ctx_has_group, IOMMUFD);
> > > > */
> > > > void iommufd_device_unbind(struct iommufd_device *idev)
> > > > {
> > > > + u32 vdev_id = 0;
> > > > +
> > > > + /* idev->vdev object should be destroyed prior, yet just in case.. */
> > > > + mutex_lock(&idev->igroup->lock);
> > > > + if (idev->vdev)
> > >
> > > Then should it have a WARN_ON here?
> >
> > It'd be a user space mistake of forgetting to call the destroy ioctl
> > on the object, and I recall the kernel shouldn't WARN_ON in that case?
>
> But you can't get here because:
>
> refcount_inc(&idev->obj.users);
>
> And kernel doesn't destroy objects with elevated ref counts?
Hmm, this is not a ->destroy() but iommufd_device_unbind(), called
by VFIO. And we actually ran into this routine when QEMU didn't
destroy the vdev. So, I added this chunk.

The iommufd_object_remove(vdev_id) here would destroy the vdev,
whose destroy() does refcount_dec(&idev->obj.users). Then the
following iommufd_object_destroy_user(.., &idev->obj) will
succeed.

With that said, let's just mandate that userspace destroy the vdev.
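
A minimal userspace C sketch of the sequence above, using hypothetical
toy types (toy_idev, toy_vdev) rather than the real iommufd structures:
while a vdev pins the idev, destroying the idev is refused, and the v5
unbind() ordering only succeeds because it removes the leftover vdev
first. It models the refcount bookkeeping only, not the locking or the
xarray, and -1 is just a stand-in for the real error code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical toy types (not struct iommufd_device / iommufd_vdevice).
 * idev->users starts at 1 (the owner's reference); a vdev pins it.
 */
struct toy_vdev;

struct toy_idev {
	int users;
	struct toy_vdev *vdev;
	bool destroyed;
};

struct toy_vdev {
	struct toy_idev *idev;
};

/* Like the patch: idev->vdev = vdev; refcount_inc(&idev->obj.users); */
void toy_vdev_attach(struct toy_vdev *vdev, struct toy_idev *idev)
{
	assert(idev->vdev == NULL);
	vdev->idev = idev;
	idev->vdev = vdev;
	idev->users++;
}

/* Like the vdev destroy() callback: unlink and drop the pin on the idev. */
void toy_vdev_destroy(struct toy_vdev *vdev)
{
	vdev->idev->vdev = NULL;
	vdev->idev->users--;
	vdev->idev = NULL;
}

/* Destroying the idev is refused while anything else still pins it. */
int toy_idev_destroy(struct toy_idev *idev)
{
	if (idev->users != 1)
		return -1; /* stand-in for the real "busy" error */
	idev->users = 0;
	idev->destroyed = true;
	return 0;
}

/* The v5 unbind() ordering: drop a leftover vdev first, then the idev. */
int toy_unbind_v5(struct toy_idev *idev)
{
	if (idev->vdev)
		toy_vdev_destroy(idev->vdev);
	return toy_idev_destroy(idev);
}
```

This is exactly the behaviour the thread goes on to reject: it works
mechanically, but it destroys a userspace-created object from the kernel.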
> > > > + vdev_id = idev->vdev->obj.id;
> > > > + mutex_unlock(&idev->igroup->lock);
> > > > + /* Relying on xa_lock against a race with iommufd_destroy() */
> > > > + if (vdev_id)
> > > > + iommufd_object_remove(idev->ictx, NULL, vdev_id, 0);
> > >
> > > That doesn't seem right, iommufd_object_remove() should never be used
> > > to destroy an object that userspace created with an IOCTL, in fact
> > > that just isn't allowed.
> >
> > It was for our auto destroy feature.
>
> auto domains are "hidden" hwpts that are kernel managed. They are not
> "userspace created".
>
> "Userspace created" objects are ones that userspace is expected to call
> destroy on.
OK. I misunderstood that.
> If you destroy them behind the scenes in the kernel then the object ID
> can be reallocated for something else and when userspace does DESTROY
> on the ID it thought was still allocated it will malfunction.
>
> So, only userspace can destroy objects that userspace created.
I see. That makes sense.
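
To make the ID-reuse hazard concrete, here is a toy object table in C.
Everything in it is hypothetical: it is not the iommufd xarray, and
"lowest free ID" is an assumed allocation policy (any policy that reuses
freed IDs shows the same problem). Kernel-side removal is limited to
kernel-owned objects such as auto domains. If a userspace-owned vDEVICE
ID were freed anyway, a later allocation can reuse it, and the stale
DESTROY then frees the wrong object.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_IDS 8

/* Hypothetical object table: owner[i] describes the object with ID i. */
enum toy_owner { TOY_FREE = 0, TOY_USER, TOY_KERNEL };

struct toy_table {
	enum toy_owner owner[TOY_MAX_IDS];
};

/* Hand out the lowest free ID; ID 0 stays unused so it can mean "none". */
int toy_alloc(struct toy_table *t, enum toy_owner owner)
{
	for (int id = 1; id < TOY_MAX_IDS; id++) {
		if (t->owner[id] == TOY_FREE) {
			t->owner[id] = owner;
			return id;
		}
	}
	return -1;
}

static bool toy_valid(const struct toy_table *t, int id)
{
	return id > 0 && id < TOY_MAX_IDS && t->owner[id] != TOY_FREE;
}

/* Userspace DESTROY: frees whatever object currently holds the ID. */
bool toy_user_destroy(struct toy_table *t, int id)
{
	if (!toy_valid(t, id))
		return false;
	t->owner[id] = TOY_FREE;
	return true;
}

/* Kernel-side removal: only allowed for kernel-owned objects. */
bool toy_kernel_remove(struct toy_table *t, int id)
{
	if (!toy_valid(t, id) || t->owner[id] != TOY_KERNEL)
		return false;
	t->owner[id] = TOY_FREE;
	return true;
}
```

For example, if vdev_id is freed behind userspace's back and then
toy_alloc(&t, TOY_USER) hands out the same ID, userspace's late
toy_user_destroy(&t, vdev_id) frees the new object instead.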
> > If user space forgets to destroy the object while unplugging the
> > device from the VM, this saves the day.
>
> No, it should/does fail destroy of the VIOMMU object because the users
> refcount is elevated.
The vIOMMU object's users refcount is also decremented when unbind()
calls remove(). But anyway, we've agreed that userspace should destroy
it explicitly.
> > > Ugh, there is worse here, we can't hold a long term reference on a
> > > kernel owned object:
> > >
> > > idev->vdev = vdev;
> > > refcount_inc(&idev->obj.users);
> > >
> > > As it prevents the kernel from disconnecting it.
> >
> > Hmm, mind elaborating? I think iommufd_fops_release() would
> > xa_for_each over the object list, destroying the vdev object first
> > and then this idev (and the viommu too)?
>
> iommufd_device_unbind() can't fail, and if the object can't be
> destroyed because it has an elevated long term refcount, it WARNs:
>
>
> ret = iommufd_object_remove(ictx, obj, obj->id, REMOVE_WAIT_SHORTTERM);
>
> /*
> * If there is a bug and we couldn't destroy the object then we did put
> * back the caller's users refcount and will eventually try to free it
> * again during close.
> */
> WARN_ON(ret);
>
> So you cannot take long term references on kernel owned objects. Only
> userspace owned objects.
OK. I think I had already understood this part. Gao ran into this
WARN_ON with v3, so I added iommufd_object_remove(vdev_id) in unbind(),
before this iommufd_object_destroy_user(idev->ictx, &idev->obj).
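
A toy C model of why a long-term reference on a kernel-owned object is a
problem: the unbind path returns void, so a failed destroy can only be
reported by a warning, and the object is left behind for close(). All
names are hypothetical; toy_warn_count just counts the cases where the
quoted code would hit WARN_ON(ret).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a kernel-owned object with a users count. */
struct toy_kobj {
	int users;      /* 1 means only the owner's reference remains */
	bool destroyed;
};

static int toy_warn_count; /* stands in for WARN_ON() firing */

static int toy_remove(struct toy_kobj *obj)
{
	if (obj->users != 1)
		return -1; /* a long-term user still holds a reference */
	obj->users = 0;
	obj->destroyed = true;
	return 0;
}

/* Same shape as the quoted code: try to remove, then WARN_ON(ret). */
void toy_device_unbind(struct toy_kobj *idev)
{
	int ret = toy_remove(idev);

	/* No error can be returned; the object stays until close(). */
	if (ret)
		toy_warn_count++;
}

int toy_get_warn_count(void)
{
	return toy_warn_count;
}
```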
> > OK. If user space forgot to destroy its vdev while unplugging the
> > device, it would not be allowed to hotplug another device (or the
> > same device) back to the same slot having the same RID, since the
> > RID on the vIOMMU would be occupied by the undestroyed vdev.
>
> Yes, that seems correct and obvious to me. Until the vdev is
> explicitly destroyed the ID is in-use.
>
> Good userspace should destroy the iommufd vDEVICE object before
> closing the VFIO file descriptor.
>
> If it doesn't, then the VDEVICE object remains even though the VFIO
> device it was linked to is gone.
I see.
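
A toy C sketch of the agreed behaviour, assuming a hypothetical per-vIOMMU
table of virtual RIDs (not the real vIOMMU data structure): a vDEVICE
left behind after the VFIO fd is closed keeps its RID occupied, so
re-plugging at the same RID fails until userspace explicitly destroys the
old vDEVICE. The -1 return is only a stand-in for whatever error the real
allocation path reports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_RIDS 16

/* Hypothetical per-vIOMMU table: which virtual RIDs have a live vDEVICE. */
struct toy_viommu {
	bool rid_in_use[TOY_NR_RIDS];
};

/* vDEVICE allocation fails while the RID is still held by another vDEVICE. */
int toy_vdevice_alloc(struct toy_viommu *v, uint32_t rid)
{
	if (rid >= TOY_NR_RIDS || v->rid_in_use[rid])
		return -1;
	v->rid_in_use[rid] = true;
	return 0;
}

/* Only an explicit userspace destroy releases the RID. */
void toy_vdevice_destroy(struct toy_viommu *v, uint32_t rid)
{
	assert(rid < TOY_NR_RIDS && v->rid_in_use[rid]);
	v->rid_in_use[rid] = false;
}
```

Closing the VFIO fd does not appear anywhere in this model, which is the
point: good userspace calls the destroy first, then closes the fd.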
Thanks
Nicolin