Message-ID: <20250624145346.GC150753@nvidia.com>
Date: Tue, 24 Jun 2025 11:53:46 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Xu Yilun <yilun.xu@...ux.intel.com>
Cc: kevin.tian@...el.com, will@...nel.org, aneesh.kumar@...nel.org,
iommu@...ts.linux.dev, linux-kernel@...r.kernel.org,
joro@...tes.org, robin.murphy@....com, shuah@...nel.org,
nicolinc@...dia.com, aik@....com, dan.j.williams@...el.com,
baolu.lu@...ux.intel.com, yilun.xu@...el.com
Subject: Re: [PATCH v2 3/4] iommufd: Destroy vdevice on idevice destroy
On Mon, Jun 23, 2025 at 05:49:45PM +0800, Xu Yilun wrote:
> +static void iommufd_device_remove_vdev(struct iommufd_device *idev)
> +{
> + bool vdev_removing = false;
> +
> + mutex_lock(&idev->igroup->lock);
> + if (idev->vdev) {
> + struct iommufd_vdevice *vdev;
> +
> + vdev = iommufd_get_vdevice(idev->ictx, idev->vdev->obj.id);
> + if (IS_ERR(vdev)) {
This increments obj.users, which will cause a concurrent
iommufd_object_remove() to fail with -EBUSY, which is exactly what we are
trying to avoid.
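
To make the -EBUSY point concrete, here is a minimal user-space sketch, not kernel code. It assumes the removal path only succeeds when it can move users from 1 to 0, the way refcount_dec_if_one() behaves; the model_* names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an iommufd object; only "users" is modeled. */
struct model_obj {
	atomic_int users;
};

/* Models refcount_dec_if_one(): succeeds only for the 1 -> 0 transition. */
static bool model_dec_if_one(atomic_int *r)
{
	int expected = 1;

	return atomic_compare_exchange_strong(r, &expected, 0);
}

/* Models the users check in the removal path: any extra reference is -EBUSY. */
static int model_remove(struct model_obj *obj)
{
	if (!model_dec_if_one(&obj->users))
		return -EBUSY;
	return 0;
}
```

With users at 1, model_remove() succeeds. If another thread has taken a lookup-style reference first, so users is 2, the same call returns -EBUSY until that reference is dropped.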
Also you can hit a race where the tombstone has NULL'd the entry but
the racing destroy will then load the NULL with xas_load() and hit this:
if (WARN_ON(obj != to_destroy)) {
So, this doesn't look like it will work right to me..
You want somewhat different destroy logic:
/*
 * The caller must directly obtain a shortterm_users reference without a users
 * reference using its own locking to protect the pointer. This function always
 * puts back the shortterm_users reference.
 */
int iommufd_object_remove_tombstone(struct iommufd_ctx *ictx,
				    struct iommufd_object *to_destroy)
{
	XA_STATE(xas, &ictx->objects, to_destroy->id);
	struct iommufd_object *obj;
	int ret;

	xa_lock(&ictx->objects);
	obj = xas_load(&xas);
	if (xa_is_zero(obj) || obj == NULL) {
		/*
		 * Another thread is racing to destroy this. Since we hold a
		 * shortterm_users refcount, the other thread has xa_unlocked()
		 * but not passed iommufd_object_dec_wait_shortterm().
		 */
		if (refcount_dec_and_test(&to_destroy->shortterm_users))
			wake_up_interruptible_all(&ictx->destroy_wait);
		ret = 0;
		goto err_xa;
	} else if (WARN_ON(obj != to_destroy)) {
		refcount_dec(&obj->shortterm_users);
		ret = -ENOENT;
		goto err_xa;
	}

	/*
	 * The object is still in the xarray, so this thread will try to destroy
	 * it. Put back the caller's shortterm_users.
	 */
	refcount_dec(&obj->shortterm_users);
	if (!refcount_dec_if_one(&obj->users)) {
		ret = -EBUSY;
		goto err_xa;
	}

	/* Leave behind a tombstone to prevent re-use of this entry */
	xas_store(&xas, XA_ZERO_ENTRY);
	xa_unlock(&ictx->objects);

	/*
	 * Since users is zero any positive shortterm_users must be racing
	 * iommufd_put_object(), or we have a bug.
	 */
	ret = iommufd_object_dec_wait_shortterm(ictx, obj);
	if (WARN_ON(ret))
		return ret;

	iommufd_object_ops[obj->type].destroy(obj);
	kfree(obj);
	return 0;

err_xa:
	xa_unlock(&ictx->objects);

	/* The returned object reference count is zero */
	return ret;
}
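
The three outcomes of that function can be modeled in plain user-space C. This is a single-threaded sketch under loud assumptions: a fixed array stands in for the xarray, a sentinel object stands in for XA_ZERO_ENTRY, and the locking and the wait for racing short-term users are left out entirely. All model_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_SLOTS 4

/* Hypothetical stand-in for an iommufd object. */
struct model_obj {
	unsigned int id;
	int users;		/* long-term references */
	int shortterm_users;	/* short-term references */
	int destroyed;		/* set when this path ran the destroy step */
};

/* A distinct non-NULL marker, standing in for XA_ZERO_ENTRY. */
static struct model_obj model_tombstone;

/* A fixed array standing in for the ictx->objects xarray. */
struct model_table {
	struct model_obj *slot[MODEL_SLOTS];
};

/*
 * The caller holds one shortterm_users reference on to_destroy; it is always
 * dropped here. Locking and waiting are deliberately not modeled.
 */
static int model_remove_tombstone(struct model_table *t,
				  struct model_obj *to_destroy)
{
	struct model_obj *obj = t->slot[to_destroy->id];

	to_destroy->shortterm_users--;

	if (obj == NULL || obj == &model_tombstone)
		return 0;	/* another path already owns the destruction */
	if (obj != to_destroy)
		return -ENOENT;	/* the slot holds some other object */
	if (obj->users != 1)
		return -EBUSY;	/* someone else still holds a users reference */

	obj->users = 0;
	t->slot[obj->id] = &model_tombstone;	/* prevent re-use of the ID */
	obj->destroyed = 1;
	return 0;
}
```

The first caller to find the object in its slot with users at 1 leaves the tombstone and runs the destroy step. A later caller that finds the tombstone just drops its short-term reference and returns 0.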
Then you'd call it by doing something like:
static void iommufd_device_remove_vdev(struct iommufd_device *idev)
{
	struct iommufd_object *to_destroy = NULL;
	int ret;

	mutex_lock(&idev->igroup->lock);
	if (!idev->vdev) {
		mutex_unlock(&idev->igroup->lock);
		return;
	}
	if (refcount_inc_not_zero(&idev->vdev->obj.shortterm_users))
		to_destroy = &idev->vdev->obj;
	mutex_unlock(&idev->igroup->lock);

	if (to_destroy) {
		ret = iommufd_object_remove_tombstone(idev->ictx, to_destroy);
		if (WARN_ON(ret))
			return;
	}

	/*
	 * We don't know what thread is actually going to destroy the vdev, but
	 * once the vdev is destroyed the pointer is NULL'd. At this
	 * point idev->users is 0 so no other thread can set a new vdev.
	 */
	if (!wait_event_timeout(idev->ictx->destroy_wait,
				!READ_ONCE(idev->vdev),
				msecs_to_jiffies(60000)))
		pr_crit("Timed out waiting for iommufd vdevice removal\n");
}
Though there is a cleaner option here, you could do:
	mutex_lock(&idev->igroup->lock);
	if (idev->vdev)
		iommufd_vdevice_abort(&idev->vdev->obj);
	mutex_unlock(&idev->igroup->lock);
And make it safe to call abort twice, e.g. by setting dev to NULL and
checking for that. The first thread to get to the igroup lock, either via
iommufd_vdevice_destroy() or via the above, will do the actual abort
synchronously without any wait_event_timeout. That seems better?
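
A minimal user-space sketch of the "safe to call abort twice" idea, assuming a pthread mutex in place of the igroup lock and a dev pointer that is NULL'd by whichever path runs the teardown first. The model_* names and the abort_count field are hypothetical and only there so the behavior can be observed.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct model_vdev;

/* Hypothetical stand-in for the idevice; lock models idev->igroup->lock. */
struct model_idev {
	pthread_mutex_t lock;
	struct model_vdev *vdev;
};

/* Hypothetical stand-in for the vdevice. */
struct model_vdev {
	struct model_idev *idev;
	void *dev;		/* non-NULL until the abort has run */
	int abort_count;	/* instrumentation for this example only */
};

/* Caller must hold idev->lock. Calling it a second time is a no-op. */
static void model_vdev_abort(struct model_vdev *vdev)
{
	if (!vdev->dev)
		return;		/* another path already did the teardown */
	vdev->dev = NULL;
	vdev->idev->vdev = NULL;
	vdev->abort_count++;
}

/* Path taken when the idevice is destroyed. */
static void model_idev_remove_vdev(struct model_idev *idev)
{
	pthread_mutex_lock(&idev->lock);
	if (idev->vdev)
		model_vdev_abort(idev->vdev);
	pthread_mutex_unlock(&idev->lock);
}

/* Path taken when the vdevice itself is destroyed. */
static void model_vdev_destroy(struct model_vdev *vdev)
{
	struct model_idev *idev = vdev->idev;

	pthread_mutex_lock(&idev->lock);
	model_vdev_abort(vdev);
	pthread_mutex_unlock(&idev->lock);
}
```

Whichever of the two paths takes the lock first performs the teardown; the other sees dev already NULL and returns, so neither has to wait on the other.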
> + /* vdev can't outlive idev, vdev->idev is always valid, need no refcnt */
> + vdev->idev = idev;
So this means as soon as 'idev->vdev = NULL;' happens idev is an
invalid pointer. Need a WRITE_ONCE there.
I would rephrase the comment as
iommufd_device_destroy() waits until idev->vdev is NULL before
freeing the idev, which only happens once the vdev has finished
destruction. Thus we do not need refcounting on either idev->vdev or
vdev->idev.
and group both assignments together.
> vdev->ictx = ucmd->ictx;
> vdev->id = virt_id;
> vdev->dev = idev->dev;
> get_device(idev->dev);
> vdev->viommu = viommu;
> refcount_inc(&viommu->obj.users);
> + /* idev->vdev is protected by idev->igroup->lock, need no refcnt */
> + idev->vdev = vdev;
This can be WRITE_ONCE too
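
Roughly, the pairing between the locked writers and the lockless READ_ONCE() in the wait condition would look like the sketch below. It is user-space C with simplified volatile-cast stand-ins for the kernel macros (they rely on the GCC/Clang __typeof__ extension and are not the kernel definitions), and the model_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for WRITE_ONCE()/READ_ONCE(); needs GCC or Clang. */
#define MODEL_WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))
#define MODEL_READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

struct model_vdev;

struct model_idev {
	struct model_vdev *vdev;
};

struct model_vdev {
	struct model_idev *idev;
};

/* Writer side; in the real code this runs with the igroup lock held. */
static void model_link(struct model_idev *idev, struct model_vdev *vdev)
{
	vdev->idev = idev;
	MODEL_WRITE_ONCE(idev->vdev, vdev);
}

/* Writer side of the teardown, also under the lock in the real code. */
static void model_unlink(struct model_idev *idev)
{
	MODEL_WRITE_ONCE(idev->vdev, NULL);
}

/* Lockless reader, like the condition passed to wait_event_timeout(). */
static int model_vdev_gone(struct model_idev *idev)
{
	return MODEL_READ_ONCE(idev->vdev) == NULL;
}
```

Both stores are marked because the reader checks the pointer without taking the lock.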
Jason