linux-kernel - RE: [PATCH v2 3/4] iommufd: Destroy vdevice on idevice destroy

Message-ID: <BN9PR11MB5276A6F54C0391F72F3CFD7D8C7BA@BN9PR11MB5276.namprd11.prod.outlook.com>
Date: Wed, 25 Jun 2025 02:11:40 +0000
From: "Tian, Kevin" <kevin.tian@...el.com>
To: Jason Gunthorpe <jgg@...dia.com>
CC: Xu Yilun <yilun.xu@...ux.intel.com>, "will@...nel.org" <will@...nel.org>,
	"aneesh.kumar@...nel.org" <aneesh.kumar@...nel.org>, "iommu@...ts.linux.dev"
	<iommu@...ts.linux.dev>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "joro@...tes.org" <joro@...tes.org>,
	"robin.murphy@....com" <robin.murphy@....com>, "shuah@...nel.org"
	<shuah@...nel.org>, "nicolinc@...dia.com" <nicolinc@...dia.com>,
	"aik@....com" <aik@....com>, "Williams, Dan J" <dan.j.williams@...el.com>,
	"baolu.lu@...ux.intel.com" <baolu.lu@...ux.intel.com>, "Xu, Yilun"
	<yilun.xu@...el.com>
Subject: RE: [PATCH v2 3/4] iommufd: Destroy vdevice on idevice destroy

> From: Jason Gunthorpe <jgg@...dia.com>
> Sent: Wednesday, June 25, 2025 9:36 AM
> 
> On Tue, Jun 24, 2025 at 11:57:31PM +0000, Tian, Kevin wrote:
> > > From: Jason Gunthorpe <jgg@...dia.com>
> > > Sent: Tuesday, June 24, 2025 10:54 PM
> > >
> > > On Mon, Jun 23, 2025 at 05:49:45PM +0800, Xu Yilun wrote:
> > > > +static void iommufd_device_remove_vdev(struct iommufd_device *idev)
> > > > +{
> > > > +	bool vdev_removing = false;
> > > > +
> > > > +	mutex_lock(&idev->igroup->lock);
> > > > +	if (idev->vdev) {
> > > > +		struct iommufd_vdevice *vdev;
> > > > +
> > > > +		vdev = iommufd_get_vdevice(idev->ictx, idev->vdev->obj.id);
> > > > +		if (IS_ERR(vdev)) {
> > >
> > > This incrs obj.users which will cause a concurrent
> > > iommufd_object_remove() to fail with -EBUSY, which we are trying to
> > > avoid.
> >
> > concurrent remove means a user-initiated IOMMU_DESTROY, for which
> > failing with -EBUSY is expected as it doesn't wait for shortterm?
> 
> Yes a user IOMMU_DESTROY of the vdevice should not have a transient
> EBUSY failure. Avoiding that is the purpose of the shorterm_users
> mechanism.

Hmm, my understanding is the opposite.

currently iommufd_destroy() doesn't set REMOVE_WAIT_SHORTTERM:

static int iommufd_destroy(struct iommufd_ucmd *ucmd)
{
	struct iommu_destroy *cmd = ucmd->cmd;

	return iommufd_object_remove(ucmd->ictx, NULL, cmd->id, 0);
}

so it's natural for IOMMU_DESTROY to hit a transient -EBUSY when a parallel
ioctl is still operating on the object being destroyed:

	if (!refcount_dec_if_one(&obj->users)) {
		ret = -EBUSY;
		goto err_xa;
	}

idevice unbind is just a similar (but indirect) transient race against
IOMMU_DESTROY.

Waiting on shortterm_users is more for kernel-initiated destroy.
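
To illustrate the point, here is a minimal userspace sketch using C11
atomics. It is not the kernel code: struct obj_model and the model_*()
helpers are made-up names, and the only thing modelled is the
refcount_dec_if_one() check quoted above when no wait flag is passed.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the users count in iommufd_object. */
struct obj_model {
	atomic_int users;	/* 1 means only the xarray holds a reference */
};

/* Mimics refcount_dec_if_one(): go 1 -> 0 only if nobody else holds a ref. */
static bool model_dec_if_one(atomic_int *r)
{
	int expected = 1;

	return atomic_compare_exchange_strong(r, &expected, 0);
}

/* A parallel ioctl dropping the reference it took on the object. */
static void model_put(struct obj_model *obj)
{
	int old = atomic_fetch_sub(&obj->users, 1);

	assert(old > 1);	/* the xarray reference must remain */
	(void)old;
}

/* Models removal with flags == 0: no waiting, just fail if busy. */
static int model_remove_nowait(struct obj_model *obj)
{
	if (!model_dec_if_one(&obj->users))
		return -EBUSY;
	return 0;
}
```

With users == 2 (a parallel ioctl holds a reference) the remove returns
-EBUSY and leaves the count untouched; once that ioctl does its put, a
retry succeeds.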

> 
> > > Also you can hit a race where the tombstone has NULL'd the entry but
> > > the racing destroy will then load the NULL with xas_load() and hit this:
> > >
> > > 		if (WARN_ON(obj != to_destroy)) {
> >
> > IOMMU_DESTROY doesn't provide to_destroy.
> 
> Right, but IOMMU_DESTROY thread could have already gone past the
> xa_store(NULL) and then the kernel destroy thread could reach the
> above WARN as it does use to_destroy.
> 

If IOMMU_DESTROY has already gone past xa_store(NULL), there are
two scenarios:

1) The vdevice has been completely destroyed, with idev->vdev == NULL.

In that case iommufd_device_remove_vdev() is a nop.

2) The vdevice destroy has not completed yet, with idev->vdev still valid.

In that case iommufd_get_vdevice() fails and vdev_removing is set.

Then iommufd_device_remove_vdev() will wait for idev->vdev to
become NULL instead of calling iommufd_object_tombstone_user().

So the race you describe won't happen. 😊
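
The two scenarios can be sketched as a small decision function. This is
only a userspace model under my reading of the patch: the struct and enum
names are hypothetical, in_xarray stands in for whether
iommufd_get_vdevice() would still find the object, and the caller is
assumed to hold the lock that protects idev->vdev.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; none of these exist in the kernel. */
struct vdev_model {
	bool in_xarray;		/* false once a destroy has done xa_store(NULL) */
};

struct idev_model {
	struct vdev_model *vdev;	/* cleared when vdevice destroy completes */
};

enum remove_action {
	ACTION_NOP,		/* scenario 1: vdevice already gone */
	ACTION_WAIT_VDEV_NULL,	/* scenario 2: destroy in flight, wait for it */
	ACTION_TOMBSTONE,	/* no racing destroy: tombstone the vdevice */
};

/* What the idevice destroy path would do in each case. */
static enum remove_action model_remove_vdev(const struct idev_model *idev)
{
	assert(idev != NULL);

	if (!idev->vdev)
		return ACTION_NOP;
	if (!idev->vdev->in_xarray)	/* models iommufd_get_vdevice() failing */
		return ACTION_WAIT_VDEV_NULL;
	return ACTION_TOMBSTONE;
}
```

In this model the tombstone path is only reached while the vdevice is
still present in the xarray, which is why the to_destroy mismatch cannot
be observed.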