linux-kernel - RE: [PATCH v5 4/5] iommu: Introduce iommu_dev_reset_prepare() and iommu_dev_reset

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <BN9PR11MB527683978D304128441125C68CCAA@BN9PR11MB5276.namprd11.prod.outlook.com>
Date: Fri, 14 Nov 2025 09:37:27 +0000
From: "Tian, Kevin" <kevin.tian@...el.com>
To: Nicolin Chen <nicolinc@...dia.com>, "joro@...tes.org" <joro@...tes.org>,
	"afael@...nel.org" <afael@...nel.org>, "bhelgaas@...gle.com"
	<bhelgaas@...gle.com>, "alex@...zbot.org" <alex@...zbot.org>,
	"jgg@...dia.com" <jgg@...dia.com>
CC: "will@...nel.org" <will@...nel.org>, "robin.murphy@....com"
	<robin.murphy@....com>, "lenb@...nel.org" <lenb@...nel.org>,
	"baolu.lu@...ux.intel.com" <baolu.lu@...ux.intel.com>,
	"linux-arm-kernel@...ts.infradead.org"
	<linux-arm-kernel@...ts.infradead.org>, "iommu@...ts.linux.dev"
	<iommu@...ts.linux.dev>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "linux-acpi@...r.kernel.org"
	<linux-acpi@...r.kernel.org>, "linux-pci@...r.kernel.org"
	<linux-pci@...r.kernel.org>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
	"patches@...ts.linux.dev" <patches@...ts.linux.dev>, "Jaroszynski, Piotr"
	<pjaroszynski@...dia.com>, "Sethi, Vikram" <vsethi@...dia.com>,
	"helgaas@...nel.org" <helgaas@...nel.org>, "etzhao1900@...il.com"
	<etzhao1900@...il.com>
Subject: RE: [PATCH v5 4/5] iommu: Introduce iommu_dev_reset_prepare() and
 iommu_dev_reset_done()

> From: Nicolin Chen <nicolinc@...dia.com>
> Sent: Tuesday, November 11, 2025 1:13 PM
> 
> Note that there are two corner cases:
>  1. Devices in the same iommu_group
>     Since an attachment is always per iommu_group, disallowing one device
>     to switch domains (or HWPTs in iommufd) would have to disallow others
>     in the same iommu_group to switch domains as well. So, play safe by
>     preventing a shared iommu_group from going through the iommu reset.

It'd be good to make clear that 'preventing' means that the racing problem
is not addressed.

> +	/*
> +	 * During a group device reset, @resetting_domain points to the
> physical
> +	 * domain, while @domain points to the attached domain before the
> reset.
> +	 */
> +	struct iommu_domain *resetting_domain;

'a group device' is a bit confusing. Just remove 'group'?

> @@ -2195,6 +2200,12 @@ int iommu_deferred_attach(struct device *dev,
> struct iommu_domain *domain)
> 
>  	guard(mutex)(&dev->iommu_group->mutex);
> 
> +	/*
> +	 * This is a concurrent attach while a group device is resetting. Reject
> +	 * it until iommu_dev_reset_done() attaches the device to group-
> >domain.
> +	 */
> +	if (dev->iommu_group->resetting_domain)
> +		return -EBUSY;

It might be worth noting that failing a deferred attach leads to failing
the dma map operation. It's different from other explicit attaching paths,
but there is nothing more we can do here.


> @@ -2253,6 +2264,16 @@ struct iommu_domain
> *iommu_driver_get_domain_for_dev(struct device *dev)
> 
>  	lockdep_assert_held(&group->mutex);
> 
> +	/*
> +	 * Driver handles the low-level __iommu_attach_device(), including
> the
> +	 * one invoked by iommu_dev_reset_done(), in which case the driver
> must
> +	 * get the resetting_domain over group->domain caching the one
> prior to
> +	 * iommu_dev_reset_prepare(), so that it wouldn't end up with
> attaching
> +	 * the device from group->domain (old) to group->domain (new).
> +	 */
> +	if (group->resetting_domain)
> +		return group->resetting_domain;

It's a pretty long sentence. Let's break it.

> +int iommu_dev_reset_prepare(struct device *dev)

If this is intended to be used by pci for now, it's clearer to have a 'pci'
word in the name. Later when there is a demand calling it from other
buses, discussion will catch eyes to ensure no racy of UAF etc.

> +	/*
> +	 * Once the resetting_domain is set, any concurrent attachment to
> this
> +	 * iommu_group will be rejected, which would break the attach
> routines
> +	 * of the sibling devices in the same iommu_group. So, skip this case.
> +	 */
> +	if (dev_is_pci(dev)) {
> +		struct group_device *gdev;
> +
> +		for_each_group_device(group, gdev) {
> +			if (gdev->dev != dev)
> +				return 0;
> +		}
> +	}

btw what'd be a real impact to reject concurrent attachment for sibling
devices? This series already documents the impact in uAPI for the device
under attachment, and the userspace already knows the restriction 
of devices in the group which must be attached to a same hwpt.

Combining those knowledge I don't think there is a problem for 
userspace to be aware of that resetting a device in a multi-dev
group affects concurrent attachment of sibling devices...

> +	/* Re-attach RID domain back to group->domain */
> +	if (group->domain != group->blocking_domain) {
> +		WARN_ON(__iommu_attach_device(group->domain, dev,
> +					      group->blocking_domain));
> +	}

Even if we disallow resetting on a multi-dev group, there is still a
corner case not taken care here.

It's possible that there is only one device in the group at prepare,
coming with a device hotplug added to the group in the middle,
then doing reset_done.

In this case the newly-added device will inherit the blocking domain.

Then reset_done should loop all devices in the group and re-attach
all of them to the cached domain.