Message-ID: <aQO//+6/B/WbdK2h@Asurada-Nvidia>
Date: Thu, 30 Oct 2025 12:43:59 -0700
From: Nicolin Chen <nicolinc@...dia.com>
To: "Tian, Kevin" <kevin.tian@...el.com>
CC: "joro@...tes.org" <joro@...tes.org>, "jgg@...dia.com" <jgg@...dia.com>,
	"suravee.suthikulpanit@....com" <suravee.suthikulpanit@....com>,
	"will@...nel.org" <will@...nel.org>, "robin.murphy@....com"
	<robin.murphy@....com>, "sven@...nel.org" <sven@...nel.org>, "j@...nau.net"
	<j@...nau.net>, "jean-philippe@...aro.org" <jean-philippe@...aro.org>,
	"robin.clark@....qualcomm.com" <robin.clark@....qualcomm.com>,
	"dwmw2@...radead.org" <dwmw2@...radead.org>, "baolu.lu@...ux.intel.com"
	<baolu.lu@...ux.intel.com>, "yong.wu@...iatek.com" <yong.wu@...iatek.com>,
	"matthias.bgg@...il.com" <matthias.bgg@...il.com>,
	"angelogioacchino.delregno@...labora.com"
	<angelogioacchino.delregno@...labora.com>, "tjeznach@...osinc.com"
	<tjeznach@...osinc.com>, "pjw@...nel.org" <pjw@...nel.org>,
	"palmer@...belt.com" <palmer@...belt.com>, "aou@...s.berkeley.edu"
	<aou@...s.berkeley.edu>, "heiko@...ech.de" <heiko@...ech.de>,
	"schnelle@...ux.ibm.com" <schnelle@...ux.ibm.com>, "mjrosato@...ux.ibm.com"
	<mjrosato@...ux.ibm.com>, "wens@...e.org" <wens@...e.org>,
	"jernej.skrabec@...il.com" <jernej.skrabec@...il.com>, "samuel@...lland.org"
	<samuel@...lland.org>, "thierry.reding@...il.com" <thierry.reding@...il.com>,
	"jonathanh@...dia.com" <jonathanh@...dia.com>, "iommu@...ts.linux.dev"
	<iommu@...ts.linux.dev>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "asahi@...ts.linux.dev"
	<asahi@...ts.linux.dev>, "linux-arm-kernel@...ts.infradead.org"
	<linux-arm-kernel@...ts.infradead.org>, "linux-arm-msm@...r.kernel.org"
	<linux-arm-msm@...r.kernel.org>, "linux-mediatek@...ts.infradead.org"
	<linux-mediatek@...ts.infradead.org>, "linux-riscv@...ts.infradead.org"
	<linux-riscv@...ts.infradead.org>, "linux-rockchip@...ts.infradead.org"
	<linux-rockchip@...ts.infradead.org>, "linux-s390@...r.kernel.org"
	<linux-s390@...r.kernel.org>, "linux-sunxi@...ts.linux.dev"
	<linux-sunxi@...ts.linux.dev>, "linux-tegra@...r.kernel.org"
	<linux-tegra@...r.kernel.org>, "virtualization@...ts.linux.dev"
	<virtualization@...ts.linux.dev>, "patches@...ts.linux.dev"
	<patches@...ts.linux.dev>
Subject: Re: [PATCH v1 02/20] iommu: Introduce a test_dev domain op and an
 internal helper

On Thu, Oct 30, 2025 at 08:47:18AM +0000, Tian, Kevin wrote:
> It might need more work to meet this requirement. e.g. after patch4
> I could still spot other errors easily in the attach path:
> 
> intel_iommu_attach_device()
>   iopf_for_domain_set()
>     intel_iommu_enable_iopf():
> 
>         if (!info->pri_enabled)
>                 return -ENODEV;

Yea, I missed that.

> intel_iommu_attach_device()
>   dmar_domain_attach_device()
>     domain_attach_iommu():
>       
>        curr = xa_cmpxchg(&domain->iommu_array, iommu->seq_id,
>                           NULL, info, GFP_KERNEL);
>         if (curr) {
>                 ret = xa_err(curr) ? : -EBUSY;
>                 goto err_clear;
>         }

There is actually an xa_load() in this function:

	curr = xa_load(&domain->iommu_array, iommu->seq_id);
	if (curr) {
		curr->refcnt++;
		kfree(info);
		return 0;
	}

	[...]

	info->refcnt	= 1;
	info->did	= num;
	info->iommu	= iommu;
	curr = xa_cmpxchg(&domain->iommu_array, iommu->seq_id,
			  NULL, info, GFP_KERNEL);
	if (curr) {
		ret = xa_err(curr) ? : -EBUSY;
		goto err_clear;
	}

It seems that this xa_cmpxchg could be just xa_store()?
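To illustrate why the compare step looks redundant, here is a minimal
userspace sketch of the load-then-insert pattern. It is only a model: a
plain array stands in for domain->iommu_array, slot_cmpxchg() mimics
the xa_cmpxchg() semantics, and NR_IOMMUS, slots[] and model_attach()
are hypothetical names. It also assumes that the lookup and the insert
are serialized by the same lock, which I have not verified against the
driver.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_IOMMUS 4		/* hypothetical table size for this model */

struct iommu_info {		/* stand-in for the per-IOMMU info entry */
	int refcnt;
};

/* Models domain->iommu_array, indexed by iommu->seq_id */
static struct iommu_info *slots[NR_IOMMUS];

/* Models xa_cmpxchg(): store @new only if the slot holds @old, and
 * return whatever was in the slot before the call. */
static struct iommu_info *slot_cmpxchg(unsigned int id,
				       struct iommu_info *old,
				       struct iommu_info *new)
{
	struct iommu_info *curr = slots[id];

	if (curr == old)
		slots[id] = new;
	return curr;
}

/* Models the quoted function: take a reference on an existing entry or
 * insert a new one. The caller is assumed to hold the serializing lock. */
static int model_attach(unsigned int id)
{
	struct iommu_info *info, *curr;

	if (id >= NR_IOMMUS)
		return -EINVAL;

	info = calloc(1, sizeof(*info));
	if (!info)
		return -ENOMEM;

	curr = slots[id];			/* the xa_load() step */
	if (curr) {
		curr->refcnt++;
		free(info);
		return 0;
	}

	info->refcnt = 1;
	curr = slot_cmpxchg(id, NULL, info);	/* the xa_cmpxchg() step */
	if (curr) {
		/* Unreachable while load and insert are serialized:
		 * the slot was just observed to be empty. */
		free(info);
		return -EBUSY;
	}
	return 0;
}
```

In this model the -EBUSY branch can never trigger, so a plain store
would behave the same. In the kernel, xa_store() can still fail on
memory allocation, so the xa_err() check would have to stay.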

> intel_iommu_attach_device()
>   dmar_domain_attach_device()
>     domain_setup_first_level()
>       __domain_setup_first_level()
>         intel_pasid_setup_first_level():

Yea. There are a few others in the path as well.

>         pte = intel_pasid_get_entry(dev, pasid);
>         if (!pte) {
>                 spin_unlock(&iommu->lock);
>                 return -ENODEV;
>         }
> 
>         if (pasid_pte_is_present(pte)) {
>                 spin_unlock(&iommu->lock);
>                 return -EBUSY;
>         }

Hmm, this is fenced by iommu->lock and can race with callbacks other
than attach_dev. It might be difficult to shift these to test_dev.

> On the other hand, how do we communicate whatever errors returned
> by attach_dev in the reset_done path back to userspace? As noted above
> resource allocation failures could still occur in attach_dev, but userspace
> may think the requested attach in middle of a reset has succeeded as
> long as it passes the test_dev check.

That's a legit point. Jason pointed out at the SMMUv3 patch that we
would end up with some inconsistency between driver and core as
well. So, it seems that this test_dev doesn't solve our problem
very well.

> Does it work better to block the attaching process upon ongoing reset
> and wake it up later upon reset_done to resume attach?

Yea, I think returning -EBUSY would be the simplest solution like
we did in the previous version.

But the concern is that a VF might not be aware of a PF reset, so
its attachment can still race with the reset and get -EBUSY. Then,
if its driver doesn't retry/defer the attach, this might break it?
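For reference, the retry that a caller would need could look roughly
like the sketch below. This is a userspace model only: struct
model_dev, model_attach_dev() and attach_with_retry() are hypothetical
names, and a real driver would sleep or defer between tries instead of
looping back to back.

```c
#include <errno.h>
#include <stdbool.h>

struct model_dev {
	int reset_polls_left;	/* attach calls that still see the reset */
	bool attached;
};

/* Models an attach that is refused with -EBUSY while a reset is ongoing */
static int model_attach_dev(struct model_dev *dev)
{
	if (dev->reset_polls_left > 0) {
		dev->reset_polls_left--;
		return -EBUSY;
	}
	dev->attached = true;
	return 0;
}

/* Retry only on -EBUSY, up to @max_tries attempts. Any other result,
 * success included, ends the loop immediately. */
static int attach_with_retry(struct model_dev *dev, int max_tries)
{
	int ret = -EBUSY;

	for (int i = 0; i < max_tries && ret == -EBUSY; i++)
		ret = model_attach_dev(dev);
	return ret;
}
```

A caller that attaches once and treats any error as fatal is the case
that would break.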

FWIW, I am thinking of another design based on Jason's remarks:
https://lore.kernel.org/linux-iommu/aQBopHFub8wyQh5C@Asurada-Nvidia/

So, instead of core initiating the round trip between the blocking
domain and group->domain, it forwards dev_reset_prepare/done to the
driver where it does a low-level attachment that wouldn't fail:
  For SMMUv3, it's an STE update.
  For intel_iommu, it seems to be the context table update?

Then, any concurrent attachment would be allowed to go through all
the compatibility/sanity checks as usual, but it would bypass the
final step: STE or context table update.
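A rough userspace model of that flow, just to show the intended
ordering. Every name here (struct model_master, model_reset_prepare(),
model_attach(), model_reset_done()) is hypothetical, and the "hw"
pointer stands in for the STE or context table entry. It is a sketch
under those assumptions, not the actual driver interface.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct model_domain {
	int id;
	bool compatible;	/* result of the usual sanity checks */
};

struct model_master {
	bool in_reset;
	const struct model_domain *domain;	/* what the core tracks */
	const struct model_domain *hw;		/* what the hardware entry
						 * points at; NULL means
						 * blocked */
};

/* Driver-side reset_prepare: block translation at the low level. This
 * step has nothing to allocate, so it cannot fail. */
static void model_reset_prepare(struct model_master *m)
{
	m->in_reset = true;
	m->hw = NULL;
}

/* Attach: always run the checks, but skip the final hardware update
 * while a reset is in progress. */
static int model_attach(struct model_master *m,
			const struct model_domain *d)
{
	if (!d->compatible)
		return -EINVAL;

	m->domain = d;
	if (!m->in_reset)
		m->hw = d;
	return 0;
}

/* Driver-side reset_done: program whatever domain is current now. */
static void model_reset_done(struct model_master *m)
{
	m->in_reset = false;
	m->hw = m->domain;
}
```

With this ordering, an attach during the reset returns the same result
it would return outside of one, and reset_done only has to replay the
last accepted domain.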

Thanks
Nicolin