Message-ID: <aRZBN/nBz6lqTzmt@Asurada-Nvidia>
Date: Thu, 13 Nov 2025 12:36:07 -0800
From: Nicolin Chen <nicolinc@...dia.com>
To: Suravee Suthikulpanit <suravee.suthikulpanit@....com>
CC: <jgg@...dia.com>, <linux-kernel@...r.kernel.org>, <robin.murphy@....com>,
	<will@...nel.org>, <joro@...tes.org>, <kevin.tian@...el.com>,
	<jsnitsel@...hat.com>, <vasant.hegde@....com>, <iommu@...ts.linux.dev>,
	<santosh.shukla@....com>, <sairaj.arunkodilkar@....com>, <jon.grimm@....com>,
	<prashanthpra@...gle.com>, <wvw@...gle.com>, <wnliu@...gle.com>,
	<gptran@...gle.com>, <kpsingh@...gle.com>, <joao.m.martins@...cle.com>,
	<alejandro.j.jimenez@...cle.com>
Subject: Re: [PATCH v5 11/14] iommu/amd: Introduce gDomID-to-hDomID Mapping
 and handle parent domain invalidation

On Wed, Nov 12, 2025 at 06:25:03PM +0000, Suravee Suthikulpanit wrote:
> @@ -38,10 +40,42 @@ size_t amd_iommufd_get_viommu_size(struct device *dev, enum iommu_viommu_type vi
>  int amd_iommufd_viommu_init(struct iommufd_viommu *viommu, struct iommu_domain *parent,
>  			    const struct iommu_user_data *user_data)
>  {
> +	unsigned long flags;
>  	struct protection_domain *pdom = to_pdomain(parent);
>  	struct amd_iommu_viommu *aviommu = container_of(viommu, struct amd_iommu_viommu, core);
>  
> +	xa_init(&aviommu->gdomid_array);

Perhaps init with XA_FLAGS_ALLOC1 since domid can't be 0?
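A minimal sketch of that change, assuming the init stays in amd_iommufd_viommu_init():

```c
	/* Reserve index 0, since a valid domid can never be 0 */
	xa_init_flags(&aviommu->gdomid_array, XA_FLAGS_ALLOC1);
```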

> +static void amd_iommufd_viommu_destroy(struct iommufd_viommu *viommu)
> +{
> +	unsigned long flags;
> +	struct amd_iommu_viommu *entry, *next;
> +	struct amd_iommu_viommu *aviommu = container_of(viommu, struct amd_iommu_viommu, core);
> +	struct protection_domain *pdom = aviommu->parent;
> +
> +	spin_lock_irqsave(&pdom->lock, flags);
> +	list_for_each_entry_safe(entry, next, &pdom->viommu_list, pdom_list) {
> +		if (entry == aviommu)
> +			list_del(&entry->pdom_list);
> +	}

Do we really need the loop? Why not simply do list_del()?
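Since amd_iommufd_viommu_init() adds aviommu to pdom->viommu_list exactly once, something like this should be enough (sketch):

```c
	spin_lock_irqsave(&pdom->lock, flags);
	list_del(&aviommu->pdom_list);
	spin_unlock_irqrestore(&pdom->lock, flags);
```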

> +	spin_unlock_irqrestore(&pdom->lock, flags);
> +
> +}

No need of the extra line at the end of the function.

> @@ -92,7 +94,60 @@ amd_iommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
>  	ndom->domain.type = IOMMU_DOMAIN_NESTED;
>  	ndom->viommu = aviommu;
>  
> +	gdom_info = kzalloc(sizeof(*gdom_info), GFP_KERNEL);
> +	if (!gdom_info)
> +		goto out_err;

Missing:
	ret = -ENOMEM;
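I.e. (sketch):

```c
	gdom_info = kzalloc(sizeof(*gdom_info), GFP_KERNEL);
	if (!gdom_info) {
		ret = -ENOMEM;
		goto out_err;
	}
```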

> +
> +	/*
> +	 * Normally, when a guest has multiple pass-through devices,
> +	 * the IOMMU driver sets up DTEs with the same stage-2 table and
> +	 * uses the same host domain ID (hDomID). In the case of nested
> +	 * translation, if the guest sets up different stage-1 tables
> +	 * with the same PASID, the IOMMU would use the same TLB tag,
> +	 * which results in a TLB aliasing issue.
> +	 *
> +	 * The guest assigns gDomIDs based on its own algorithm for managing
> +	 * cache tags of (DomID, PASID). Within a single vIOMMU, the nest
> +	 * parent domain (w/ S2 table) is used by all DTEs, but we need to
> +	 * consistently map each gDomID to a single hDomID. This is done
> +	 * using an xarray in the vIOMMU to keep track of the gDomID
> +	 * mapping. When the S2 is changed, the INVALIDATE_IOMMU_PAGES
> +	 * command must be issued for each hDomID in the xarray.
> +	 */
> +	curr = xa_cmpxchg(&aviommu->gdomid_array,
> +			  ndom->gdom_id, NULL, gdom_info, GFP_ATOMIC);
> +	if (curr) {
> +		if (xa_err(curr)) {
> +			ret = -EINVAL;
> +			goto out_err_gdom_info;
> +		} else {
> +			/* The gDomID already exists */
> +			pr_debug("%s: Found gdom_id=%#x, hdom_id=%#x\n",
> +				 __func__, ndom->gdom_id, curr->hdom_id);
> +			refcount_inc(&curr->users);
> +			ndom->gdom_info = curr;

This looks racy..

When a gDomID is shared between two nested domains, a concurrent
nested_domain_free() could run before this refcount_inc(), call
refcount_dec_and_test(), and even free curr and ndom.

Then this refcount_inc() will blow up on a zero refcount, or the
later use of curr/ndom will be a use-after-free.

Actually, I don't see where amd_iommu_alloc_domain_nested() gets
used in this series.. I assume AMD will use iommufd's vIOMMU
infrastructure directly, which doesn't hold a mutex across nested
domain allocation/free calls.

So, the entire thing here should hold xa_lock(), use xas_load()
for the existing curr and use xas_store() to store gdom_info if
!curr, and xa_unlock() after gdom_info is fully initialized.
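Roughly something like this (a sketch only; the error handling around it is omitted, the hdom_id would need to be allocated before taking the lock, and the return value of __xa_store() would still need an xa_err() check):

```c
	xa_lock(&aviommu->gdomid_array);
	curr = xa_load(&aviommu->gdomid_array, ndom->gdom_id);
	if (curr) {
		/* Take the reference while still holding the lock, so a
		 * concurrent free cannot drop the last reference first. */
		refcount_inc(&curr->users);
		ndom->gdom_info = curr;
		kfree(gdom_info);
	} else {
		/* Fully initialize gdom_info before publishing it */
		refcount_set(&gdom_info->users, 1);
		__xa_store(&aviommu->gdomid_array, ndom->gdom_id,
			   gdom_info, GFP_ATOMIC);
		ndom->gdom_info = gdom_info;
	}
	xa_unlock(&aviommu->gdomid_array);
```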

> +			kfree(gdom_info);
> +			return &ndom->domain;
> +		}
> +	}
> +
> +	/* The gDomID does not exist. We allocate a new hdom_id */
> +	gdom_info->hdom_id = amd_iommu_pdom_id_alloc();
> +	if (gdom_info->hdom_id <= 0) {
> +		xa_cmpxchg(&aviommu->gdomid_array,
> +			   ndom->gdom_id, gdom_info, NULL, GFP_ATOMIC);
> +		ret = -ENOSPC;
> +		goto out_err_gdom_info;
> +	}
> +
> +	refcount_set(&gdom_info->users, 1);

Similar risk here. gdom_info is stored to the xarray before this
line. A concurrent amd_iommu_alloc_domain_nested() could get the
stored gdom_info and blow up at refcount_inc().

Make sure the entire thing is locked and safe.

Nicolin
