Message-ID: <aRZBN/nBz6lqTzmt@Asurada-Nvidia>
Date: Thu, 13 Nov 2025 12:36:07 -0800
From: Nicolin Chen <nicolinc@...dia.com>
To: Suravee Suthikulpanit <suravee.suthikulpanit@....com>
CC: <jgg@...dia.com>, <linux-kernel@...r.kernel.org>, <robin.murphy@....com>,
<will@...nel.org>, <joro@...tes.org>, <kevin.tian@...el.com>,
<jsnitsel@...hat.com>, <vasant.hegde@....com>, <iommu@...ts.linux.dev>,
<santosh.shukla@....com>, <sairaj.arunkodilkar@....com>, <jon.grimm@....com>,
<prashanthpra@...gle.com>, <wvw@...gle.com>, <wnliu@...gle.com>,
<gptran@...gle.com>, <kpsingh@...gle.com>, <joao.m.martins@...cle.com>,
<alejandro.j.jimenez@...cle.com>
Subject: Re: [PATCH v5 11/14] iommu/amd: Introduce gDomID-to-hDomID Mapping
and handle parent domain invalidation
On Wed, Nov 12, 2025 at 06:25:03PM +0000, Suravee Suthikulpanit wrote:
> @@ -38,10 +40,42 @@ size_t amd_iommufd_get_viommu_size(struct device *dev, enum iommu_viommu_type vi
> int amd_iommufd_viommu_init(struct iommufd_viommu *viommu, struct iommu_domain *parent,
> const struct iommu_user_data *user_data)
> {
> + unsigned long flags;
> struct protection_domain *pdom = to_pdomain(parent);
> struct amd_iommu_viommu *aviommu = container_of(viommu, struct amd_iommu_viommu, core);
>
> + xa_init(&aviommu->gdomid_array);
Perhaps init with XA_FLAGS_ALLOC1 since domid can't be 0?
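I.e. (untested, against the quoted hunk):

        xa_init_flags(&aviommu->gdomid_array, XA_FLAGS_ALLOC1);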
> +static void amd_iommufd_viommu_destroy(struct iommufd_viommu *viommu)
> +{
> + unsigned long flags;
> + struct amd_iommu_viommu *entry, *next;
> + struct amd_iommu_viommu *aviommu = container_of(viommu, struct amd_iommu_viommu, core);
> + struct protection_domain *pdom = aviommu->parent;
> +
> + spin_lock_irqsave(&pdom->lock, flags);
> + list_for_each_entry_safe(entry, next, &pdom->viommu_list, pdom_list) {
> + if (entry == aviommu)
> + list_del(&entry->pdom_list);
> + }
Do we really need the loop? Why not simply do list_del()?
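I.e. (untested):

        spin_lock_irqsave(&pdom->lock, flags);
        list_del(&aviommu->pdom_list);
        spin_unlock_irqrestore(&pdom->lock, flags);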
> + spin_unlock_irqrestore(&pdom->lock, flags);
> +
> +}
No need for the extra blank line at the end of the function.
> @@ -92,7 +94,60 @@ amd_iommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
> ndom->domain.type = IOMMU_DOMAIN_NESTED;
> ndom->viommu = aviommu;
>
> + gdom_info = kzalloc(sizeof(*gdom_info), GFP_KERNEL);
> + if (!gdom_info)
> + goto out_err;
Missing:
ret = -ENOMEM;
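I.e. (against the quoted hunk):

        gdom_info = kzalloc(sizeof(*gdom_info), GFP_KERNEL);
        if (!gdom_info) {
                ret = -ENOMEM;
                goto out_err;
        }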
> +
> + /*
> + * Normally, when a guest has multiple pass-through devices,
> + * the IOMMU driver sets up DTEs with the same stage-2 table and
> + * uses the same host domain ID (hDomID). In case of nested translation,
> + * if the guest sets up different stage-1 tables with the same PASID,
> + * the IOMMU would use the same TLB tag. This results in a TLB
> + * aliasing issue.
> + *
> + * The guest assigns gDomIDs based on its own algorithm for managing
> + * cache tags of (DomID, PASID). Within a single viommu, the nest parent domain
> + * (w/ S2 table) is used by all DTEs. But we need to consistently map each gDomID
> + * to a single hDomID. This is done using an xarray in the vIOMMU to
> + * keep track of the gDomID mapping. When the S2 is changed, the INVALIDATE_IOMMU_PAGES
> + * command must be issued for each hDomID in the xarray.
> + */
> + curr = xa_cmpxchg(&aviommu->gdomid_array,
> + ndom->gdom_id, NULL, gdom_info, GFP_ATOMIC);
> + if (curr) {
> + if (xa_err(curr)) {
> + ret = -EINVAL;
> + goto out_err_gdom_info;
> + } else {
> + /* The gDomID already exists */
> + pr_debug("%s: Found gdom_id=%#x, hdom_id=%#x\n",
> + __func__, ndom->gdom_id, curr->hdom_id);
> + refcount_inc(&curr->users);
> + ndom->gdom_info = curr;
This looks racy..
When a gDomID is shared between two nested domains, a concurrent
nested_domain_free() could enter before this refcount_inc() and call
refcount_dec_and_test(), or even free curr and ndom outright. Then
this refcount_inc() will blow up, or curr/ndom will be a
use-after-free.
Actually, I don't see where amd_iommu_alloc_domain_nested() gets
used in this series.. I assume AMD will use iommufd's vIOMMU
infrastructure directly, which doesn't hold a mutex across nested
domain allocation/free calls.
So, the entire thing here should hold xa_lock(), use xas_load() to
look up the existing curr, use xas_store() to store gdom_info if
!curr, and only xa_unlock() after gdom_info is fully initialized.
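Something like this (untested sketch; struct/field names taken from
the quoted hunk, and assuming amd_iommu_pdom_id_alloc() is safe to
call under a spinlock):

        XA_STATE(xas, &aviommu->gdomid_array, ndom->gdom_id);

        xas_lock(&xas);
        curr = xas_load(&xas);
        if (curr) {
                /* Safe: the free path drops users under the same xa_lock */
                refcount_inc(&curr->users);
                ndom->gdom_info = curr;
                xas_unlock(&xas);
                kfree(gdom_info);
                return &ndom->domain;
        }

        gdom_info->hdom_id = amd_iommu_pdom_id_alloc();
        if (gdom_info->hdom_id <= 0) {
                xas_unlock(&xas);
                ret = -ENOSPC;
                goto out_err_gdom_info;
        }
        refcount_set(&gdom_info->users, 1);
        /* Publish only after gdom_info is fully initialized */
        xas_store(&xas, gdom_info);     /* xas_error() check elided */
        ndom->gdom_info = gdom_info;
        xas_unlock(&xas);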
> + kfree(gdom_info);
> + return &ndom->domain;
> + }
> + }
> +
> + /* The gDomID does not exist. Allocate a new hdom_id */
> + gdom_info->hdom_id = amd_iommu_pdom_id_alloc();
> + if (gdom_info->hdom_id <= 0) {
> + xa_cmpxchg(&aviommu->gdomid_array,
> + ndom->gdom_id, gdom_info, NULL, GFP_ATOMIC);
> + ret = -ENOSPC;
> + goto out_err_gdom_info;
> + }
> +
> + refcount_set(&gdom_info->users, 1);
Similar risk here: gdom_info is stored in the xarray before this
line, so a concurrent amd_iommu_alloc_domain_nested() could fetch
the stored gdom_info and blow up at refcount_inc().
Make sure the entire sequence is locked and safe.
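And the free path needs to pair with that under the same lock, e.g.
(untested; assuming a free counterpart to amd_iommu_pdom_id_alloc()
exists -- the name below is a guess):

        xa_lock(&aviommu->gdomid_array);
        if (refcount_dec_and_test(&gdom_info->users)) {
                __xa_erase(&aviommu->gdomid_array, ndom->gdom_id);
                /* hypothetical name for the pdom ID free helper */
                amd_iommu_pdom_id_free(gdom_info->hdom_id);
                kfree(gdom_info);
        }
        xa_unlock(&aviommu->gdomid_array);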
Nicolin