Message-ID: <20250530161455.GE233377@nvidia.com>
Date: Fri, 30 May 2025 13:14:55 -0300
From: Jason Gunthorpe <jgg@...dia.com>
To: Nicolin Chen <nicolinc@...dia.com>
Cc: kevin.tian@...el.com, corbet@....net, will@...nel.org,
bagasdotme@...il.com, robin.murphy@....com, joro@...tes.org,
thierry.reding@...il.com, vdumpa@...dia.com, jonathanh@...dia.com,
shuah@...nel.org, jsnitsel@...hat.com, nathan@...nel.org,
peterz@...radead.org, yi.l.liu@...el.com, mshavit@...gle.com,
praan@...gle.com, zhangzekun11@...wei.com, iommu@...ts.linux.dev,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org, linux-tegra@...r.kernel.org,
linux-kselftest@...r.kernel.org, patches@...ts.linux.dev,
mochs@...dia.com, alok.a.tiwari@...cle.com, vasant.hegde@....com,
dwmw2@...radead.org, baolu.lu@...ux.intel.com
Subject: Re: [PATCH v5 14/29] iommufd/viommu: Add IOMMUFD_CMD_HW_QUEUE_ALLOC
ioctl
On Sat, May 17, 2025 at 08:21:31PM -0700, Nicolin Chen wrote:
> +int iommufd_hw_queue_alloc_ioctl(struct iommufd_ucmd *ucmd)
> +{
> + struct iommu_hw_queue_alloc *cmd = ucmd->cmd;
> + struct iommufd_hw_queue *hw_queue;
> + struct iommufd_viommu *viommu;
> + struct page **pages;
> + int max_npages, i;
> + u64 last, offset;
> + int rc;
> +
> + if (cmd->flags || cmd->type == IOMMU_HW_QUEUE_TYPE_DEFAULT)
> + return -EOPNOTSUPP;
> + if (!cmd->nesting_parent_iova || !cmd->length)
> + return -EINVAL;
0 is a legal nesting_parent_iova, so only the zero-length case should be rejected here.
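For illustration, the adjusted validation could look like this in a userspace sketch (validate_hw_queue_range is a made-up helper name, and the kernel's check_add_overflow() is emulated with GCC's __builtin_add_overflow):

```c
#include <errno.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's check_add_overflow() */
#define check_add_overflow(a, b, d) __builtin_add_overflow(a, b, d)

/*
 * Hypothetical validation helper: only a zero length is rejected, since
 * an IOVA of 0 is a legal nesting_parent_iova. The last byte of the
 * range is computed with an explicit overflow check.
 */
static int validate_hw_queue_range(uint64_t nesting_parent_iova,
				   uint64_t length, uint64_t *last)
{
	if (!length)
		return -EINVAL;
	if (check_add_overflow(nesting_parent_iova, length - 1, last))
		return -EOVERFLOW;
	return 0;
}
```

With this shape, an iova of 0 with a non-zero length passes, while length 0 and ranges that wrap past the top of the address space are rejected.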
> + if (check_add_overflow(cmd->nesting_parent_iova, cmd->length - 1,
> + &last))
> + return -EOVERFLOW;
> +
> + viommu = iommufd_get_viommu(ucmd, cmd->viommu_id);
> + if (IS_ERR(viommu))
> + return PTR_ERR(viommu);
> +
> + if (!viommu->ops || !viommu->ops->hw_queue_alloc) {
> + rc = -EOPNOTSUPP;
> + goto out_put_viommu;
> + }
> +
> + offset =
> + cmd->nesting_parent_iova - PAGE_ALIGN(cmd->nesting_parent_iova);
> + max_npages = DIV_ROUND_UP(offset + cmd->length, PAGE_SIZE);
This should probably be capped to PAGE_SIZE / sizeof(void *), returning -EINVAL if the queue would need more pages than that.
And we don't need the allocation if we are not doing an access.
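A userspace sketch of the suggested cap (hw_queue_max_npages is a made-up helper; note the in-page offset is computed with a round-down alignment):

```c
#include <errno.h>
#include <stdint.h>

#define PAGE_SIZE 4096UL
#define PAGE_ALIGN_DOWN(x) ((x) & ~(PAGE_SIZE - 1))
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Hypothetical bound check: the page pointer array for the queue must
 * fit in a single page (PAGE_SIZE / sizeof(void *) entries), otherwise
 * the request is rejected with -EINVAL.
 */
static long hw_queue_max_npages(uint64_t iova, uint64_t length)
{
	uint64_t offset = iova - PAGE_ALIGN_DOWN(iova);
	uint64_t max_npages = DIV_ROUND_UP(offset + length, PAGE_SIZE);

	if (max_npages > PAGE_SIZE / sizeof(void *))
		return -EINVAL;
	return (long)max_npages;
}
```

An unaligned iova adds one partial page at the front, so a 4k queue starting mid-page needs two pages, while absurdly long queues hit the cap.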
> + hw_queue = viommu->ops->hw_queue_alloc(ucmd, viommu, cmd->type,
> + cmd->index,
> + cmd->nesting_parent_iova,
> + cmd->length);
> + if (IS_ERR(hw_queue)) {
> + rc = PTR_ERR(hw_queue);
> + goto out_unpin;
> + }
> +
> + /* The iommufd_hw_queue_alloc helper saves ictx in hw_queue->ictx */
> + if (WARN_ON_ONCE(hw_queue->ictx != ucmd->ictx)) {
> + rc = -EINVAL;
> + goto out_unpin;
> + }
There is another technique from RDMA which may be very helpful here, considering how things are getting split up...
Put a size_t in the driver ops:
size_t size_viommu;
size_t size_hw_queue;
Have the driver set it via a macro like INIT_RDMA_OBJ_SIZE
#define INIT_RDMA_OBJ_SIZE(ib_struct, drv_struct, member)                  \
	.size_##ib_struct =                                                \
		(sizeof(struct drv_struct) +                               \
		 BUILD_BUG_ON_ZERO(offsetof(struct drv_struct, member)) +  \
		 BUILD_BUG_ON_ZERO(                                        \
			 !__same_type(((struct drv_struct *)NULL)->member, \
				      struct ib_struct)))
Which proves the core structure is at the front.
Then the core code can allocate the object along with enough space for
the driver and call a driver function to init the driver portion of
the already allocated object.
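A userspace sketch of that allocation flow, with hypothetical mock_* names standing in for a real driver, might look like:

```c
#include <stdlib.h>
#include <stddef.h>

/* Simplified core object, standing in for struct iommufd_hw_queue */
struct iommufd_hw_queue {
	unsigned long base_addr;
	unsigned long length;
};

/* Hypothetical driver object; the core struct sits at offset zero */
struct mock_hw_queue {
	struct iommufd_hw_queue core;
	unsigned int doorbell;
};

/* Hypothetical ops: the driver reports its full size, the core allocates */
struct mock_viommu_ops {
	size_t size_hw_queue;
	void (*hw_queue_init)(struct iommufd_hw_queue *hw_queue);
};

static void mock_hw_queue_init(struct iommufd_hw_queue *hw_queue)
{
	/* Core struct is at the front, so a plain cast recovers the
	 * driver object; the driver only touches its own portion. */
	struct mock_hw_queue *mock = (struct mock_hw_queue *)hw_queue;

	mock->doorbell = 0xd00d;
}

static const struct mock_viommu_ops mock_ops = {
	.size_hw_queue = sizeof(struct mock_hw_queue),
	.hw_queue_init = mock_hw_queue_init,
};

/*
 * Core-side allocation: the core allocates the combined object, fully
 * initializes its own fields, and only then hands the object to the
 * driver's init function - so the driver never sees a partially
 * initialized core structure.
 */
static struct iommufd_hw_queue *
core_hw_queue_alloc(const struct mock_viommu_ops *ops,
		    unsigned long base_addr, unsigned long length)
{
	struct iommufd_hw_queue *hw_queue = calloc(1, ops->size_hw_queue);

	if (!hw_queue)
		return NULL;
	hw_queue->base_addr = base_addr;
	hw_queue->length = length;
	ops->hw_queue_init(hw_queue);
	return hw_queue;
}
```

The core owns the allocation and every core field, so checks like the WARN_ON_ONCE on hw_queue->ictx above become unnecessary by construction.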
Then you don't need these dances where the driver helper has to do
things like set ictx, or the core structure is left partially initialized:
> + hw_queue->viommu = viommu;
> + refcount_inc(&viommu->obj.users);
> + hw_queue->length = cmd->length;
> + hw_queue->base_addr = cmd->nesting_parent_iova;
while the driver is running, which can be a source of bugs.
This would be useful for the existing ops too, and may reduce the size
of the linked-in code.
Jason