Message-ID: <073ffe14-d631-4a4f-8668-ddeb7d611448@collabora.com>
Date: Thu, 19 Jun 2025 18:27:52 +0200
From: Benjamin Gaignard <benjamin.gaignard@...labora.com>
To: Jason Gunthorpe <jgg@...pe.ca>
Cc: joro@...tes.org, will@...nel.org, robin.murphy@....com, robh@...nel.org,
krzk+dt@...nel.org, conor+dt@...nel.org, heiko@...ech.de,
nicolas.dufresne@...labora.com, iommu@...ts.linux.dev,
devicetree@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-arm-kernel@...ts.infradead.org, linux-rockchip@...ts.infradead.org,
kernel@...labora.com
Subject: Re: [PATCH v3 3/5] iommu: Add verisilicon IOMMU driver
On 19/06/2025 15:47, Jason Gunthorpe wrote:
> On Thu, Jun 19, 2025 at 03:12:24PM +0200, Benjamin Gaignard wrote:
>
>> +static struct iommu_domain *vsi_iommu_domain_alloc_paging(struct device *dev)
>> +{
>> + struct vsi_iommu *iommu = vsi_iommu_get_from_dev(dev);
>> + struct vsi_iommu_domain *vsi_domain;
>> +
>> + vsi_domain = kzalloc(sizeof(*vsi_domain), GFP_KERNEL);
>> + if (!vsi_domain)
>> + return NULL;
>> +
>> + vsi_domain->dma_dev = iommu->dev;
>> + iommu->domain = &vsi_identity_domain;
> ?? alloc paging should not change the iommu.
>
> Probably this belongs in vsi_iommu_probe_device if the device starts
> up in an identity translation mode.
You are right, it is useless here; I will remove it.
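If the HW really starts up in bypass, the right place to record that is probably vsi_iommu_probe_device(). A rough sketch of what I have in mind (the "iommu->iommu" struct iommu_device member name is an assumption on my side):

static struct iommu_device *vsi_iommu_probe_device(struct device *dev)
{
	struct vsi_iommu *iommu = vsi_iommu_get_from_dev(dev);

	if (!iommu)
		return ERR_PTR(-ENODEV);

	/* the HW powers up with translation disabled, so start in identity mode */
	iommu->domain = &vsi_identity_domain;

	return &iommu->iommu;
}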
>
>> +static u32 *vsi_dte_get_page_table(struct vsi_iommu_domain *vsi_domain, dma_addr_t iova)
>> +{
>> + u32 *page_table, *dte_addr;
>> + u32 dte_index, dte;
>> + phys_addr_t pt_phys;
>> + dma_addr_t pt_dma;
>> +
>> + assert_spin_locked(&vsi_domain->dt_lock);
>> +
>> + dte_index = vsi_iova_dte_index(iova);
>> + dte_addr = &vsi_domain->dt[dte_index];
>> + dte = *dte_addr;
>> + if (vsi_dte_is_pt_valid(dte))
>> + goto done;
>> +
>> + page_table = (u32 *)iommu_alloc_pages_sz(GFP_ATOMIC | GFP_DMA32, SPAGE_SIZE);
> Unnecessary casts are not the kernel style, I saw a couple others too
>
> Ugh. This ignores the gfp flags that are passed into map because you
> have to force atomic due to the spinlock that shouldn't be there :(
> This means it does not set GFP_KERNEL_ACCOUNT when required. It would
> be better to continue to use the passed in GFP flags but override them
> to atomic mode.
I will add a gfp_t parameter and use it like this:
page_table = iommu_alloc_pages_sz(gfp | GFP_ATOMIC | GFP_DMA32, SPAGE_SIZE);
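If keeping the caller's accounting flags matters, the blocking part could also be masked out explicitly, something like (just an idea, flag handling to be double-checked):

	/* keep the caller's flags (e.g. __GFP_ACCOUNT) but force a non-sleeping allocation */
	page_table = iommu_alloc_pages_sz((gfp & ~__GFP_DIRECT_RECLAIM) | GFP_ATOMIC | GFP_DMA32,
					  SPAGE_SIZE);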
>
>> +static int vsi_iommu_identity_attach(struct iommu_domain *domain,
>> + struct device *dev)
>> +{
>> + struct vsi_iommu *iommu = dev_iommu_priv_get(dev);
>> + struct vsi_iommu_domain *vsi_domain = to_vsi_domain(domain);
>> + unsigned long flags;
>> + int ret;
>> +
>> + if (WARN_ON(!iommu))
>> + return -ENODEV;
> These WARN_ON's should be removed. ops are never called by the core
> without a probed device.
ok
>
>> +static int vsi_iommu_attach_device(struct iommu_domain *domain,
>> + struct device *dev)
>> +{
>> + struct vsi_iommu *iommu = dev_iommu_priv_get(dev);
>> + struct vsi_iommu_domain *vsi_domain = to_vsi_domain(domain);
>> + unsigned long flags;
>> + int ret;
>> +
>> + if (WARN_ON(!iommu))
>> + return -ENODEV;
>> +
>> + /* iommu already attached */
>> + if (iommu->domain == domain)
>> + return 0;
>> +
>> + ret = vsi_iommu_identity_attach(&vsi_identity_domain, dev);
>> + if (ret)
>> + return ret;
> Hurm, this is actually quite bad, now that it is clear the HW is in an
> identity mode it is actually a security problem for VFIO to switch the
> translation to identity during attach_device. I'd really prefer new
> drivers don't make this mistake.
>
> It seems the main thing motivating this is the fact a linked list has
> only a single iommu->node so you can't attach the iommu to both the
> new/old domain and atomically update the page table base.
>
> Is it possible for the HW to do a blocking behavior? That would be an
> easy fix.. You should always be able to force this by allocating a
> shared top page table level during probe time and making it entirely
> empty while staying always in the paging mode. Maybe there is a less
> expensive way.
>
> Otherwise you probably have work more like the other drivers and
> allocate a struct for each attachment so you can have the iommu
> attached two domains during the switch over and never drop to an
> identity mode.
I will remove the switch to the identity domain and it will work fine.
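The idea would be to move the iommu from the old domain's list to the new one and reprogram the page-table base directly, without ever passing through identity/bypass. A rough sketch of the direction (locking details still to be refined, and vsi_iommu_enable() here already uses the two-argument prototype discussed below):

static int vsi_iommu_attach_device(struct iommu_domain *domain,
				   struct device *dev)
{
	struct vsi_iommu *iommu = dev_iommu_priv_get(dev);
	struct vsi_iommu_domain *vsi_domain = to_vsi_domain(domain);
	unsigned long flags;
	int ret;

	/* iommu already attached */
	if (iommu->domain == domain)
		return 0;

	/* leave the previous paging domain's list, if any */
	if (iommu->domain && iommu->domain != &vsi_identity_domain) {
		struct vsi_iommu_domain *old = to_vsi_domain(iommu->domain);

		spin_lock_irqsave(&old->iommus_lock, flags);
		list_del_init(&iommu->node);
		spin_unlock_irqrestore(&old->iommus_lock, flags);
	}

	spin_lock_irqsave(&vsi_domain->iommus_lock, flags);
	list_add_tail(&iommu->node, &vsi_domain->iommus);
	spin_unlock_irqrestore(&vsi_domain->iommus_lock, flags);

	ret = pm_runtime_get_if_in_use(iommu->dev);
	if (!ret || WARN_ON_ONCE(ret < 0))
		return 0;

	/* switch the HW directly to the new page table */
	ret = vsi_iommu_enable(iommu, domain);
	pm_runtime_put(iommu->dev);

	return ret;
}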
>
>> + iommu->domain = domain;
>> +
>> + spin_lock_irqsave(&vsi_domain->iommus_lock, flags);
>> + list_add_tail(&iommu->node, &vsi_domain->iommus);
>> + spin_unlock_irqrestore(&vsi_domain->iommus_lock, flags);
>> +
>> + ret = pm_runtime_get_if_in_use(iommu->dev);
>> + if (!ret || WARN_ON_ONCE(ret < 0))
>> + return 0;
> This probably should have a comment, is the idea the resume will setup
> the domain? How does locking of iommu->domain work in that case?
>
> Maybe the suspend resume paths should be holding the group mutex..
>
>> + ret = vsi_iommu_enable(iommu);
>> + if (ret)
>> + WARN_ON(vsi_iommu_identity_attach(&vsi_identity_domain, dev));
> Is this necessary though? vsi_iommu_enable failure cases don't change
> the HW, and a few lines above was an identity_attach. Just delay
> setting iommu->domain until it succeeds, and this is a simple error.
I think I will change the vsi_iommu_enable() prototype to:
static int vsi_iommu_enable(struct vsi_iommu *iommu, struct iommu_domain *domain)
and set iommu->domain = domain; at the end of the function if everything goes well.
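Something along these lines (vsi_iommu_hw_setup() is just a placeholder name for the existing register programming and TLB invalidation):

static int vsi_iommu_enable(struct vsi_iommu *iommu, struct iommu_domain *domain)
{
	int ret;

	/* existing DT base programming and TLB invalidation, unchanged */
	ret = vsi_iommu_hw_setup(iommu, domain);
	if (ret)
		return ret;

	/* record the new domain only once the HW switch has succeeded */
	iommu->domain = domain;

	return 0;
}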
> iommu->domain = domain;
>
>
>> +static struct iommu_ops vsi_iommu_ops = {
>> + .identity_domain = &vsi_identity_domain,
> Add:
>
> .release_domain = &vsi_identity_domain,
>
> Which will cause the core code to automatically run through to
> vsi_iommu_disable() prior to calling vsi_iommu_release_device(), which
> will avoid UAF problems.
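Ok, I will add it, roughly:

static struct iommu_ops vsi_iommu_ops = {
	.identity_domain = &vsi_identity_domain,
	.release_domain = &vsi_identity_domain,
	/* other callbacks unchanged */
};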
>
> Also, should the probe functions be doing some kind of validation that
> there is only one struct device attached?
Which kind of validation?
>
> Jason