Message-ID: <fwtqfdk3m7qrazj4bfutl4grac46agtxztc3p2lqnejt2wyexu@lztyomxrm3pk>
Date: Tue, 3 Feb 2026 14:43:53 +0100
From: Jörg Rödel <joro@...tes.org>
To: Magnus Kalland <magnus@...phinics.com>
Cc: vasant.hegde@....com, suravee.suthikulpanit@....com,
iommu@...ts.linux.dev, linux-kernel@...r.kernel.org,
"Tore H . Larsen" <torel@...ula.no>, "Lars B . Kristiansen" <larsk@...phinics.com>,
Jonas Markussen <jonas@...phinics.com>
Subject: Re: [PATCH v1] iommu/amd: IRT cache incoherency bug

On Tue, Feb 03, 2026 at 12:32:10PM +0100, Magnus Kalland wrote:
> DMA aliasing causes interrupt remapping table entries (IRTEs) to be shared
> between multiple device IDs. See commit 3c124435e8dd
> ("iommu/amd: Support multiple PCI DMA aliases in IRQ Remapping") for more
> information on this. However, the AMD IOMMU driver currently invalidates
> IRTE cache entries on a per-device basis whenever an IRTE is updated, not
> for each alias.
>
> This approach leaves stale IRTE cache entries when an IRTE is cached under
> one DMA alias but later updated and invalidated through a different alias.
> In such cases, the original device ID is never invalidated, since it is
> programmed via aliasing.
>
> This incoherency bug has been observed when IRTEs are cached for one
> Non-Transparent Bridge (NTB) DMA alias, later updated via another.
>
> Fix this by invalidating the interrupt remapping table cache for all DMA
> aliases when updating an IRTE.
>
> Link to original thread: https://lore.kernel.org/linux-iommu/20251215114952.190550-2-magnus@dolphinics.com/T/#raf80785292deb22aafe6a817424051ea0d1d28f4
>
> Resending for visibility.
>
> Changes since original thread:
> - Renamed struct pci_dev pointer parameter to unused
> - Rebased on latest master
>
> Cc: Tore H. Larsen <torel@...ula.no>
> Co-developed-by: Lars B. Kristiansen <larsk@...phinics.com>
> Signed-off-by: Lars B. Kristiansen <larsk@...phinics.com>
> Co-developed-by: Jonas Markussen <jonas@...phinics.com>
> Signed-off-by: Jonas Markussen <jonas@...phinics.com>
> Signed-off-by: Magnus Kalland <magnus@...phinics.com>
>
> ---
> drivers/iommu/amd/iommu.c | 30 +++++++++++++++++++++++++++---
> 1 file changed, 27 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index 7c12be1b247f..404afcaf4bc1 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -3103,22 +3103,44 @@ const struct iommu_ops amd_iommu_ops = {
> static struct irq_chip amd_ir_chip;
> static DEFINE_SPINLOCK(iommu_table_lock);
>
> +static int iommu_flush_dev_irt(struct pci_dev *unused, u16 devid, void *data)
> +{
> + int ret;
> + struct iommu_cmd cmd;
> + struct amd_iommu *iommu = data;
> +
> + build_inv_irt(&cmd, devid);
> + ret = __iommu_queue_command_sync(iommu, &cmd, true);
> + return ret;
> +}
> +
> static void iommu_flush_irt_and_complete(struct amd_iommu *iommu, u16 devid)
> {
> int ret;
> u64 data;
> + int domain = iommu->pci_seg->id;
> + unsigned int bus = PCI_BUS_NUM(devid);
> + unsigned int devfn = devid & 0xff;
> unsigned long flags;
> struct iommu_cmd cmd, cmd2;
> + struct pci_dev *pdev = NULL;
>
> if (iommu->irtcachedis_enabled)
> return;
>
> - build_inv_irt(&cmd, devid);
> data = atomic64_inc_return(&iommu->cmd_sem_val);
> build_completion_wait(&cmd2, iommu, data);
>
> - raw_spin_lock_irqsave(&iommu->lock, flags);
> - ret = __iommu_queue_command_sync(iommu, &cmd, true);
> + pdev = pci_get_domain_bus_and_slot(domain, bus, devfn);
> + if (pdev) {
> + raw_spin_lock_irqsave(&iommu->lock, flags);

Move the lock above the if () ...

> + ret = pci_for_each_dma_alias(pdev, iommu_flush_dev_irt, iommu);
> + } else {
> + build_inv_irt(&cmd, devid);
> + raw_spin_lock_irqsave(&iommu->lock, flags);
> + ret = __iommu_queue_command_sync(iommu, &cmd, true);

... and call iommu_flush_dev_irt(NULL, devid, iommu) here.

> + }
> +
> if (ret)
> goto out;
> ret = __iommu_queue_command_sync(iommu, &cmd2, false);
> @@ -3127,6 +3149,8 @@ static void iommu_flush_irt_and_complete(struct amd_iommu *iommu, u16 devid)
> wait_on_sem(iommu, data);
> out:
> raw_spin_unlock_irqrestore(&iommu->lock, flags);
> + if (pdev)
> + pci_dev_put(pdev);

This can also be moved to the respective if () branch above, no?

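Taken together, the three comments above would give roughly the control flow sketched below. This is an untested userspace model, not kernel code: struct mock_iommu, struct mock_pdev, mock_for_each_dma_alias() and the lock_depth counter are hypothetical stand-ins for struct amd_iommu, struct pci_dev, pci_for_each_dma_alias() and iommu->lock, and the pdev argument stands in for the result of pci_get_domain_bus_and_slot(). It only illustrates taking the lock once above the if (), reusing the callback in the else branch, and dropping the reference inside the if (pdev) branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct amd_iommu. */
struct mock_iommu {
	int lock_depth;          /* 1 while the mock "lock" is held */
	uint16_t flushed[8];     /* device IDs that got an IRT invalidation */
	size_t nr_flushed;
	bool completion_queued;  /* models the completion-wait command */
};

/* Hypothetical stand-in for struct pci_dev and its DMA aliases. */
struct mock_pdev {
	const uint16_t *aliases;
	size_t nr_aliases;
	int refcount;
};

typedef int (*alias_fn)(struct mock_pdev *pdev, uint16_t devid, void *data);

/* Models pci_for_each_dma_alias(): stops at the first non-zero return. */
static int mock_for_each_dma_alias(struct mock_pdev *pdev, alias_fn fn,
				   void *data)
{
	for (size_t i = 0; i < pdev->nr_aliases; i++) {
		int ret = fn(pdev, pdev->aliases[i], data);

		if (ret)
			return ret;
	}
	return 0;
}

/* Same shape as the patch's callback; the caller must hold the lock. */
static int flush_dev_irt(struct mock_pdev *unused, uint16_t devid, void *data)
{
	struct mock_iommu *iommu = data;

	(void)unused;
	assert(iommu->lock_depth == 1);
	if (iommu->nr_flushed >= 8)
		return -1;
	iommu->flushed[iommu->nr_flushed++] = devid;
	return 0;
}

static int flush_irt_and_complete(struct mock_iommu *iommu,
				  struct mock_pdev *pdev, uint16_t devid)
{
	int ret;

	iommu->lock_depth++;             /* lock taken once, above the if () */
	if (pdev) {
		ret = mock_for_each_dma_alias(pdev, flush_dev_irt, iommu);
		pdev->refcount--;        /* reference dropped in this branch */
	} else {
		ret = flush_dev_irt(NULL, devid, iommu);
	}
	if (!ret)
		iommu->completion_queued = true;
	iommu->lock_depth--;             /* single unlock on every path */
	return ret;
}
```

With this shape there is one lock/unlock pair and no pdev check after the unlock.
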
-Joerg