Message-ID: <26cfa307-6c33-41f9-a7a0-fbf202b38a00@amd.com>
Date: Mon, 9 Feb 2026 15:57:34 +0530
From: "Srivastava, Dheeraj Kumar" <dhsrivas@....com>
To: Magnus Kalland <magnus@...phinics.com>, <joro@...tes.org>
CC: <iommu@...ts.linux.dev>, <jonas@...phinics.com>, <larsk@...phinics.com>,
<linux-kernel@...r.kernel.org>, <suravee.suthikulpanit@....com>,
<torel@...ula.no>, <vasant.hegde@....com>
Subject: Re: [PATCH v2] iommu/amd: Invalidate IRT cache for DMA aliases
Hi Magnus,
On 2/5/2026 7:31 PM, Magnus Kalland wrote:
> DMA aliasing causes interrupt remapping table entries (IRTEs) to be shared
> between multiple device IDs. See commit 3c124435e8dd
> ("iommu/amd: Support multiple PCI DMA aliases in IRQ Remapping") for more
> information on this. However, the AMD IOMMU driver currently invalidates
> IRTE cache entries on a per-device basis whenever an IRTE is updated, not
> for each alias.
>
> This approach leaves stale IRTE cache entries when an IRTE is cached under
> one DMA alias but later updated and invalidated through a different alias.
> In such cases, the original device ID is never invalidated, since it is
> programmed via aliasing.
>
> This incoherency bug has been observed when IRTEs are cached for one
> Non-Transparent Bridge (NTB) DMA alias and later updated via another.
>
> Fix this by invalidating the interrupt remapping table cache for all DMA
> aliases when updating an IRTE.
>
> Link: https://lore.kernel.org/linux-iommu/fwtqfdk3m7qrazj4bfutl4grac46agtxztc3p2lqnejt2wyexu@lztyomxrm3pk/
> Signed-off-by: Magnus Kalland <magnus@...phinics.com>
>
> ---
>
> v2:
> - Move the lock acquire before branching
> - Call iommu_flush_dev_irt() when pdev is null
> - Handle pdev refcount in correct branch
>
> drivers/iommu/amd/iommu.c | 32 +++++++++++++++++++++++++++-----
> 1 file changed, 27 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index 2e1865daa1ce..b5256b28b0c8 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -3077,25 +3077,47 @@ const struct iommu_ops amd_iommu_ops = {
> static struct irq_chip amd_ir_chip;
> static DEFINE_SPINLOCK(iommu_table_lock);
>
> +static int iommu_flush_dev_irt(struct pci_dev *unused, u16 devid, void *data)
> +{
> + int ret;
> + struct iommu_cmd cmd;
> + struct amd_iommu *iommu = data;
> +
> + build_inv_irt(&cmd, devid);
> + ret = __iommu_queue_command_sync(iommu, &cmd, true);
> + return ret;
> +}
> +
> static void iommu_flush_irt_and_complete(struct amd_iommu *iommu, u16 devid)
> {
> int ret;
> u64 data;
> + int domain = iommu->pci_seg->id;
> + unsigned int bus = PCI_BUS_NUM(devid);
> + unsigned int devfn = devid & 0xff;
> unsigned long flags;
> - struct iommu_cmd cmd, cmd2;
> + struct iommu_cmd cmd;
> + struct pci_dev *pdev = NULL;
>
> if (iommu->irtcachedis_enabled)
> return;
>
> - build_inv_irt(&cmd, devid);
> data = atomic64_inc_return(&iommu->cmd_sem_val);
> - build_completion_wait(&cmd2, iommu, data);
> + build_completion_wait(&cmd, iommu, data);
>
> + pdev = pci_get_domain_bus_and_slot(domain, bus, devfn);
> raw_spin_lock_irqsave(&iommu->lock, flags);
> - ret = __iommu_queue_command_sync(iommu, &cmd, true);
> + if (pdev) {
> + ret = pci_for_each_dma_alias(pdev, iommu_flush_dev_irt, iommu);
> + pci_dev_put(pdev);
> + } else {
> + ret = iommu_flush_dev_irt(NULL, devid, iommu);
> + }
> +
> if (ret)
> goto out;
> - ret = __iommu_queue_command_sync(iommu, &cmd2, false);
> +
> + ret = __iommu_queue_command_sync(iommu, &cmd, false);
> if (ret)
> goto out;
> wait_on_sem(iommu, data);
I tested the patch with lockdep (CONFIG_PROVE_LOCKING=y) enabled and
observed the following "Invalid wait context" lockdep warning in the
kernel log.
[ 7.215360] kernel: =============================
[ 7.215360] kernel: [ BUG: Invalid wait context ]
[ 7.215360] kernel: 6.19.0-rc8-3e36d27b34eb-1770495763816 #1 Not tainted
[ 7.215360] kernel: -----------------------------
[ 7.215360] kernel: swapper/0/1 is trying to lock:
[ 7.215360] kernel: ff4a3b3365f62368 (&k->list_lock){+.+.}-{3:3}, at: bus_to_subsys+0x28/0x90
[ 7.215360] kernel: other info that might help us debug this:
[ 7.215360] kernel: context-{5:5}
[ 7.215360] kernel: 2 locks held by swapper/0/1:
[ 7.215360] kernel: #0: ff4a3ad400055650 (&desc->request_mutex){+.+.}-{4:4}, at: __setup_irq+0xac/0x770
[ 7.215360] kernel: #1: ff4a3ad4000554c0 (&irq_desc_lock_class){-...}-{2:2}, at: __setup_irq+0xe7/0x770
[ 7.215360] kernel: stack backtrace:
[ 7.215360] kernel: CPU: 61 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.19.0-rc8-3e36d27b34eb-1770495763816 #1 PREEMPT(voluntary)
[ 7.215360] kernel: Hardware name: AMD Corporation Titanite_4G/Titanite_4G, BIOS RTI100CC 03/28/2024
[ 7.215360] kernel: Call Trace:
[ 7.215360] kernel: <TASK>
[ 7.215360] kernel: dump_stack_lvl+0x78/0xe0
[ 7.215360] kernel: __lock_acquire+0x836/0xbe0
[ 7.215360] kernel: lock_acquire+0xc7/0x2c0
[ 7.215360] kernel: ? bus_to_subsys+0x28/0x90
[ 7.215360] kernel: ? srso_alias_return_thunk+0x5/0xfbef5
[ 7.215360] kernel: ? validate_chain+0x261/0x6e0
[ 7.215360] kernel: ? __pfx_match_pci_dev_by_id+0x10/0x10
[ 7.215360] kernel: _raw_spin_lock+0x34/0x80
[ 7.215360] kernel: ? bus_to_subsys+0x28/0x90
[ 7.215360] kernel: bus_to_subsys+0x28/0x90
[ 7.215360] kernel: bus_find_device+0x30/0xd0
[ 7.215360] kernel: ? lock_acquire+0xc7/0x2c0
[ 7.215360] kernel: pci_get_domain_bus_and_slot+0x7d/0x100
[ 7.215360] kernel: iommu_flush_irt_and_complete+0xaa/0x190
[ 7.215360] kernel: ? srso_alias_return_thunk+0x5/0xfbef5
[ 7.215360] kernel: ? srso_alias_return_thunk+0x5/0xfbef5
[ 7.215360] kernel: ? __modify_irte_ga.isra.0+0x5f/0x80
[ 7.215360] kernel: irq_remapping_activate+0x43/0x80
[ 7.215360] kernel: __irq_domain_activate_irq+0x53/0x90
[ 7.215360] kernel: __irq_domain_activate_irq+0x32/0x90
[ 7.215360] kernel: irq_domain_activate_irq+0x2d/0x50
[ 7.215360] kernel: __setup_irq+0x339/0x770
[ 7.215360] kernel: request_threaded_irq+0xe5/0x190
[ 7.215360] kernel: ? __pfx_acpi_irq+0x10/0x10
[ 7.215360] kernel: ? __pfx_acpi_ev_sci_xrupt_handler+0x10/0x10
[ 7.215360] kernel: acpi_os_install_interrupt_handler+0xaf/0x100
[ 7.215360] kernel: ? __pfx_acpi_init+0x10/0x10
[ 7.215360] kernel: acpi_ev_install_xrupt_handlers+0x22/0x90
[ 7.215360] kernel: ? __pfx_acpi_init+0x10/0x10
[ 7.215360] kernel: acpi_bus_init+0x3a/0x460
[ 7.215360] kernel: ? acpi_ut_release_mutex+0x4a/0x90
[ 7.215360] kernel: ? srso_alias_return_thunk+0x5/0xfbef5
[ 7.215360] kernel: ? acpi_install_address_space_handler_internal.part.0+0x64/0x90
[ 7.215360] kernel: ? __pfx_acpi_init+0x10/0x10
[ 7.215360] kernel: ? __pfx_acpi_init+0x10/0x10
[ 7.215360] kernel: acpi_init+0x5d/0x130
[ 7.215360] kernel: ? __pfx_scan_for_dmi_ipmi+0x10/0x10
[ 7.215360] kernel: ? __pfx_acpi_init+0x10/0x10
[ 7.215360] kernel: do_one_initcall+0x5c/0x370
[ 7.215360] kernel: do_initcalls+0xdb/0x190
[ 7.215360] kernel: kernel_init_freeable+0x2d1/0x420
[ 7.215360] kernel: ? __pfx_kernel_init+0x10/0x10
[ 7.215360] kernel: kernel_init+0x1a/0x1c0
[ 7.215360] kernel: ret_from_fork+0x25a/0x280
[ 7.215360] kernel: ? __pfx_kernel_init+0x10/0x10
[ 7.215360] kernel: ret_from_fork_asm+0x1a/0x30
[ 7.215360] kernel: </TASK>
From the warning trace, __setup_irq() is already holding a raw
spinlock, &irq_desc_lock_class (wait context 2:2), when the activation
path reaches irq_remapping_activate(). Further down that path,
pci_get_domain_bus_and_slot() calls into bus_to_subsys(), which tries
to acquire a regular spinlock_t, &k->list_lock (wait context 3:3).
Taking a spinlock_t, which becomes a sleeping lock on PREEMPT_RT,
inside a raw-spinlock critical section is an invalid wait context, and
that is what triggers the warning.
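
To make the pattern concrete, here is a minimal, self-contained sketch
of what lockdep is objecting to. The names (outer/inner and the demo
function) are made up for illustration and are not from the patch; only
the locking shape matches the trace above:

#include <linux/spinlock.h>

/* Stands in for &irq_desc_lock_class, held by __setup_irq(). */
static DEFINE_RAW_SPINLOCK(outer);
/* Stands in for &k->list_lock, taken by bus_to_subsys(). */
static DEFINE_SPINLOCK(inner);

static void invalid_wait_context_demo(void)
{
	unsigned long flags;

	raw_spin_lock_irqsave(&outer, flags);
	/*
	 * In the patched code the nesting is indirect:
	 * pci_get_domain_bus_and_slot() -> bus_find_device()
	 * -> bus_to_subsys() -> spin_lock(&k->list_lock).
	 * With CONFIG_PROVE_LOCKING, taking a spinlock_t
	 * (wait context 3:3; a sleeping lock on PREEMPT_RT)
	 * under a raw_spinlock_t (wait context 2:2) yields
	 * "BUG: Invalid wait context".
	 */
	spin_lock(&inner);
	spin_unlock(&inner);
	raw_spin_unlock_irqrestore(&outer, flags);
}

Since the raw spinlock is taken by the IRQ core well above
iommu_flush_irt_and_complete(), the pdev lookup probably needs to move
out of this call path altogether, e.g. by resolving the DMA aliases
before the IRQ descriptor lock is taken, or by caching them when the
IRQ table is set up. I have not tried a concrete fix, though.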
Thanks
Dheeraj