Message-ID: <26cfa307-6c33-41f9-a7a0-fbf202b38a00@amd.com>
Date: Mon, 9 Feb 2026 15:57:34 +0530
From: "Srivastava, Dheeraj Kumar" <dhsrivas@....com>
To: Magnus Kalland <magnus@...phinics.com>, <joro@...tes.org>
CC: <iommu@...ts.linux.dev>, <jonas@...phinics.com>, <larsk@...phinics.com>,
<linux-kernel@...r.kernel.org>, <suravee.suthikulpanit@....com>,
<torel@...ula.no>, <vasant.hegde@....com>
Subject: Re: [PATCH v2] iommu/amd: Invalidate IRT cache for DMA aliases
Hi Magnus,
On 2/5/2026 7:31 PM, Magnus Kalland wrote:
> DMA aliasing causes interrupt remapping table entries (IRTEs) to be shared
> between multiple device IDs. See commit 3c124435e8dd
> ("iommu/amd: Support multiple PCI DMA aliases in IRQ Remapping") for more
> information on this. However, the AMD IOMMU driver currently invalidates
> IRTE cache entries on a per-device basis whenever an IRTE is updated, not
> for each alias.
>
> This approach leaves stale IRTE cache entries when an IRTE is cached under
> one DMA alias but later updated and invalidated through a different alias.
> In such cases, the original device ID is never invalidated, since it is
> programmed via aliasing.
>
> This incoherency bug has been observed when IRTEs are cached for one
> Non-Transparent Bridge (NTB) DMA alias and later updated via another.
>
> Fix this by invalidating the interrupt remapping table cache for all DMA
> aliases when updating an IRTE.
>
> Link: https://lore.kernel.org/linux-iommu/fwtqfdk3m7qrazj4bfutl4grac46agtxztc3p2lqnejt2wyexu@lztyomxrm3pk/
> Signed-off-by: Magnus Kalland <magnus@...phinics.com>
>
> ---
>
> v2:
> - Move the lock acquire before branching
> - Call iommu_flush_dev_irt() when pdev is null
> - Handle pdev refcount in correct branch
>
> drivers/iommu/amd/iommu.c | 32 +++++++++++++++++++++++++++-----
> 1 file changed, 27 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index 2e1865daa1ce..b5256b28b0c8 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -3077,25 +3077,47 @@ const struct iommu_ops amd_iommu_ops = {
> static struct irq_chip amd_ir_chip;
> static DEFINE_SPINLOCK(iommu_table_lock);
>
> +static int iommu_flush_dev_irt(struct pci_dev *unused, u16 devid, void *data)
> +{
> + int ret;
> + struct iommu_cmd cmd;
> + struct amd_iommu *iommu = data;
> +
> + build_inv_irt(&cmd, devid);
> + ret = __iommu_queue_command_sync(iommu, &cmd, true);
> + return ret;
> +}
> +
> static void iommu_flush_irt_and_complete(struct amd_iommu *iommu, u16 devid)
> {
> int ret;
> u64 data;
> + int domain = iommu->pci_seg->id;
> + unsigned int bus = PCI_BUS_NUM(devid);
> + unsigned int devfn = devid & 0xff;
> unsigned long flags;
> - struct iommu_cmd cmd, cmd2;
> + struct iommu_cmd cmd;
> + struct pci_dev *pdev = NULL;
>
> if (iommu->irtcachedis_enabled)
> return;
>
> - build_inv_irt(&cmd, devid);
> data = atomic64_inc_return(&iommu->cmd_sem_val);
> - build_completion_wait(&cmd2, iommu, data);
> + build_completion_wait(&cmd, iommu, data);
>
> + pdev = pci_get_domain_bus_and_slot(domain, bus, devfn);
> raw_spin_lock_irqsave(&iommu->lock, flags);
> - ret = __iommu_queue_command_sync(iommu, &cmd, true);
> + if (pdev) {
> + ret = pci_for_each_dma_alias(pdev, iommu_flush_dev_irt, iommu);
> + pci_dev_put(pdev);
> + } else {
> + ret = iommu_flush_dev_irt(NULL, devid, iommu);
> + }
> +
> if (ret)
> goto out;
> - ret = __iommu_queue_command_sync(iommu, &cmd2, false);
> +
> + ret = __iommu_queue_command_sync(iommu, &cmd, false);
> if (ret)
> goto out;
> wait_on_sem(iommu, data);
I tested the patch with lockdep (CONFIG_PROVE_LOCKING=y) enabled and
observed the following "Invalid wait context" lockdep warning in the
kernel log.
[ 7.215360] kernel: =============================
[ 7.215360] kernel: [ BUG: Invalid wait context ]
[ 7.215360] kernel: 6.19.0-rc8-3e36d27b34eb-1770495763816 #1 Not tainted
[ 7.215360] kernel: -----------------------------
[ 7.215360] kernel: swapper/0/1 is trying to lock:
[ 7.215360] kernel: ff4a3b3365f62368 (&k->list_lock){+.+.}-{3:3}, at: bus_to_subsys+0x28/0x90
[ 7.215360] kernel: other info that might help us debug this:
[ 7.215360] kernel: context-{5:5}
[ 7.215360] kernel: 2 locks held by swapper/0/1:
[ 7.215360] kernel: #0: ff4a3ad400055650 (&desc->request_mutex){+.+.}-{4:4}, at: __setup_irq+0xac/0x770
[ 7.215360] kernel: #1: ff4a3ad4000554c0 (&irq_desc_lock_class){-...}-{2:2}, at: __setup_irq+0xe7/0x770
[ 7.215360] kernel: stack backtrace:
[ 7.215360] kernel: CPU: 61 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.19.0-rc8-3e36d27b34eb-1770495763816 #1 PREEMPT(voluntary)
[ 7.215360] kernel: Hardware name: AMD Corporation Titanite_4G/Titanite_4G, BIOS RTI100CC 03/28/2024
[ 7.215360] kernel: Call Trace:
[ 7.215360] kernel: <TASK>
[ 7.215360] kernel: dump_stack_lvl+0x78/0xe0
[ 7.215360] kernel: __lock_acquire+0x836/0xbe0
[ 7.215360] kernel: lock_acquire+0xc7/0x2c0
[ 7.215360] kernel: ? bus_to_subsys+0x28/0x90
[ 7.215360] kernel: ? srso_alias_return_thunk+0x5/0xfbef5
[ 7.215360] kernel: ? validate_chain+0x261/0x6e0
[ 7.215360] kernel: ? __pfx_match_pci_dev_by_id+0x10/0x10
[ 7.215360] kernel: _raw_spin_lock+0x34/0x80
[ 7.215360] kernel: ? bus_to_subsys+0x28/0x90
[ 7.215360] kernel: bus_to_subsys+0x28/0x90
[ 7.215360] kernel: bus_find_device+0x30/0xd0
[ 7.215360] kernel: ? lock_acquire+0xc7/0x2c0
[ 7.215360] kernel: pci_get_domain_bus_and_slot+0x7d/0x100
[ 7.215360] kernel: iommu_flush_irt_and_complete+0xaa/0x190
[ 7.215360] kernel: ? srso_alias_return_thunk+0x5/0xfbef5
[ 7.215360] kernel: ? srso_alias_return_thunk+0x5/0xfbef5
[ 7.215360] kernel: ? __modify_irte_ga.isra.0+0x5f/0x80
[ 7.215360] kernel: irq_remapping_activate+0x43/0x80
[ 7.215360] kernel: __irq_domain_activate_irq+0x53/0x90
[ 7.215360] kernel: __irq_domain_activate_irq+0x32/0x90
[ 7.215360] kernel: irq_domain_activate_irq+0x2d/0x50
[ 7.215360] kernel: __setup_irq+0x339/0x770
[ 7.215360] kernel: request_threaded_irq+0xe5/0x190
[ 7.215360] kernel: ? __pfx_acpi_irq+0x10/0x10
[ 7.215360] kernel: ? __pfx_acpi_ev_sci_xrupt_handler+0x10/0x10
[ 7.215360] kernel: acpi_os_install_interrupt_handler+0xaf/0x100
[ 7.215360] kernel: ? __pfx_acpi_init+0x10/0x10
[ 7.215360] kernel: acpi_ev_install_xrupt_handlers+0x22/0x90
[ 7.215360] kernel: ? __pfx_acpi_init+0x10/0x10
[ 7.215360] kernel: acpi_bus_init+0x3a/0x460
[ 7.215360] kernel: ? acpi_ut_release_mutex+0x4a/0x90
[ 7.215360] kernel: ? srso_alias_return_thunk+0x5/0xfbef5
[ 7.215360] kernel: ? acpi_install_address_space_handler_internal.part.0+0x64/0x90
[ 7.215360] kernel: ? __pfx_acpi_init+0x10/0x10
[ 7.215360] kernel: ? __pfx_acpi_init+0x10/0x10
[ 7.215360] kernel: acpi_init+0x5d/0x130
[ 7.215360] kernel: ? __pfx_scan_for_dmi_ipmi+0x10/0x10
[ 7.215360] kernel: ? __pfx_acpi_init+0x10/0x10
[ 7.215360] kernel: do_one_initcall+0x5c/0x370
[ 7.215360] kernel: do_initcalls+0xdb/0x190
[ 7.215360] kernel: kernel_init_freeable+0x2d1/0x420
[ 7.215360] kernel: ? __pfx_kernel_init+0x10/0x10
[ 7.215360] kernel: kernel_init+0x1a/0x1c0
[ 7.215360] kernel: ret_from_fork+0x25a/0x280
[ 7.215360] kernel: ? __pfx_kernel_init+0x10/0x10
[ 7.215360] kernel: ret_from_fork_asm+0x1a/0x30
[ 7.215360] kernel: </TASK>
From the warning trace, __setup_irq() is already holding a raw
spinlock, &irq_desc_lock_class (wait context 2:2), when the activation
path reaches irq_remapping_activate(). Further down that path,
pci_get_domain_bus_and_slot() calls into bus_to_subsys(), which tries
to acquire a regular spinlock_t, &k->list_lock (wait context 3:3).
Taking a spinlock_t, which becomes a sleeping lock on PREEMPT_RT,
inside a raw-spinlock critical section is an invalid wait context, and
that is what triggers the warning.
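
To make the pattern concrete, here is a minimal, self-contained sketch
of what lockdep is objecting to. The names (outer/inner and the demo
function) are made up for illustration and are not from the patch; only
the locking shape matches the trace above:

#include <linux/spinlock.h>

/* Stands in for &irq_desc_lock_class, held by __setup_irq(). */
static DEFINE_RAW_SPINLOCK(outer);
/* Stands in for &k->list_lock, taken by bus_to_subsys(). */
static DEFINE_SPINLOCK(inner);

static void invalid_wait_context_demo(void)
{
	unsigned long flags;

	raw_spin_lock_irqsave(&outer, flags);
	/*
	 * In the patched code the nesting is indirect:
	 * pci_get_domain_bus_and_slot() -> bus_find_device()
	 * -> bus_to_subsys() -> spin_lock(&k->list_lock).
	 * With CONFIG_PROVE_LOCKING, taking a spinlock_t
	 * (wait context 3:3; a sleeping lock on PREEMPT_RT)
	 * under a raw_spinlock_t (wait context 2:2) yields
	 * "BUG: Invalid wait context".
	 */
	spin_lock(&inner);
	spin_unlock(&inner);
	raw_spin_unlock_irqrestore(&outer, flags);
}

Since the raw spinlock is taken by the IRQ core well above
iommu_flush_irt_and_complete(), the pdev lookup probably needs to move
out of this call path altogether, e.g. by resolving the DMA aliases
before the IRQ descriptor lock is taken, or by caching them when the
IRQ table is set up. I have not tried a concrete fix, though.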
Thanks
Dheeraj