Message-ID: <30ae8fc4-94ff-4467-835e-28b4a4dfcd8f@nvidia.com>
Date: Wed, 10 Dec 2025 02:30:50 +0000
From: Chaitanya Kulkarni <chaitanyak@...dia.com>
To: Sebastian Ott <sebott@...hat.com>
CC: "linux-nvme@...ts.infradead.org" <linux-nvme@...ts.infradead.org>,
	"iommu@...ts.linux.dev" <iommu@...ts.linux.dev>, Robin Murphy
	<robin.murphy@....com>, "linux-block@...r.kernel.org"
	<linux-block@...r.kernel.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "linux-xfs@...r.kernel.org"
	<linux-xfs@...r.kernel.org>, Jens Axboe <axboe@...com>, Christoph Hellwig
	<hch@....de>, Will Deacon <will@...nel.org>, Carlos Maiolino <cem@...nel.org>
Subject: Re: WARNING: drivers/iommu/io-pgtable-arm.c:639

Sebastian,

On 12/9/25 13:05, Sebastian Ott wrote:
> On Tue, 9 Dec 2025, Robin Murphy wrote:
>> On 2025-12-09 11:43 am, Sebastian Ott wrote:
>>>  Hi,
>>>
>>>  got the following warning after a kernel update on Thursday, leading to a
>>>  panic and fs corruption. I didn't capture the first warning but I'm pretty
>>>  sure it was the same. It's reproducible but I didn't bisect since it
>>>  borked my fs. The only hint I can give is that v6.18 worked. Is this a
>>>  known issue? Anything I should try?
>>
>> nvme_unmap_data() is attempting to unmap an IOVA that was never 
>> mapped, or has already been unmapped by someone else. That's a usage 
>> bug.
>
> OK, that's what I suspected - thanks for the confirmation!
>
> I did another repro and tried:
>
> good: 44fc84337b6e Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
> bad:  cc25df3e2e22 Merge tag 'for-6.19/block-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
>
> I'll start bisecting between these 2 - hoping it doesn't fork up my root
> fs again...
>
> Thanks,
> Sebastian
>
>
Can you check whether the two patches below fix your problem?


==========
WARNING/DISCLOSURE:

These patches may cause system instability or crashes during testing.
Test only on non-production systems with proper backups in place.
==========


From 0d180e8055e98d91174ba8fdd47ab934a7a88bef Mon Sep 17 00:00:00 2001
From: Chaitanya Kulkarni <ckulkarnilinux@...il.com>
Date: Tue, 9 Dec 2025 01:23:51 -0800
Subject: [PATCH 1/2 COMPILE TESTED ONLY] iommu/io-pgtable-arm: fix size_t signedness bug in
  unmap path

__arm_lpae_unmap() returns size_t but was returning -ENOENT (negative
error code) when encountering an unmapped PTE. Since size_t is unsigned,
-ENOENT (typically -2) becomes a huge positive value (0xFFFFFFFFFFFFFFFE
on 64-bit systems).

This corrupted value propagates through the call chain:
   __arm_lpae_unmap() returns -ENOENT as size_t
   -> arm_lpae_unmap_pages() returns it
   -> __iommu_unmap() adds it to iova address
   -> iommu_pgsize() triggers BUG_ON due to corrupted iova

The corruption causes:
1. IOVA address overflow in __iommu_unmap() loop
2. BUG_ON in iommu_pgsize() from invalid address alignment
3. Kernel panic on ARM64 systems with SMMU

Fix by returning 0 instead of -ENOENT. The WARN_ON already signals
the error condition, and returning 0 (meaning "nothing unmapped")
is the correct semantic for size_t return type. This matches the
behavior of other io-pgtable implementations (io-pgtable-arm-v7s,
io-pgtable-dart) which return 0 on error conditions.

Kernel splat observed:

  ------------[ cut here ]------------
  kernel BUG at drivers/iommu/iommu.c:2464!
  Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
  Modules linked in: xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT
   nf_reject_ipv4 xt_tcpudp nft_compat x_tables nft_chain_nat nf_nat
   nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge
   stp llc nvme_fabrics binfmt_misc nls_iso8859_1 ipmi_ssif arm_smmuv3_pmu
   cdc_subset arm_spe_pmu spi_nor acpi_power_meter acpi_ipmi ipmi_devintf
   cppc_cpufreq ipmi_msghandler sch_fq_codel dm_multipath scsi_dh_rdac
   scsi_dh_emc scsi_dh_alua arm_cspmu_module efi_pstore autofs4 btrfs
   blake2b libblake2b raid10 raid456 async_raid6_recov async_memcpy
   async_pq async_xor async_tx xor xor_neon raid6_pq raid1 raid0 mlx5_ib
   ib_uverbs ib_core cdc_ether usbnet mlx5_core ghash_ce dax_hmem sm4_ce_gcm
   ast cxl_acpi sm4_ce_ccm drm_shmem_helper sm4_ce cxl_port drm_client_lib
   i2c_smbus sm4_ce_cipher mlxfw drm_kms_helper cxl_core sm4 nvme psample
   igb sm3_ce arm_smccc_trng einj drm nvme_core i2c_algo_bit xhci_pci_renesas
   tls i2c_tegra aes_neon_bs aes_neon_blk aes_ce_blk aes_ce_cipher
  CPU: 26 UID: 0 PID: 0 Comm: swapper/26 Tainted: G        W          6.19.0+ #98
  Tainted: [W]=WARN
  Hardware name: NVIDIA GB200 NVL/P3809-BMC, BIOS 02.05.10 20251010
  pstate: 234000c9 (nzCv daIF +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
  pc : iommu_pgsize.isra.0+0xe8/0xf8
  lr : __iommu_unmap+0xe0/0x308
  sp : ffff80008034fca0
  x29: ffff80008034fca0 x28: 000000000000fffe x27: ffffc6e7950e60b0
  x26: 00000000f9740000 x25: ffffc6e794b2cde8 x24: ffffc6e7967916a8
  x23: ffff80008034fdb8 x22: 0000000000030000 x21: ffff000030949220
  x20: 00000000f974fffe x19: fffffffffffffffe x18: 0000000000000000
  x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
  x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
  x11: 000000000000003f x10: 0000000020010000 x9 : 00000000f974fffe
  x8 : 000000000000003f x7 : 0000000000000000 x6 : ffffffffffffffff
  x5 : 0000000000000000 x4 : ffff80008034fd00 x3 : 0000000000020002
  x2 : 00000000f974fffe x1 : 00000000f974fffe x0 : 0000000020010000
  Call trace:
   iommu_pgsize.isra.0+0xe8/0xf8 (P)
   iommu_unmap_fast+0x18/0x40
   __iommu_dma_iova_unlink+0xec/0x2e8
   dma_iova_destroy+0x30/0xa0
   nvme_unmap_data+0x200/0x2e8 [nvme]
   nvme_pci_complete_batch+0x58/0xa8 [nvme]
   nvme_irq+0x98/0xa8 [nvme]
   __handle_irq_event_percpu+0xbc/0x498
   handle_irq_event+0x54/0xe0
   handle_fasteoi_irq+0x12c/0x1c8
   handle_irq_desc+0x54/0x90
   generic_handle_domain_irq+0x24/0x48
   gic_handle_irq+0x200/0x410
   call_on_irq_stack+0x30/0x48
   do_interrupt_handler+0xa8/0xb8
   el1_interrupt+0x4c/0xd0
   el1h_64_irq_handler+0x18/0x38
   el1h_64_irq+0x84/0x88
   cpuidle_enter_state+0x110/0x6a8 (P)
   cpuidle_enter+0x40/0x70
   do_idle+0x264/0x310
   cpu_startup_entry+0x3c/0x50
   secondary_start_kernel+0x13c/0x180
   __secondary_switched+0xc0/0xc8
  Code: d2800009 d280000a d280000b d65f03c0 (d4210000)
  ---[ end trace 0000000000000000 ]---

Fixes: 3318f7b5cefb ("iommu/io-pgtable-arm: Add quirk to quiet WARN_ON()")
Cc: stable@...r.kernel.org
Reported-by: Ankit Agrawal <ankita@...dia.com>
Reported-by: Sebastian Ott <sebott@...hat.com>
Signed-off-by: Chaitanya Kulkarni <kch@...dia.com>
---

=======================================================================
DISCLOSURE: DUE TO LACK OF H/W THIS PATCH IS COMPLETELY UNTESTED AND
BASED SOLELY ON THEORETICAL ANALYSIS. PLEASE REVIEW CAREFULLY.
=======================================================================
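
A minimal user-space sketch of the conversion described in the commit
message above, for readers who want to see the wraparound in isolation.
It is not kernel code: unmap_buggy(), unmap_fixed(), advance_iova() and
check_wraparound() are hypothetical stand-ins, and the starting address
is an arbitrary illustrative value.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for an unmap helper that hits an unmapped PTE.
 * The cast is written out here; in a function returning size_t, a plain
 * "return -ENOENT;" performs the same conversion implicitly.
 */
static size_t unmap_buggy(void)
{
	return (size_t)-ENOENT;
}

/* Same helper with the proposed semantic: 0 means "nothing unmapped". */
static size_t unmap_fixed(void)
{
	return 0;
}

/* Simplified caller: advance the address by the reported unmapped size. */
static uint64_t advance_iova(uint64_t iova, size_t unmapped)
{
	return iova + unmapped;
}

/* Self-check: the "error" is indistinguishable from a huge size. */
void check_wraparound(void)
{
	/* Unsigned conversion wraps modulo SIZE_MAX + 1. */
	assert(unmap_buggy() == SIZE_MAX - (size_t)ENOENT + 1);
	/* The caller's address moves even though nothing was unmapped. */
	assert(advance_iova(0x10000, unmap_buggy()) != 0x10000);
	/* With a return value of 0 the address stays put. */
	assert(advance_iova(0x10000, unmap_fixed()) == 0x10000);
}
```

On a 64-bit build where ENOENT is 2, unmap_buggy() yields
0xFFFFFFFFFFFFFFFE, the same value that appears in x19 in the splat.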

---
  drivers/iommu/io-pgtable-arm.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index e6626004b323..05d63fe92e43 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -637,7 +637,7 @@ static size_t __arm_lpae_unmap(struct arm_lpae_io_pgtable *data,
  	pte = READ_ONCE(*ptep);
  	if (!pte) {
  		WARN_ON(!(data->iop.cfg.quirks & IO_PGTABLE_QUIRK_NO_WARN));
-		return -ENOENT;
+		return 0;
  	}
  
  	/* If the size matches this level, we're in the right place */
-- 
2.40.0

######################################################################


From aa540bb77f7d4460c87b0a317df264de748a3b3c Mon Sep 17 00:00:00 2001
From: Chaitanya Kulkarni <ckulkarnilinux@...il.com>
Date: Tue, 9 Dec 2025 17:01:15 -0800
Subject: [PATCH 2/2 COMPILE TESTED ONLY] block: fix partial IOVA mapping cleanup in
  blk_rq_dma_map_iova

When dma_iova_link() fails partway through mapping a request's
scatter-gather list, the function would break out of the loop without
cleaning up the already-mapped portions. This leads to a map/unmap
size mismatch when the request is later completed.

The completion path (via dma_iova_destroy or nvme_unmap_data) attempts
to unmap the full expected size, but only a partial size was actually
mapped. This triggers "unmapped PTE" warnings in the ARM LPAE io-pgtable
code and can cause IOVA address corruption.

Fix by adding an out_unlink error path that calls dma_iova_unlink()
to clean up any partial mapping before returning failure. This ensures
that when an error occurs:
1. All partially-mapped IOVA ranges are properly unmapped
2. The completion path won't attempt to unmap non-existent mappings
3. No map/unmap size mismatch occurs

Fixes: 858299dc6160 ("block: add scatterlist-less DMA mapping helpers")
Signed-off-by: Chaitanya Kulkarni <kch@...dia.com>
---

=======================================================================
DISCLOSURE: DUE TO LACK OF H/W THIS PATCH IS COMPLETELY UNTESTED AND
BASED SOLELY ON THEORETICAL ANALYSIS. PLEASE REVIEW CAREFULLY.
=======================================================================
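
The cleanup pattern described in the commit message above can be
sketched in user space as follows. This is only an illustration under
stated assumptions: struct fake_state, fake_link(), fake_unlink() and
map_all() are hypothetical names that model "bytes currently linked"
with a counter, and they do not reproduce the real DMA API signatures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state: how many bytes are currently linked. */
struct fake_state {
	size_t linked;
};

/* Hypothetical link step; fails on request without linking anything. */
static int fake_link(struct fake_state *st, size_t len, bool fail)
{
	if (fail)
		return -1;
	st->linked += len;
	return 0;
}

/* Hypothetical unlink step; undoes the first len linked bytes. */
static void fake_unlink(struct fake_state *st, size_t len)
{
	st->linked -= len;
}

/*
 * Link n segments; the segment at index fail_at (if any) fails.
 * On failure, everything linked so far is unlinked before returning,
 * so the caller never sees a partially linked range.
 */
static bool map_all(struct fake_state *st, const size_t *lens, size_t n,
		    size_t fail_at)
{
	size_t mapped = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (fake_link(st, lens[i], i == fail_at))
			goto out_unlink;
		mapped += lens[i];
	}
	return true;

out_unlink:
	if (mapped)
		fake_unlink(st, mapped);
	return false;
}
```

With three segments and a failure injected on the last one, map_all()
returns false and leaves st->linked at zero, so a later teardown has
nothing left to unmap.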

---
  block/blk-mq-dma.c | 19 ++++++++++++++-----
  1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/block/blk-mq-dma.c b/block/blk-mq-dma.c
index b6dbc9767596..eb8b5b6b595c 100644
--- a/block/blk-mq-dma.c
+++ b/block/blk-mq-dma.c
@@ -126,17 +126,26 @@ static bool blk_rq_dma_map_iova(struct request *req, struct device *dma_dev,
  		error = dma_iova_link(dma_dev, state, vec->paddr, mapped,
  				vec->len, dir, attrs);
  		if (error)
-			break;
+			goto out_unlink;
  		mapped += vec->len;
  	} while (blk_map_iter_next(req, &iter->iter, vec));
  
  	error = dma_iova_sync(dma_dev, state, 0, mapped);
-	if (error) {
-		iter->status = errno_to_blk_status(error);
-		return false;
-	}
+	if (error)
+		goto out_unlink;
  
  	return true;
+
+out_unlink:
+	/*
+	 * Unlink any partial mapping to avoid unmap mismatch later.
+	 * If we mapped some bytes but not all, we must clean up now
+	 * to prevent attempting to unmap more than was actually mapped.
+	 */
+	if (mapped)
+		dma_iova_unlink(dma_dev, state, 0, mapped, dir, attrs);
+	iter->status = errno_to_blk_status(error);
+	return false;
  }
  
  static inline void blk_rq_map_iter_init(struct request *rq,
-- 
2.40.0


-ck

