Message-ID: <2fcc9d30-42e8-4382-bbbc-a3f66016f368@nvidia.com>
Date: Wed, 10 Dec 2025 04:59:50 +0000
From: Chaitanya Kulkarni <chaitanyak@...dia.com>
To: Keith Busch <kbusch@...nel.org>, Sebastian Ott <sebott@...hat.com>
CC: "linux-nvme@...ts.infradead.org" <linux-nvme@...ts.infradead.org>,
"iommu@...ts.linux.dev" <iommu@...ts.linux.dev>, Robin Murphy
<robin.murphy@....com>, "linux-block@...r.kernel.org"
<linux-block@...r.kernel.org>, "linux-kernel@...r.kernel.org"
<linux-kernel@...r.kernel.org>, "linux-xfs@...r.kernel.org"
<linux-xfs@...r.kernel.org>, Jens Axboe <axboe@...com>, Christoph Hellwig
<hch@....de>, Will Deacon <will@...nel.org>, Carlos Maiolino
<cem@...nel.org>, Leon Romanovsky <leon@...nel.org>
Subject: Re: WARNING: drivers/iommu/io-pgtable-arm.c:639
(+ Leon Romanovsky)
On 12/9/25 20:05, Keith Busch wrote:
> On Wed, Dec 10, 2025 at 02:30:50AM +0000, Chaitanya Kulkarni wrote:
>> @@ -126,17 +126,26 @@ static bool blk_rq_dma_map_iova(struct request *req, struct device *dma_dev,
>> error = dma_iova_link(dma_dev, state, vec->paddr, mapped,
>> vec->len, dir, attrs);
>> if (error)
>> - break;
>> + goto out_unlink;
>> mapped += vec->len;
>> } while (blk_map_iter_next(req, &iter->iter, vec));
>>
>> error = dma_iova_sync(dma_dev, state, 0, mapped);
>> - if (error) {
>> - iter->status = errno_to_blk_status(error);
>> - return false;
>> - }
>> + if (error)
>> + goto out_unlink;
>>
>> return true;
>> +
>> +out_unlink:
>> + /*
>> + * Unlink any partial mapping to avoid unmap mismatch later.
>> + * If we mapped some bytes but not all, we must clean up now
>> + * to prevent attempting to unmap more than was actually mapped.
>> + */
>> + if (mapped)
>> + dma_iova_unlink(dma_dev, state, 0, mapped, dir, attrs);
>> + iter->status = errno_to_blk_status(error);
>> + return false;
>> }
> It does look like a bug to continue on when dma_iova_link() fails as the
> caller thinks the entire mapping was successful, but I think you also
> need to call dma_iova_free() to undo the earlier dma_iova_try_alloc(),
> otherwise iova space is leaked.
Thanks for catching that, see updated version of this patch [1].
> I'm a bit doubtful this error condition was hit though: this sequence
> is largely the same as it was in v6.18 before the regression. The only
> difference since then should just be for handling P2P DMA across a host
> bridge, which I don't think applies to the reported bug since that's a
> pretty unusual thing to do.
That's why I've asked the reporter to test it.
Either way, IMO both of the patches are still needed.
-ck
[1]
From 726687876a334cb699247584102e491e98f8fdc4 Mon Sep 17 00:00:00 2001
From: Chaitanya Kulkarni <ckulkarnilinux@...il.com>
Date: Tue, 9 Dec 2025 17:01:15 -0800
Subject: [PATCH 2/2] block: fix partial IOVA mapping cleanup in
blk_rq_dma_map_iova
When dma_iova_link() fails partway through mapping a request's
scatter-gather list, the function would break out of the loop without
cleaning up the already-mapped portions. This leads to a map/unmap
size mismatch when the request is later completed.
The completion path (via dma_iova_destroy or nvme_unmap_data) attempts
to unmap the full expected size, but only a partial size was actually
mapped. This triggers "unmapped PTE" warnings in the ARM LPAE io-pgtable
code and can cause IOVA address corruption.
Fix by adding an out_unlink error path that calls dma_iova_unlink()
to clean up any partial mapping and dma_iova_free() to release the
IOVA reservation before returning failure. This ensures that when an
error occurs:
1. All partially-mapped IOVA ranges are properly unmapped
2. The IOVA space reserved by dma_iova_try_alloc() is released
3. The completion path won't attempt to unmap non-existent mappings,
   so no map/unmap size mismatch occurs
Signed-off-by: Chaitanya Kulkarni <ckulkarnilinux@...il.com>
---
block/blk-mq-dma.c | 21 ++++++++++++++++-----
1 file changed, 16 insertions(+), 5 deletions(-)
diff --git a/block/blk-mq-dma.c b/block/blk-mq-dma.c
index b6dbc9767596..ecfd53ed6984 100644
--- a/block/blk-mq-dma.c
+++ b/block/blk-mq-dma.c
@@ -126,17 +126,28 @@ static bool blk_rq_dma_map_iova(struct request *req, struct device *dma_dev,
error = dma_iova_link(dma_dev, state, vec->paddr, mapped,
vec->len, dir, attrs);
if (error)
- break;
+ goto out_unlink;
mapped += vec->len;
} while (blk_map_iter_next(req, &iter->iter, vec));
error = dma_iova_sync(dma_dev, state, 0, mapped);
- if (error) {
- iter->status = errno_to_blk_status(error);
- return false;
- }
+ if (error)
+ goto out_unlink;
return true;
+
+out_unlink:
+ /*
+ * Clean up partial mapping and free the entire IOVA reservation.
+ * dma_iova_unlink() detaches any linked bytes, dma_iova_free()
+ * returns the full IOVA window allocated by dma_iova_try_alloc()
+ * (state->__size tracks the original allocation size).
+ */
+ if (mapped)
+ dma_iova_unlink(dma_dev, state, 0, mapped, dir, attrs);
+ dma_iova_free(dma_dev, state);
+ iter->status = errno_to_blk_status(error);
+ return false;
}
static inline void blk_rq_map_iter_init(struct request *rq,
--
2.40.0