[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <104a8c8fedffd1ff8a2890983e2ec1c26bff6810.camel@linux.ibm.com>
Date: Fri, 03 Nov 2023 16:13:03 +0100
From: Niklas Schnelle <schnelle@...ux.ibm.com>
To: Christoph Hellwig <hch@....de>,
Bjorn Helgaas <bhelgaas@...gle.com>,
Marek Szyprowski <m.szyprowski@...sung.com>,
Robin Murphy <robin.murphy@....com>,
Petr Tesarik <petr.tesarik1@...wei-partners.com>,
Ross Lagerwall <ross.lagerwall@...rix.com>
Cc: linux-pci <linux-pci@...r.kernel.org>,
linux-kernel@...r.kernel.org, iommu@...ts.linux.dev,
Matthew Rosato <mjrosato@...ux.ibm.com>,
Halil Pasic <pasic@...ux.ibm.com>
Subject: Memory corruption with CONFIG_SWIOTLB_DYNAMIC=y
Hi Swiotlb Maintainers,
With s390 slated to use dma-iommu.c I was experimenting with
CONFIG_SWIOTLB_DYNAMIC but was getting errors all over the place.
Debugging this I've managed to narrow things down to what I believe is
a memory corruption issue caused by overrunning the entire transient
memory pool allocated by swiotlb. In my test this happens on the
iommu_dma_map_page(), swiotlb_tbl_map_single() path when handling
untrusted PCI devices.
I've seen this happen only for transient pools when:
* allocation size >=PAGE_SIZE and,
* the original address of the mapping is not page aligned.
The overrun can be seen clearly by adding a simple "tlb_addr +
alloc_size > pool->end' overrun check to swiotlb_tbl_map_single() and
forcing PCI test devices to be untrusted. With that in place while an
NVMe fio work load runs fine TCP/IP tests on a Mellanox ConnectX-4 VF
reliably trigger the overrun check and with a debug print produce
output like the following:
software IO TLB:
transient:1
index:1
dma_get_min_align_mask(dev):0
orig_addr:ac0caebe
tlb_addr=a4d0f800
start:a4d0f000
end:a4d10000
pool_size:4096
alloc_size:4096
offset:0
The complete code used for this test is available on my
git.kernel.org[0] repository but it's bascially v6.6 + iommu/next (for
s390 DMA API) + 2 trivial debug commits.
For further analysis I've worked closely with Halil Pasic.
The reason why we think this is happening seems twofold. Under a
certain set of circumstances in the function swiotlb_find_slots():
1) the allocated transient pool can not fit the required bounce buffer,
and
2) the invocation of swiotlb_pool_find_slots() finds "a suitable
slot" even though it should fail.
The reason for 2), i.e. swiotlb_pool_find_slots() thinking there is a
suitable bounce buffer in the pool is that for transient pools pool-
>slots[i].list end up nonsensical, because swiotlb_init_io_tlb_pool(),
among other places in swiotlb, assumes that the nslabs of the pool is a
multiple of IO_TLB_SEGSIZE
The reason for 1) is a bit more convoluted and not entirely understood
by us. We are certain though that the function swiotlb_find_slots()
allocates a pool with nr_slots(alloc_size), where this alloc_size is
the alloc_size from swiotlb_tbl_map_single() + swiotlb_align_offset(),
but for alignment reasons some slots may get "skipped over" in
swiotlb_area_find_slots() causing the picked slots to overrun the
allocation.
Not sure how to properly fix this as the different alignment
requirements get pretty complex quickly. So would appreciate your
input.
Thanks,
Niklas
[0] bounce-min branch of my git.kernel.org repository at
https://git.kernel.org/pub/scm/linux/kernel/git/niks/linux.git
Powered by blists - more mailing lists