linux-kernel - RE: [PATCH v4 5/5] iommu/dma: Force swiotlb_max_mapping

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID:
 <SN6PR02MB4157CD26C1D9BAC64208D9D8D45A2@SN6PR02MB4157.namprd02.prod.outlook.com>
Date: Mon, 26 Feb 2024 21:11:22 +0000
From: Michael Kelley <mhklinux@...look.com>
To: Robin Murphy <robin.murphy@....com>, Will Deacon <will@...nel.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
CC: "kernel-team@...roid.com" <kernel-team@...roid.com>,
	"iommu@...ts.linux.dev" <iommu@...ts.linux.dev>, Christoph Hellwig
	<hch@....de>, Marek Szyprowski <m.szyprowski@...sung.com>, Petr Tesarik
	<petr.tesarik1@...wei-partners.com>, Dexuan Cui <decui@...rosoft.com>,
	Nicolin Chen <nicolinc@...dia.com>
Subject: RE: [PATCH v4 5/5] iommu/dma: Force swiotlb_max_mapping_size on an
 untrusted device

From: Robin Murphy <robin.murphy@....com> Sent: Monday, February 26, 2024 11:36 AM
> 
> On 21/02/2024 11:35 am, Will Deacon wrote:
> > From: Nicolin Chen <nicolinc@...dia.com>
> >
> > The swiotlb does not support a mapping size > swiotlb_max_mapping_size().
> > On the other hand, with a 64KB PAGE_SIZE configuration, it's observed that
> > an NVME device can map a size between 300KB~512KB, which certainly failed
> > the swiotlb mappings, though the default pool of swiotlb has many slots:
> >      systemd[1]: Started Journal Service.
> >   => nvme 0000:00:01.0: swiotlb buffer is full (sz: 327680 bytes), total 32768 (slots), used 32 (slots)
> >      note: journal-offline[392] exited with irqs disabled
> >      note: journal-offline[392] exited with preempt_count 1
> >
> > Call trace:
> > [    3.099918]  swiotlb_tbl_map_single+0x214/0x240
> > [    3.099921]  iommu_dma_map_page+0x218/0x328
> > [    3.099928]  dma_map_page_attrs+0x2e8/0x3a0
> > [    3.101985]  nvme_prep_rq.part.0+0x408/0x878 [nvme]
> > [    3.102308]  nvme_queue_rqs+0xc0/0x300 [nvme]
> > [    3.102313]  blk_mq_flush_plug_list.part.0+0x57c/0x600
> > [    3.102321]  blk_add_rq_to_plug+0x180/0x2a0
> > [    3.102323]  blk_mq_submit_bio+0x4c8/0x6b8
> > [    3.103463]  __submit_bio+0x44/0x220
> > [    3.103468]  submit_bio_noacct_nocheck+0x2b8/0x360
> > [    3.103470]  submit_bio_noacct+0x180/0x6c8
> > [    3.103471]  submit_bio+0x34/0x130
> > [    3.103473]  ext4_bio_write_folio+0x5a4/0x8c8
> > [    3.104766]  mpage_submit_folio+0xa0/0x100
> > [    3.104769]  mpage_map_and_submit_buffers+0x1a4/0x400
> > [    3.104771]  ext4_do_writepages+0x6a0/0xd78
> > [    3.105615]  ext4_writepages+0x80/0x118
> > [    3.105616]  do_writepages+0x90/0x1e8
> > [    3.105619]  filemap_fdatawrite_wbc+0x94/0xe0
> > [    3.105622]  __filemap_fdatawrite_range+0x68/0xb8
> > [    3.106656]  file_write_and_wait_range+0x84/0x120
> > [    3.106658]  ext4_sync_file+0x7c/0x4c0
> > [    3.106660]  vfs_fsync_range+0x3c/0xa8
> > [    3.106663]  do_fsync+0x44/0xc0
> >
> > Since untrusted devices might go down the swiotlb pathway with dma-iommu,
> > these devices should not map a size larger than swiotlb_max_mapping_size.
> >
> > To fix this bug, add iommu_dma_max_mapping_size() for untrusted devices to
> > take into account swiotlb_max_mapping_size() v.s. iova_rcache_range() from
> > the iommu_dma_opt_mapping_size().
> 
> On the basis that this is at least far closer to correct than doing nothing,
> 
> Acked-by: Robin Murphy <robin.murphy@....com>
> 
> TBH I'm scared to think about theoretical correctness for all the
> interactions between the IOVA granule and min_align_mask, since just the
> SWIOTLB stuff is bad enough, even before you realise the ways that the
> IOVA allocation isn't necessarily right either. However I reckon as long
> as we don't ever see a granule smaller than IO_TLB_SIZE, and/or a
> min_align_mask larger than a granule, then this should probably work
> well enough as-is.
> 

I'm not convinced.  The conditions you describe are reasonable
and reflect upstream code today. But there can still be a failure
due to attempting to allocate a "too large" swiotlb buffer. It
happens with a granule of 64K and min_align_mask of 4K - 1 (the
NVMe case) when the offset passed to iommu_dma_map_page()
is non-zero modulo 4K.  With this patch, the size passed into
iommu_dma_map_page() is limited to 252K, but it gets rounded
up to 256K.  Then swiotlb_tbl_map_single() adds the offset
modulo 4K.  The result exceeds the 256K limit in swiotlb and
the mapping fails.

The case I describe is a reasonable case that happens in the real
world.  As is, this patch will work most of the time because for
large xfers the offset is typically at least 4K aligned. But not always.

Michael