Message-ID: <20240215142208.GA753@willie-the-truck>
Date: Thu, 15 Feb 2024 14:22:09 +0000
From: Will Deacon <will@...nel.org>
To: Nicolin Chen <nicolinc@...dia.com>
Cc: sagi@...mberg.me, hch@....de, axboe@...nel.dk, kbusch@...nel.org,
	joro@...tes.org, robin.murphy@....com, jgg@...dia.com,
	linux-nvme@...ts.infradead.org, linux-kernel@...r.kernel.org,
	iommu@...ts.linux.dev, murphyt7@....ie, baolu.lu@...ux.intel.com
Subject: Re: [PATCH v1 0/2] nvme-pci: Fix dma-iommu mapping failures when
 PAGE_SIZE=64KB

On Wed, Feb 14, 2024 at 11:57:32AM -0800, Nicolin Chen wrote:
> On Wed, Feb 14, 2024 at 04:41:38PM +0000, Will Deacon wrote:
> > On Tue, Feb 13, 2024 at 01:53:55PM -0800, Nicolin Chen wrote:
> > > It's observed that an NVME device is causing timeouts when Ubuntu boots
> > > with a kernel configured with PAGE_SIZE=64KB due to failures in swiotlb:
> > >     systemd[1]: Started Journal Service.
> > >  => nvme 0000:00:01.0: swiotlb buffer is full (sz: 327680 bytes), total 32768 (slots), used 32 (slots)
> > >     note: journal-offline[392] exited with irqs disabled
> > >     note: journal-offline[392] exited with preempt_count 1
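For context on the first log line: swiotlb splits its pool into 2 KiB slots (IO_TLB_SHIFT = 11), and a single mapping must fit in one run of at most IO_TLB_SEGSIZE = 128 contiguous slots, i.e. 256 KiB. The following is a userspace model, not kernel code; the model_* names are hypothetical. It converts the logged numbers into slots. The 327680-byte request needs 160 slots, which no single segment can provide, even though the 32768-slot (64 MiB) pool is nearly empty.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model of the swiotlb slot constants (not the kernel headers). */
#define MODEL_IO_TLB_SHIFT   11                          /* log2(slot size) */
#define MODEL_IO_TLB_SIZE    (1UL << MODEL_IO_TLB_SHIFT) /* 2 KiB per slot */
#define MODEL_IO_TLB_SEGSIZE 128                         /* max contiguous slots */

static_assert(MODEL_IO_TLB_SIZE == 2048, "slots are 2 KiB");

/* Number of slots needed to bounce a buffer of 'size' bytes. */
size_t model_slots_needed(size_t size)
{
	return (size + MODEL_IO_TLB_SIZE - 1) >> MODEL_IO_TLB_SHIFT;
}

/* Only checks the segment-length limit; alignment constraints are ignored. */
int model_fits_in_segment(size_t size)
{
	return model_slots_needed(size) <= MODEL_IO_TLB_SEGSIZE;
}

/* Pool size in bytes for the slot count printed as "total ... (slots)". */
size_t model_pool_bytes(size_t nslots)
{
	return nslots * MODEL_IO_TLB_SIZE;
}
```

Under this model, model_slots_needed(327680) gives 160 and model_pool_bytes(32768) gives 64 MiB, so the failure is about contiguity rather than the pool running out.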
> > >
> > > An NVME device under a PCIe bus can be behind an IOMMU, so dma mappings
> > > going through dma-iommu might also be redirected to swiotlb allocations.
> > > Similar to dma_direct_max_mapping_size(), dma-iommu should implement its
> > > dma_map_ops->max_mapping_size to return swiotlb_max_mapping_size() too.
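The cap discussed here can be modelled in userspace. The sketch below mirrors the arithmetic of swiotlb_max_mapping_size() in kernels of this period: the 256 KiB segment minus min_align_mask rounded up to a slot. It also adds a clamp in the spirit of dma_direct_max_mapping_size(), where only devices that may bounce get the smaller limit. All model_* names are hypothetical. The actual iommu_dma_max_mapping_size() in the series may decide "may bounce" differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_IO_TLB_SIZE    2048UL /* swiotlb slot size */
#define MODEL_IO_TLB_SEGSIZE 128UL  /* max contiguous slots per mapping */

static_assert(MODEL_IO_TLB_SIZE * MODEL_IO_TLB_SEGSIZE == 256 * 1024,
	      "one segment is 256 KiB");

/* Round x up to a multiple of 'to', like the kernel's roundup(). */
static size_t model_roundup(size_t x, size_t to)
{
	return ((x + to - 1) / to) * to;
}

/* Largest segment minus the slots that min_align_mask may force swiotlb
 * to skip when it preserves the buffer's low address bits. */
size_t model_swiotlb_max_mapping_size(size_t min_align_mask)
{
	size_t min_align = 0;

	if (min_align_mask)
		min_align = model_roundup(min_align_mask, MODEL_IO_TLB_SIZE);
	return MODEL_IO_TLB_SIZE * MODEL_IO_TLB_SEGSIZE - min_align;
}

/* Hypothetical clamp: no limit unless the device may bounce via swiotlb. */
size_t model_max_mapping_size(int may_bounce, size_t min_align_mask)
{
	return may_bounce ? model_swiotlb_max_mapping_size(min_align_mask)
			  : SIZE_MAX;
}
```

With NVMe's min_align_mask of 0xfff (NVME_CTRL_PAGE_SIZE - 1), the model yields the 252KB figure quoted in the next paragraph.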
> > >
> > > Although an iommu_dma_max_mapping_size() is a must, it alone can't fix
> > > the issue. swiotlb_max_mapping_size() returns 252KB: the 256KB maximum
> > > swiotlb segment minus 4KB to account for min_align_mask, which NVME
> > > derives from NVME_CTRL_PAGE_SIZE=4KB. Meanwhile, dma-iommu can round a
> > > 252KB mapping up to 256KB in its "alloc_size" when PAGE_SIZE=64KB,
> > > because iova->granule is often set to PAGE_SIZE. This mismatch between
> > > NVME_CTRL_PAGE_SIZE=4KB and PAGE_SIZE=64KB therefore results in a
> > > similar failure, though its signature has a fixed size of 256KB:
> > >     systemd[1]: Started Journal Service.
> > >  => nvme 0000:00:01.0: swiotlb buffer is full (sz: 262144 bytes), total 32768 (slots), used 128 (slots)
> > >     note: journal-offline[392] exited with irqs disabled
> > >     note: journal-offline[392] exited with preempt_count 1
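The second failure comes down to rounding. Here is a minimal model. It assumes dma-iommu rounds the bounce size up to the IOVA granule, as iova_align() does, and that the granule equals PAGE_SIZE as described above. With 4 KiB pages the 252 KiB cap is already granule-aligned. With 64 KiB pages it grows back to 256 KiB, which matches the size in the log and exceeds the cap. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to a power-of-two granule, like the kernel's ALIGN(). */
size_t model_granule_align(size_t size, size_t granule)
{
	assert(granule && (granule & (granule - 1)) == 0); /* power of two */
	return (size + granule - 1) & ~(granule - 1);
}

/* Does a request still respect the swiotlb cap after granule rounding? */
int model_fits_after_rounding(size_t size, size_t granule, size_t cap)
{
	return model_granule_align(size, granule) <= cap;
}
```

For example, model_fits_after_rounding(252 * 1024, 65536, 252 * 1024) is false, while the same call with a 4096-byte granule is true.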
> > >
> > > Both failures above occur with NVME behind an IOMMU when PAGE_SIZE=64KB.
> > > They were likely introduced, as part of a security feature, by
> > > commit 82612d66d51d ("iommu: Allow the dma-iommu api to use bounce buffers").
> > >
> > > So this series bundles two fixes for that. They should be taken
> > > together to fully fix the mapping failures.
> > 
> > It's a bit of a shot in the dark, but I've got a pending fix to some of
> > the alignment handling in swiotlb. It would be interesting to know if
> > patch 1 has any impact at all on your NVME allocations:
> > 
> > https://lore.kernel.org/r/20240205190127.20685-1-will@kernel.org
> 
> I applied these three patches locally for a test.

Thank you!

> I am building with a v6.6 kernel, and I see some warnings:
>                  from kernel/dma/swiotlb.c:26:
> kernel/dma/swiotlb.c: In function ‘swiotlb_area_find_slots’:
> ./include/linux/minmax.h:21:35: warning: comparison of distinct pointer types lacks a cast
>    21 |         (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
>       |                                   ^~
> ./include/linux/minmax.h:27:18: note: in expansion of macro ‘__typecheck’
>    27 |                 (__typecheck(x, y) && __no_side_effects(x, y))
>       |                  ^~~~~~~~~~~
> ./include/linux/minmax.h:37:31: note: in expansion of macro ‘__safe_cmp’
>    37 |         __builtin_choose_expr(__safe_cmp(x, y), \
>       |                               ^~~~~~~~~~
> ./include/linux/minmax.h:75:25: note: in expansion of macro ‘__careful_cmp’
>    75 | #define max(x, y)       __careful_cmp(x, y, >)
>       |                         ^~~~~~~~~~~~~
> kernel/dma/swiotlb.c:1007:26: note: in expansion of macro ‘max’
>  1007 |                 stride = max(stride, PAGE_SHIFT - IO_TLB_SHIFT + 1);
>       |                          ^~~
> 
> Replacing with a max_t() can fix these.

Weird, I haven't seen that. I can fix it as you suggest, but please can
you also share your .config so I can look into it further?

> And it seems to get worse, as even a 64KB mapping is failing:
> [    0.239821] nvme 0000:00:01.0: swiotlb buffer is full (sz: 65536 bytes), total 32768 (slots), used 0 (slots)
> 
> With a printk, I found the iotlb_align_mask isn't correct:
>    swiotlb_area_find_slots:alloc_align_mask 0xffff, iotlb_align_mask 0x800
> 
> But fixing the iotlb_align_mask to 0x7ff still fails the 64KB
> mapping...
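The logged masks can be reproduced with some bit arithmetic. The value 0x800 is what NVMe's min_align_mask of 0xfff (NVME_CTRL_PAGE_SIZE - 1) becomes once the bits addressing bytes inside a 2 KiB slot are cleared. An alloc_align_mask of 0xffff asks for 64 KiB alignment, i.e. one candidate start every 32 slots. These two hypothetical helpers only check that arithmetic. Whether a given swiotlb version derives its mask exactly this way is an assumption, not something established in this thread.

```c
#include <assert.h>

#define MODEL_IO_TLB_SHIFT 11
#define MODEL_IO_TLB_SIZE  (1U << MODEL_IO_TLB_SHIFT)

static_assert(MODEL_IO_TLB_SIZE == 0x800, "a 2 KiB slot is 0x800 bytes");

/* Drop the mask bits that address bytes inside a single 2 KiB slot. */
unsigned int model_slot_granular_mask(unsigned int mask)
{
	return mask & ~(MODEL_IO_TLB_SIZE - 1);
}

/* Distance, in slots, between successive starts satisfying 'align_mask'. */
unsigned int model_slots_per_alignment(unsigned int align_mask)
{
	return (align_mask >> MODEL_IO_TLB_SHIFT) + 1;
}
```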

Hmm. A mask of 0x7ff doesn't make a lot of sense given that the slabs
are 2KiB aligned. I'll try plugging in some of the constants you have
here, as something definitely isn't right...
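The point about 2KiB-aligned slabs can be illustrated by enumerating slots. To keep the model simple, assume the pool starts on a 64 KiB boundary. Then every slot begins at a multiple of 0x800, so a mask of 0x7ff is satisfied by every slot and constrains nothing. A mask of 0x800 admits every other slot, and 0xf800 (64 KiB alignment at slot granularity) admits one slot in 32. model_count_aligned_slots() is a hypothetical helper.

```c
#include <assert.h>

#define MODEL_IO_TLB_SHIFT 11 /* 2 KiB slots */

static_assert((1U << MODEL_IO_TLB_SHIFT) == 2048, "model assumes 2 KiB slots");

/* Count how many of the first 'nslots' slots start at an offset (from an
 * assumed 64 KiB-aligned pool base) whose bits under 'mask' are all zero. */
unsigned int model_count_aligned_slots(unsigned int nslots, unsigned int mask)
{
	unsigned int count = 0;

	for (unsigned int i = 0; i < nslots; i++) {
		unsigned long offset = (unsigned long)i << MODEL_IO_TLB_SHIFT;

		if ((offset & mask) == 0)
			count++;
	}
	return count;
}
```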

Will
