Message-ID: <20250204062605.GB29300@lst.de>
Date: Tue, 4 Feb 2025 07:26:05 +0100
From: Christoph Hellwig <hch@....de>
To: Thorsten Leemhuis <regressions@...mhuis.info>
Cc: Christoph Hellwig <hch@....de>, Bruno Gravato <bgravato@...il.com>,
Stefan <linux-kernel@...g.de>, Keith Busch <kbusch@...nel.org>,
bugzilla-daemon@...nel.org, Adrian Huang <ahuang12@...ovo.com>,
Linux kernel regressions list <regressions@...ts.linux.dev>,
linux-nvme@...ts.infradead.org, Jens Axboe <axboe@...com>,
"iommu@...ts.linux.dev" <iommu@...ts.linux.dev>,
LKML <linux-kernel@...r.kernel.org>,
Mario Limonciello <mario.limonciello@....com>
Subject: Re: [Bug 219609] File corruptions on SSD in 1st M.2 socket of
AsRock X600M-STX + Ryzen 8700G
On Fri, Jan 17, 2025 at 11:30:47AM +0100, Thorsten Leemhuis wrote:
> >> Side note: that "PCI-DMA: Using software bounce buffering for IO
> >> (SWIOTLB)" message does show up on two other AMD machines I own as
> >> well. One also has a Ryzen 8000, the other one a much older one.
The message will always show up with > 4G of memory. It only implies swiotlb
is initialized, not that any device actually uses it.
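To make that distinction concrete, here is a tiny userspace sketch (not
actual kernel code, constants and names are only illustrative) of the
condition behind the message: it fires whenever RAM extends above the
32-bit boundary, regardless of whether any device ever bounces:

#include <stdbool.h>
#include <stdio.h>

#define PAGE_SHIFT	12
#define MAX_DMA32_PFN	(1UL << (32 - PAGE_SHIFT))	/* first pfn above 4G */

/* swiotlb gets set up whenever memory reaches beyond the 32-bit limit */
static bool swiotlb_wanted(unsigned long max_pfn)
{
	return max_pfn > MAX_DMA32_PFN;
}

int main(void)
{
	unsigned long max_pfn = (16UL << 30) >> PAGE_SHIFT;	/* e.g. a 16G box */

	if (swiotlb_wanted(max_pfn))
		printf("PCI-DMA: Using software bounce buffering for IO (SWIOTLB)\n");
	/* whether a device actually bounces is decided per mapping, long
	 * after this message has been printed */
	return 0;
}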
> >> And BTW a few bits of the latest development in the bugzilla ticket
> >> (https://bugzilla.kernel.org/show_bug.cgi?id=219609 ):
> >>
> >> * iommu=pt and amd_iommu=off seems to work around the problem (in
> >> addition to disabling the iommu in the BIOS setup).
iommu=pt calls iommu_set_default_passthrough(), which sets
iommu_def_domain_type to IOMMU_DOMAIN_IDENTITY. I.e. the hardware
IOMMU is left on, but treated as a 1:1 mapping by Linux.
amd_iommu=off sets amd_iommu_disabled, which calls disable_iommus,
which from a quick read disables the hardware IOMMU.
In either case we'll end up using dma-direct instead of dma-iommu.
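For reference, that split essentially hangs off the default domain type;
a simplified sketch of iommu_setup_dma_ops() in drivers/iommu/dma-iommu.c
(not the literal upstream code, details differ between kernel versions):

/* simplified sketch: only a translated DMA domain installs iommu_dma_ops */
void iommu_setup_dma_ops(struct device *dev)
{
	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);

	dev->dma_ops = NULL;			/* default: dma-direct */
	if (domain && iommu_is_dma_domain(domain))
		dev->dma_ops = &iommu_dma_ops;	/* translated: dma-iommu */
}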
> >
> > That suggests the problem is related to the dma-iommu code, and
> > my strong suspect is the swiotlb bounce buffering for untrusted
> > device. If you feel adventurous, can you try building a kernel
> > where dev_use_swiotlb() in drivers/iommu/dma-iommu.c is hacked
> > to always return false?
>
> Tried that, did not help: I still get corrupted data.
... which together with this implies that the problem only happens
when using the dma-iommu code (with or without swiotlb buffering
for unaligned / untrusted data), and does not happen with
dma-direct.
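For reference, the hack quoted above amounts to something like this (the
exact signature of dev_use_swiotlb() in drivers/iommu/dma-iommu.c differs
between kernel versions):

/* experiment only: never bounce through swiotlb on the dma-iommu path */
static bool dev_use_swiotlb(struct device *dev, size_t size,
		enum dma_data_direction dir)
{
	return false;
}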
If we assume it is also related to the optimal dma size, as
the original report suggests, the values for that might be
interesting. For dma-iommu this is:
PAGE_SIZE << (IOVA_RANGE_CACHE_MAX_SIZE - 1);
where IOVA_RANGE_CACHE_MAX_SIZE is 6, i.e.
PAGE_SIZE << 5 or 131072 for x86_64.
For dma-direct it falls back to dma_max_mapping_size, which is
SIZE_MAX without swiotlb, or swiotlb_max_mapping_size, which
is a bit complicated due to the minimum alignment, but in this case
should evaluate to 258048, which is almost twice as big.
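Both numbers are easy to double-check outside the kernel; a quick
userspace sketch, assuming 4k pages, the current IOVA_RANGE_CACHE_MAX_SIZE
/ IO_TLB_SIZE / IO_TLB_SEGSIZE values, and one page of alignment slack
from the NVMe min_align_mask:

#include <stdio.h>

#define PAGE_SIZE			4096UL	/* x86_64 */
#define IOVA_RANGE_CACHE_MAX_SIZE	6	/* IOVA rcache upper bound */
#define IO_TLB_SIZE			2048UL	/* 1 << IO_TLB_SHIFT */
#define IO_TLB_SEGSIZE			128

int main(void)
{
	/* dma-iommu: iommu_dma_opt_mapping_size() */
	unsigned long iommu_opt = PAGE_SIZE << (IOVA_RANGE_CACHE_MAX_SIZE - 1);

	/* dma-direct + swiotlb: swiotlb_max_mapping_size(), with the NVMe
	 * min_align_mask rounded up to one page of slack */
	unsigned long swiotlb_max = IO_TLB_SIZE * IO_TLB_SEGSIZE - PAGE_SIZE;

	printf("dma-iommu opt size: %lu\n", iommu_opt);		/* 131072 */
	printf("swiotlb max size:   %lu\n", swiotlb_max);	/* 258048 */
	return 0;
}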
And all this unfortunately leaves me really confused. If someone is
interested in playing around with this at the risk of data corruption, it
would be interesting to hack hardcoded values into dma_opt_mapping_size,
e.g. plug in the 131072 used by dma-iommu while using dma-direct with the
above iommu disable options.
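A sketch of what that could look like (not a definitive patch; the real
function in kernel/dma/mapping.c consults the dma_map_ops, this simply
shorts it out):

/* experiment only, risk of data corruption: force the dma-iommu value */
size_t dma_opt_mapping_size(struct device *dev)
{
	return 131072;	/* PAGE_SIZE << 5, what dma-iommu would report */
}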