Message-ID: <20250204062605.GB29300@lst.de>
Date: Tue, 4 Feb 2025 07:26:05 +0100
From: Christoph Hellwig <hch@....de>
To: Thorsten Leemhuis <regressions@...mhuis.info>
Cc: Christoph Hellwig <hch@....de>, Bruno Gravato <bgravato@...il.com>,
	Stefan <linux-kernel@...g.de>, Keith Busch <kbusch@...nel.org>,
	bugzilla-daemon@...nel.org, Adrian Huang <ahuang12@...ovo.com>,
	Linux kernel regressions list <regressions@...ts.linux.dev>,
	linux-nvme@...ts.infradead.org, Jens Axboe <axboe@...com>,
	"iommu@...ts.linux.dev" <iommu@...ts.linux.dev>,
	LKML <linux-kernel@...r.kernel.org>,
	Mario Limonciello <mario.limonciello@....com>
Subject: Re: [Bug 219609] File corruptions on SSD in 1st M.2 socket of
 AsRock X600M-STX + Ryzen 8700G

On Fri, Jan 17, 2025 at 11:30:47AM +0100, Thorsten Leemhuis wrote:
> >> Side note: that "PCI-DMA: Using software bounce buffering for IO
> >> (SWIOTLB)" message does show up on two other AMD machines I own as
> >> well. One also has a Ryzen 8000, the other one a much older one.

The message will always show up with > 4G of memory.  It only implies
swiotlb is initialized, not that any device actually uses it.
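
For reference, the x86 detection is roughly the following (sketch,
names from memory, see arch/x86/kernel/pci-dma.c):

	/* sketch: enable swiotlb whenever there is RAM above 4G */
	if (!no_iommu && max_possible_pfn > MAX_DMA32_PFN)
		x86_swiotlb_enable = true;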

> >> And BTW a few bits of the latest development in the bugzilla ticket
> >> (https://bugzilla.kernel.org/show_bug.cgi?id=219609 ):
> >>
> >> * iommu=pt and amd_iommu=off seem to work around the problem (in
> >> addition to disabling the iommu in the BIOS setup).

iommu=pt calls iommu_set_default_passthrough, which sets
iommu_def_domain_type to IOMMU_DOMAIN_IDENTITY.  I.e. the hardware
IOMMU is left on, but treated as a 1:1 mapping by Linux.
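
For reference, that helper is tiny (sketch from drivers/iommu/iommu.c,
from memory):

	void iommu_set_default_passthrough(bool cmd_line)
	{
		if (cmd_line)
			iommu_cmd_line |= IOMMU_CMD_LINE_DMA_API;
		iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY;
	}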

amd_iommu=off sets amd_iommu_disabled, which calls disable_iommus,
which from a quick read disables the hardware IOMMU.

In either case we'll end up using dma-direct instead of dma-iommu.

> > 
> > That suggests the problem is related to the dma-iommu code, and
> > my strong suspect is the swiotlb bounce buffering for untrusted
> > device.  If you feel adventurous, can you try building a kernel
> > where dev_use_swiotlb() in drivers/iommu/dma-iommu.c is hacked
> > to always return false?
> 
> Tried that, did not help: I still get corrupted data.

.. which together with this implies that the problem only happens
when using the dma-iommu code (with or without swiotlb buffering
for unaligned / untrusted data), and does not happen with
dma-direct.
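
For reference, the hack tested above boils down to something like this
(signature as in a recent drivers/iommu/dma-iommu.c, from memory):

	static bool dev_use_swiotlb(struct device *dev, size_t size,
				    enum dma_data_direction dir)
	{
		/* experiment: never bounce through swiotlb */
		return false;
	}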

If we assume it is also related to the optimal DMA mapping size, as
the original report suggests, the values for that might be
interesting.  For dma-iommu this is:

	PAGE_SIZE << (IOVA_RANGE_CACHE_MAX_SIZE - 1);

where IOVA_RANGE_CACHE_MAX_SIZE is 6, i.e.

	PAGE_SIZE << 5 or 131072 for x86_64.
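
That value comes from iommu_dma_opt_mapping_size, which just reports
the iova rcache range (sketch, from memory):

	static size_t iommu_dma_opt_mapping_size(struct device *dev)
	{
		return iova_rcache_range();
	}

	unsigned long iova_rcache_range(void)
	{
		return PAGE_SIZE << (IOVA_RANGE_CACHE_MAX_SIZE - 1);
	}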

For dma-direct it falls back to dma_max_mapping_size, which is
SIZE_MAX without swiotlb, or swiotlb_max_mapping_size, which is a bit
complicated due to the minimum alignment, but in this case should
evaluate to 258048, which is almost twice as big.
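
The arithmetic behind that number, roughly (constants from memory:
IO_TLB_SIZE = 2048, IO_TLB_SEGSIZE = 128, and NVMe sets a
min_align_mask of 4095):

	size_t swiotlb_max_mapping_size(struct device *dev)
	{
		int min_align = 0;

		/* slots skipped for the min align mask shrink the max */
		if (dma_get_min_align_mask(dev))
			min_align = roundup(dma_get_min_align_mask(dev),
					    IO_TLB_SIZE);

		/* 2048 * 128 - roundup(4095, 2048) = 262144 - 4096 = 258048 */
		return (size_t)IO_TLB_SIZE * IO_TLB_SEGSIZE - min_align;
	}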

And all this unfortunately leaves me really confused.  If someone is
interested in playing around with this at the risk of data corruption,
it would be interesting to hack hardcoded values into
dma_opt_mapping_size, e.g. plug in the 131072 used by dma-iommu while
using dma-direct with the above iommu disable options.
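
I.e. something like this in kernel/dma/mapping.c (untested sketch):

	size_t dma_opt_mapping_size(struct device *dev)
	{
		/* experiment: hardcode the dma-iommu value on dma-direct */
		return 131072;
	}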
