linux-ext4 - Re: [PATCHv4 5/8] iomap: simplify direct io validity check


Message-ID: <20251103181031.GI1735@sol>
Date: Mon, 3 Nov 2025 10:10:31 -0800
From: Eric Biggers <ebiggers@...nel.org>
To: Christoph Hellwig <hch@....de>
Cc: Carlos Llamas <cmllamas@...gle.com>, Keith Busch <kbusch@...nel.org>,
	Keith Busch <kbusch@...a.com>, linux-block@...r.kernel.org,
	linux-fsdevel@...r.kernel.org, linux-xfs@...r.kernel.org,
	linux-ext4@...r.kernel.org, axboe@...nel.dk,
	Hannes Reinecke <hare@...e.de>,
	"Martin K. Petersen" <martin.petersen@...cle.com>
Subject: Re: [PATCHv4 5/8] iomap: simplify direct io validity check

On Fri, Oct 31, 2025 at 10:18:20AM +0100, Christoph Hellwig wrote:
> On Thu, Oct 30, 2025 at 10:40:15AM -0700, Eric Biggers wrote:
> > Allowing DIO segments to be aligned (in memory address and/or length) to
> > less than crypto_data_unit_size on encrypted files has been attempted
> > and discussed before.  Read the cover letter of
> > https://lore.kernel.org/linux-fscrypt/20220128233940.79464-1-ebiggers@kernel.org/
> 
> Hmm, where does "First, it
> necessarily causes it to be possible that crypto data units span bvecs.
> Splits cannot occur at such locations; however the block layer currently
> assumes that bios can be split at any bvec boundary." come from?  The
> block layer splits at arbitrary boundaries that don't need any kind of
> bvec alignment.

While splits in general can occur on any logical_block_size boundary,
the last I checked, if things are set up properly, the splitting is much
better behaved than that in practice.

For example, if max_segment_size is a multiple of logical_block_size
but not of the crypto_data_unit_size being used, then that would of
course cause incorrect splits.  However, that's not an issue if the
driver sets max_segment_size to be a multiple of its largest supported
crypto_data_unit_size.  For example, the UFS driver defaults to
max_segment_size=SZ_256K, which is nicely aligned.
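
To make the condition above concrete, here is a minimal sketch of the
divisibility check it boils down to.  The helper name is hypothetical
and this is not a kernel API; it only models the arithmetic, assuming
segments are cut at multiples of max_segment_size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): returns true if cutting
 * a segment at any multiple of max_segment_size can never land in the
 * middle of a crypto data unit, i.e. if max_segment_size is a multiple
 * of data_unit_size.
 */
static bool seg_limit_keeps_data_units_whole(uint32_t max_segment_size,
					     uint32_t data_unit_size)
{
	assert(data_unit_size != 0);	/* a zero-sized data unit is meaningless */
	return max_segment_size % data_unit_size == 0;
}
```

With a 4096-byte data unit, a 256 KiB limit (262144 bytes) passes the
check, while a limit of 65024 bytes (127 * 512, so still a multiple of a
512-byte logical block) does not.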

I'm sure there are other examples, related to dm devices,
virt_boundary_mask, and so on.  But the point is that they don't seem to
have been applicable to anyone actually using
crypto_data_unit_size > logical_block_size yet.

In contrast, allowing DIO with memory alignment < crypto_data_unit_size
makes the current code trivially start generating incorrect splits due
to the max_segments based splitting.
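
As a rough illustration of that failure mode, the toy model below cuts a
request after its first max_segments segments and checks where the cut
lands.  Both function names are hypothetical, and the model deliberately
ignores everything else the real splitting code does (including rounding
to logical_block_size); it is a sketch of the arithmetic only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy model (not kernel code): byte offset at which a request would be
 * cut if only its first max_segments segments fit.  If the request has
 * no more than max_segments segments, this is simply its total length.
 */
static uint64_t split_offset_by_max_segments(const uint32_t *seg_len,
					     size_t nr_segs,
					     size_t max_segments)
{
	size_t n = nr_segs < max_segments ? nr_segs : max_segments;
	uint64_t off = 0;

	for (size_t i = 0; i < n; i++)
		off += seg_len[i];
	return off;
}

/* True if the cut falls on a crypto data unit boundary. */
static bool split_is_data_unit_aligned(uint64_t off, uint32_t data_unit_size)
{
	assert(data_unit_size != 0);
	return off % data_unit_size == 0;
}
```

For instance, a 12288-byte buffer that starts 512 bytes before a page
boundary might map to segments of 512, 4096, 4096 and 3584 bytes.  With
max_segments=2 the cut lands at byte 4608, which is in the middle of a
4096-byte data unit.  The same length in three page-aligned 4096-byte
segments would be cut at 8192, which is fine.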

I'm not sure the current situation of the block layer not really paying
attention to crypto_data_unit_size is sustainable.  But when someone
looked into making the "only split on crypto_data_unit_size" boundaries
guarantee explicit
(https://lkml.kernel.org/linux-block/20210707052943.3960-1-satyaprateek2357@gmail.com/),
it turned out to be a lot of work.  And no one could find a real-world
example where it actually mattered, besides DIO with memory alignment <
crypto_data_unit_size which didn't seem that useful in the first place.
So that's why we're still in the current situation.

> > We eventually decided to proceed with DIO support without it, since it
> > would have added a lot of complexity.  It would have made the bio
> > splitting code in the block layer split bios at boundaries where the
> > length isn't aligned to crypto_data_unit_size, it would have caused a
> > lot of trouble for blk-crypto-fallback, and it even would have been
> > incompatible with some of the hardware drivers (e.g. ufs-exynos.c).
> 
> Ok, if hardware drivers can't handle it that's a good argument.  I can
> see why handling it in the software case is very annoying, but non-stupid
> hardware should not be affected.  Stupid me assuming UFS might not be
> dead stupid of course.
> 
> > It also didn't seem to be all that useful, and it would have introduced
> > edge cases that don't get tested much.  All reachable to unprivileged
> > userspace code too, of course.
> 
> xfstests just started exercising this and we're getting lots of interesting
> reports (for the non-fscrypt case).

Great to hear that it's starting to be tested.  But it's concerning that
it's just happening now, 3 years after the patches went in, and is also
still finding lots of bugs.  It's hard for me to understand how it was
ready, or even useful, in the first place.

- Eric