linux-ext4 - Re: [PATCHv4 5/8] iomap: simplify direct io validity check

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20251030174015.GC1624@sol>
Date: Thu, 30 Oct 2025 10:40:15 -0700
From: Eric Biggers <ebiggers@...nel.org>
To: Christoph Hellwig <hch@....de>
Cc: Carlos Llamas <cmllamas@...gle.com>, Keith Busch <kbusch@...nel.org>,
	Keith Busch <kbusch@...a.com>, linux-block@...r.kernel.org,
	linux-fsdevel@...r.kernel.org, linux-xfs@...r.kernel.org,
	linux-ext4@...r.kernel.org, axboe@...nel.dk,
	Hannes Reinecke <hare@...e.de>,
	"Martin K. Petersen" <martin.petersen@...cle.com>
Subject: Re: [PATCHv4 5/8] iomap: simplify direct io validity check

On Wed, Oct 29, 2025 at 08:06:18AM +0100, Christoph Hellwig wrote:
> I think we need to take a step back and talk about what alignment
> we're talking about here, as there are two dimensions to it.
> 
> The first dimension is: disk alignment vs memory alignment.
> 
> Disk alignment:
>   Direct I/O obviously needs to be aligned to on-disk sectors to have
>   a chance to work, as that is the lowest possible granularity of access.
> 
>   For fіle systems that write out of place we also need to align writes
>   to the logical block size of the file system.
> 
>   With blk-crypto we need to align to the DUN if it is larger than the
>   disk-sector dize.
> 
> Memory alignment:
> 
>   This is the alignment of the buffer in-memory.  Hardware only really
>   cares about this when DMA engines discard the lowest bits, so a typical
>   hardware alignment requirement is to only require a dword (4 byte)
>   alignment.   For drivers that process the payload in software such
>   low alignment have a tendency to cause bugs as they're not written
>   thinking about it.  Similarly for any additional processing like
>   encryption, parity or checksums.
> 
> The second dimension is for the entire operation vs individual vectors,
> this has implications both for the disk and memory alignment.  Keith
> has done work there recently to relax the alignment of the vectors to
> only require the memory alignment, so that preadv/pwritev-like calls
> can have lots of unaligned segments.
> 
> I think it's the latter that's tripping up here now.  Hard coding these
> checks in the file systems seem like a bad idea, we really need to
> advertise them in the queue limits, which is complicated by the fact that
> we only want to do that for bios using block layer encryption. i.e., we
> probably need a separate queue limit that mirrors dma_alignment, but only
> for encrypted bios, and which is taken into account in the block layer
> splitting and communicated up by file systems only for encrypted bios.
> For blk-crypto-fallback we'd need DUN alignment so that the algorithms
> just work (assuming the crypto API can't scatter over misaligned
> segments), but for hardware blk-crypto I suspect that the normal DMA
> engine rules apply, and we don't need to restrict alignment.

Allowing DIO segments to be aligned (in memory address and/or length) to
less than crypto_data_unit_size on encrypted files has been attempted
and discussed before.  Read the cover letter of
https://lore.kernel.org/linux-fscrypt/20220128233940.79464-1-ebiggers@kernel.org/

We eventually decided to proceed with DIO support without it, since it
would have added a lot of complexity.  It would have made the bio
splitting code in the block layer split bios at boundaries where the
length isn't aligned to crypto_data_unit_size, it would have caused a
lot of trouble for blk-crypto-fallback, and it even would have been
incompatible with some of the hardware drivers (e.g. ufs-exynos.c).

It also didn't seem to be all that useful, and it would have introduced
edge cases that don't get tested much.  All reachable to unprivileged
userspace code too, of course.

I can't say that the idea seems all that great to me.

We can always reconsider and still add support for this.  But it's not
clear to me what's changed.

- Eric