Date: Mon, 13 Jun 2022 22:25:12 -0700
From: Eric Biggers <ebiggers@...nel.org>
To: Dave Chinner <david@...morbit.com>
Cc: "Darrick J. Wong" <djwong@...nel.org>, linux-fsdevel@...r.kernel.org,
	linux-ext4@...r.kernel.org, linux-f2fs-devel@...ts.sourceforge.net,
	linux-xfs@...r.kernel.org, linux-api@...r.kernel.org,
	linux-fscrypt@...r.kernel.org, linux-block@...r.kernel.org,
	linux-kernel@...r.kernel.org, Keith Busch <kbusch@...nel.org>
Subject: Re: [RFC PATCH v2 1/7] statx: add I/O alignment information

On Fri, May 20, 2022 at 01:27:39PM +1000, Dave Chinner wrote:
> > > * stx_offset_align_optimal: the alignment (in bytes) suggested for file
> > >   offsets and I/O segment lengths to get optimal performance. This
> > >   applies to both DIO and buffered I/O. It differs from stx_blocksize
> > >   in that stx_offset_align_optimal will contain the real optimum I/O
> > >   size, which may be a large value. In contrast, for compatibility
> > >   reasons stx_blocksize is the minimum size needed to avoid page cache
> > >   read/write/modify cycles, which may be much smaller than the optimum
> > >   I/O size. For more details about the motivation for this field, see
> > >   https://lore.kernel.org/r/20220210040304.GM59729@dread.disaster.area
> >
> > Hmm. So I guess this is supposed to be the filesystem's best guess at
> > the IO size that will minimize RMW cycles in the entire stack? i.e. if
> > the user does not want RMW of pagecache pages, of file allocation units
> > (if COW is enabled), of RAID stripes, or in the storage itself, then it
> > should ensure that all IOs are aligned to this value?
> >
> > I guess that means for XFS it's effectively max(pagesize, i_blocksize,
> > bdev io_opt, sb_width, and (pretend XFS can reflink the realtime volume)
> > the rt extent size)? I didn't see a manpage update for statx(2) but
> > that's mostly what I'm interested in. :)
>
> Yup, xfs_stat_blksize() should give a good idea of what we should
> do.
> It will end up being pretty much that, except without the need
> for a mount option to turn on the sunit/swidth return, and always
> taking into consideration extent size hints rather than just doing
> that for RT inodes...

While working on the man-pages update, I'm having second thoughts about the
stx_offset_align_optimal field. Does any filesystem other than XFS actually
want stx_offset_align_optimal, when st[x]_blksize already exists? Many network
filesystems, as well as tmpfs when hugepages are enabled, already report large
(megabytes) sizes in st[x]_blksize. And all documentation I looked at (man
pages for Linux, POSIX, FreeBSD, NetBSD, macOS) documents st_blksize as
something like "the preferred blocksize for efficient I/O". It's never
documented as being limited to PAGE_SIZE, which makes sense because it's not.

So stx_offset_align_optimal seems redundant, and it is going to confuse
application developers who will have to decide when to use st[x]_blksize and
when to use stx_offset_align_optimal.

Also, applications that don't work well with huge reported optimal I/O sizes
would still continue to exist, as it will remain possible for applications to
only be tested on filesystems that report a small optimal I/O size.

Perhaps for now we should just add STATX_DIOALIGN instead of STATX_IOALIGN,
leaving out the stx_offset_align_optimal field? What do people think?

- Eric
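
For reference, a minimal sketch of how an application can already read the existing st_blksize hint and size its I/O to it. This assumes a Unix-like system where Python's os.stat() exposes st_blksize. The helper names preferred_io_size() and round_up() are made up for illustration and are not part of any patch in this thread.

```python
import os
import tempfile


def preferred_io_size(path):
    """Return st_blksize for path: the size the filesystem reports as
    preferred for efficient I/O (may be far larger than the page size)."""
    return os.stat(path).st_blksize


def round_up(length, align):
    """Round length up to the next multiple of align (align must be > 0)."""
    return -(-length // align) * align


if __name__ == "__main__":
    # Hypothetical usage: size a 10000-byte transfer to the reported hint.
    with tempfile.NamedTemporaryFile() as f:
        blksize = preferred_io_size(f.name)
        print(f"st_blksize = {blksize}")
        if blksize > 0:
            print(f"10000 bytes rounded up = {round_up(10000, blksize)}")
```

The value printed depends on the filesystem, which is the point above: an application tested only where the hint is small may never see the megabyte-sized values that some filesystems report.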