lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 13 Jun 2022 22:25:12 -0700
From:   Eric Biggers <>
To:     Dave Chinner <>
Cc:     "Darrick J. Wong" <>,,,,,,,,,
        Keith Busch <>
Subject: Re: [RFC PATCH v2 1/7] statx: add I/O alignment information

On Fri, May 20, 2022 at 01:27:39PM +1000, Dave Chinner wrote:
> > > * stx_offset_align_optimal: the alignment (in bytes) suggested for file
> > >   offsets and I/O segment lengths to get optimal performance.  This
> > >   applies to both DIO and buffered I/O.  It differs from stx_blocksize
> > >   in that stx_offset_align_optimal will contain the real optimum I/O
> > >   size, which may be a large value.  In contrast, for compatibility
> > >   reasons stx_blocksize is the minimum size needed to avoid page cache
> > >   read/write/modify cycles, which may be much smaller than the optimum
> > >   I/O size.  For more details about the motivation for this field, see
> > >
> > 
> > Hmm.  So I guess this is supposed to be the filesystem's best guess at
> > the IO size that will minimize RMW cycles in the entire stack?  i.e. if
> > the user does not want RMW of pagecache pages, of file allocation units
> > (if COW is enabled), of RAID stripes, or in the storage itself, then it
> > should ensure that all IOs are aligned to this value?
> > 
> > I guess that means for XFS it's effectively max(pagesize, i_blocksize,
> > bdev io_opt, sb_width, and (pretend XFS can reflink the realtime volume)
> > the rt extent size)?  I didn't see a manpage update for statx(2) but
> > that's mostly what I'm interested in. :)
> Yup, xfs_stat_blksize() should give a good idea of what we should
> do. It will end up being pretty much that, except without the need
> to a mount option to turn on the sunit/swidth return, and always
> taking into consideration extent size hints rather than just doing
> that for RT inodes...

While working on the man-pages update, I'm having second thoughts about the
stx_offset_align_optimal field.  Does any filesystem other than XFS actually
want stx_offset_align_optimal, when st[x]_blksize already exists?  Many network
filesystems, as well as tmpfs when hugepages are enabled, already report large
(megabytes) sizes in st[x]_blksize.  And all documentation I looked at (man
pages for Linux, POSIX, FreeBSD, NetBSD, macOS) documents st_blksize as
something like "the preferred blocksize for efficient I/O".  It's never
documented as being limited to PAGE_SIZE, which makes sense because it's not.

So stx_offset_align_optimal seems redundant, and it is going to confuse
application developers who will have to decide when to use st[x]_blksize and
when to use stx_offset_align_optimal.

Also, applications that don't work well with huge reported optimal I/O sizes
would still continue to exist, as it will remain possible for applications to
only be tested on filesystems that report a small optimal I/O size.

Perhaps for now we should just add STATX_DIOALIGN instead of STATX_IOALIGN,
leaving out the stx_offset_align_optimal field?  What do people think?

- Eric

Powered by blists - more mailing lists