linux-ext4 - Re: [RFC PATCH v2 1/7] statx: add I/O alignment information


Message-ID: <YqgbuDbdH2OLcbC7@sol.localdomain>
Date:   Mon, 13 Jun 2022 22:25:12 -0700
From:   Eric Biggers <ebiggers@...nel.org>
To:     Dave Chinner <david@...morbit.com>
Cc:     "Darrick J. Wong" <djwong@...nel.org>,
        linux-fsdevel@...r.kernel.org, linux-ext4@...r.kernel.org,
        linux-f2fs-devel@...ts.sourceforge.net, linux-xfs@...r.kernel.org,
        linux-api@...r.kernel.org, linux-fscrypt@...r.kernel.org,
        linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
        Keith Busch <kbusch@...nel.org>
Subject: Re: [RFC PATCH v2 1/7] statx: add I/O alignment information

On Fri, May 20, 2022 at 01:27:39PM +1000, Dave Chinner wrote:
> > > * stx_offset_align_optimal: the alignment (in bytes) suggested for file
> > >   offsets and I/O segment lengths to get optimal performance.  This
> > >   applies to both DIO and buffered I/O.  It differs from stx_blocksize
> > >   in that stx_offset_align_optimal will contain the real optimum I/O
> > >   size, which may be a large value.  In contrast, for compatibility
> > >   reasons stx_blocksize is the minimum size needed to avoid page cache
> > >   read/write/modify cycles, which may be much smaller than the optimum
> > >   I/O size.  For more details about the motivation for this field, see
> > >   https://lore.kernel.org/r/20220210040304.GM59729@dread.disaster.area
> > 
> > Hmm.  So I guess this is supposed to be the filesystem's best guess at
> > the IO size that will minimize RMW cycles in the entire stack?  i.e. if
> > the user does not want RMW of pagecache pages, of file allocation units
> > (if COW is enabled), of RAID stripes, or in the storage itself, then it
> > should ensure that all IOs are aligned to this value?
> > 
> > I guess that means for XFS it's effectively max(pagesize, i_blocksize,
> > bdev io_opt, sb_width, and (pretend XFS can reflink the realtime volume)
> > the rt extent size)?  I didn't see a manpage update for statx(2) but
> > that's mostly what I'm interested in. :)
> 
> Yup, xfs_stat_blksize() should give a good idea of what we should
> do. It will end up being pretty much that, except without the need
> for a mount option to turn on the sunit/swidth return, and always
> taking into consideration extent size hints rather than just doing
> that for RT inodes...

While working on the man-pages update, I'm having second thoughts about the
stx_offset_align_optimal field.  Does any filesystem other than XFS actually
want stx_offset_align_optimal, when st[x]_blksize already exists?  Many network
filesystems, as well as tmpfs when hugepages are enabled, already report large
(megabytes) sizes in st[x]_blksize.  And all documentation I looked at (man
pages for Linux, POSIX, FreeBSD, NetBSD, macOS) documents st_blksize as
something like "the preferred blocksize for efficient I/O".  It's never
documented as being limited to PAGE_SIZE, which makes sense because it's not.
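
For reference, here is a minimal sketch of how an application can already
query the existing field through plain stat(2).  The helper names
preferred_io_size() and round_up_len() are hypothetical, chosen only for this
illustration; the value returned depends entirely on the filesystem and may be
anywhere from a few kilobytes to several megabytes.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: return the preferred I/O size that the filesystem
 * reports in st_blksize for 'path', or -1 if stat(2) fails.
 */
long preferred_io_size(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	return (long)st.st_blksize;
}

/*
 * Hypothetical helper: round 'len' up to the next multiple of 'align'.
 * Uses division rather than bit masking, so it makes no assumption that
 * the reported size is a power of two.
 */
size_t round_up_len(size_t len, size_t align)
{
	assert(align > 0);
	return ((len + align - 1) / align) * align;
}
```

A caller would typically pass the result of preferred_io_size() as the
'align' argument when sizing its I/O buffer, and should be prepared for that
value to be much larger than PAGE_SIZE.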

So stx_offset_align_optimal seems redundant, and it is going to confuse
application developers who will have to decide when to use st[x]_blksize and
when to use stx_offset_align_optimal.

Also, applications that don't work well with huge reported optimal I/O sizes
would continue to exist, since it will remain possible for an application to
be tested only on filesystems that report a small optimal I/O size.

Perhaps for now we should just add STATX_DIOALIGN instead of STATX_IOALIGN,
leaving out the stx_offset_align_optimal field?  What do people think?

- Eric