Message-ID: <YqgbuDbdH2OLcbC7@sol.localdomain>
Date: Mon, 13 Jun 2022 22:25:12 -0700
From: Eric Biggers <ebiggers@...nel.org>
To: Dave Chinner <david@...morbit.com>
Cc: "Darrick J. Wong" <djwong@...nel.org>,
linux-fsdevel@...r.kernel.org, linux-ext4@...r.kernel.org,
linux-f2fs-devel@...ts.sourceforge.net, linux-xfs@...r.kernel.org,
linux-api@...r.kernel.org, linux-fscrypt@...r.kernel.org,
linux-block@...r.kernel.org, linux-kernel@...r.kernel.org,
Keith Busch <kbusch@...nel.org>
Subject: Re: [RFC PATCH v2 1/7] statx: add I/O alignment information

On Fri, May 20, 2022 at 01:27:39PM +1000, Dave Chinner wrote:
> > > * stx_offset_align_optimal: the alignment (in bytes) suggested for file
> > > offsets and I/O segment lengths to get optimal performance. This
> > > applies to both DIO and buffered I/O. It differs from stx_blocksize
> > > in that stx_offset_align_optimal will contain the real optimum I/O
> > > size, which may be a large value. In contrast, for compatibility
> > > reasons stx_blocksize is the minimum size needed to avoid page cache
> > > read/write/modify cycles, which may be much smaller than the optimum
> > > I/O size. For more details about the motivation for this field, see
> > > https://lore.kernel.org/r/20220210040304.GM59729@dread.disaster.area
> >
> > Hmm. So I guess this is supposed to be the filesystem's best guess at
> > the IO size that will minimize RMW cycles in the entire stack? i.e. if
> > the user does not want RMW of pagecache pages, of file allocation units
> > (if COW is enabled), of RAID stripes, or in the storage itself, then it
> > should ensure that all IOs are aligned to this value?
> >
> > I guess that means for XFS it's effectively max(pagesize, i_blocksize,
> > bdev io_opt, sb_width, and (pretend XFS can reflink the realtime volume)
> > the rt extent size)? I didn't see a manpage update for statx(2) but
> > that's mostly what I'm interested in. :)
>
> Yup, xfs_stat_blksize() should give a good idea of what we should
> do. It will end up being pretty much that, except without the need
> to a mount option to turn on the sunit/swidth return, and always
> taking into consideration extent size hints rather than just doing
> that for RT inodes...

While working on the man-pages update, I'm having second thoughts about the
stx_offset_align_optimal field. Does any filesystem other than XFS actually
want stx_offset_align_optimal, when st[x]_blksize already exists? Many network
filesystems, as well as tmpfs when hugepages are enabled, already report large
(multi-megabyte) sizes in st[x]_blksize. And all documentation I looked at (man
pages for Linux, POSIX, FreeBSD, NetBSD, macOS) documents st_blksize as
something like "the preferred blocksize for efficient I/O". It's never
documented as being limited to PAGE_SIZE, which makes sense because it's not.

So stx_offset_align_optimal seems redundant, and it is going to confuse
application developers who will have to decide when to use st[x]_blksize and
when to use stx_offset_align_optimal.

Also, applications that don't work well with huge reported optimal I/O sizes
would still continue to exist, as it will remain possible for applications to
only be tested on filesystems that report a small optimal I/O size.
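
To make the concern concrete, here is a minimal sketch of how an application
might consume st_blksize defensively today. It assumes Python's os.stat() on a
Unix-like system; the cap and fallback constants and both function names are
hypothetical, chosen only for illustration, and are not part of any kernel or
libc interface.

```python
import os

# Hypothetical limits for illustration only; not taken from any kernel API.
MAX_IO_SIZE = 1 << 20      # cap huge reported sizes at 1 MiB
FALLBACK_IO_SIZE = 4096    # used when no usable block size is reported


def clamp_io_size(blksize, cap=MAX_IO_SIZE, fallback=FALLBACK_IO_SIZE):
    """Clamp a reported preferred I/O size to a range the application handles."""
    if blksize <= 0:
        blksize = fallback
    return min(blksize, cap)


def preferred_io_size(path, cap=MAX_IO_SIZE):
    """Read st_blksize for path and clamp it; st_blksize may be absent on some platforms."""
    st = os.stat(path)
    return clamp_io_size(getattr(st, "st_blksize", 0), cap)


if __name__ == "__main__":
    print(preferred_io_size("."))
```

An application that only ever sees small values never exercises the clamping
path, which is how code that mishandles huge sizes can go unnoticed.
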
Perhaps for now we should just add STATX_DIOALIGN instead of STATX_IOALIGN,
leaving out the stx_offset_align_optimal field? What do people think?

- Eric