[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZzXxlf6RWeX3e-3x@localhost.localdomain>
Date: Thu, 14 Nov 2024 20:48:21 +0800
From: Long Li <leo.lilong@...wei.com>
To: John Garry <john.g.garry@...cle.com>, Dave Chinner <david@...morbit.com>
CC: Ritesh Harjani <ritesh.list@...il.com>, <chandan.babu@...cle.com>,
<djwong@...nel.org>, <dchinner@...hat.com>, <hch@....de>,
<viro@...iv.linux.org.uk>, <brauner@...nel.org>, <jack@...e.cz>,
<linux-xfs@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<linux-fsdevel@...r.kernel.org>, <catherine.hoang@...cle.com>,
<martin.petersen@...cle.com>
Subject: Re: [PATCH v4 00/14] forcealign for xfs
On Wed, Sep 18, 2024 at 11:12:47AM +0100, John Garry wrote:
> On 17/09/2024 23:27, Dave Chinner wrote:
> > > # xfs_bmap -vvp mnt/file
> > > mnt/file:
> > > EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
> > > 0: [0..15]: 384..399 0 (384..399) 16 010000
> > > 1: [16..31]: 400..415 0 (400..415) 16 000000
> > > 2: [32..127]: 416..511 0 (416..511) 96 010000
> > > 3: [128..255]: 256..383 0 (256..383) 128 000000
> > > FLAG Values:
> > > 0010000 Unwritten preallocated extent
> > >
> > > Here we have unaligned extents wrt extsize.
> > >
> > > The sub-alloc unit zeroing would solve that - is that what you would still
> > > advocate (to solve that issue)?
> > Yes, I thought that was already implemented for force-align with the
> > DIO code via the extsize zero-around changes in the iomap code. Why
> > isn't that zero-around code ensuring the correct extent layout here?
>
> I just have not included the extsize zero-around changes here. They were
> just grouped with the atomic writes support, as they were added specifically
> for the atomic writes support. Indeed - to me at least - it is strange that
> the DIO code changes are required for XFS forcealign implementation. And,
> even if we use extsize zero-around changes for DIO path, what about buffered
> IO?
I've been reviewing and testing the XFS atomic write patch series. Since
there haven't been any new responses to the previous discussions on this
issue, I'd like to inquire about the buffered IO problem with force-aligned
files, which is a scenario we might encounter.
Consider a case where the file supports force-alignment with a 64K extent size,
and the system page size is 4K. Take the following commands as an example:
xfs_io -c "pwrite 64k 64k" mnt/file
xfs_io -c "pwrite 8k 8k" mnt/file
If unaligned unwritten extents are not permitted, we need to zero out the
sub-allocation units for ranges [0, 8K] and [16K, 64K] to prevent stale
data. While this can be handled relatively easily in direct I/O scenarios,
it presents significant challenges in buffered I/O operations. The main
difficulty arises because the extent size (64K) is larger than the page
size (4K), and our current code base has substantial limitations in handling
such cases.
Any thoughts on this?
Thanks,
Long Li
>
> BTW, I still have concern with this extsize zero-around change which I was
> making:
>
> xfs_iomap_write_unwritten()
> {
> unsigned int rounding;
>
> /* when converting anything unwritten, we must be spanning an alloc unit,
> so round up/down */
> if (rounding > 1) {
> offset_fsb = rounddown(rounding);
> count_fsb = roundup(rounding);
> }
>
> ...
> do {
> xfs_bmapi_write();
> ...
> xfs_trans_commit();
> } while ();
> }
>
> As mentioned elsewhere, it's a bit of a bodge (to do this rounding).
>
> >
> > > > FWIW, I also understand things are different if we are doing 128kB
> > > > atomic writes on 16kB force aligned files. However, in this
> > > > situation we are treating the 128kB atomic IO as eight individual
> > > > 16kB atomic IOs that are physically contiguous.
> > > Yes, if 16kB force aligned, userspace can only issue 16KB atomic writes.
> > Right, but the eventual goal (given the statx parameters) is to be
> > able to do 8x16kB sequential atomic writes as a single 128kB IO, yes?
>
> No, if atomic write unit max is 16KB, then userspace can only issue a single
> 16KB atomic write.
>
> However, some things to consider:
> a. the block layer may merge those 16KB atomic writes
> b. userspace may also merge 16KB atomic writes and issue a larger atomic
> write (if atomic write unit max is > 16KB)
>
> I had been wondering if there is any value in a lib for helping with b.
>
> >
> > > > > > Again, this is different to the traditional RT file behaviour - it
> > > > > > can use unwritten extents for sub-alloc-unit alignment unmaps
> > > > > > because the RT device can align file offset to any physical offset,
> > > > > > and issue unaligned sector sized IO without any restrictions. Forced
> > > > > > alignment does not have this freedom, and when we extend forced
> > > > > > alignment to RT files, it will not have the freedom to use
> > > > > > unwritten extents for sub-alloc-unit unmapping, either.
> > > > > >
> > > > > So how do you think that we should actually implement
> > > > > xfs_itruncate_extents_flags() properly for forcealign? Would it simply be
> > > > > like:
> > > > >
> > > > > --- a/fs/xfs/xfs_inode.c
> > > > > +++ b/fs/xfs/xfs_inode.c
> > > > > @@ -1050,7 +1050,7 @@ xfs_itruncate_extents_flags(
> > > > > WARN_ON_ONCE(first_unmap_block > XFS_MAX_FILEOFF);
> > > > > return 0;
> > > > > }
> > > > > + if (xfs_inode_has_forcealign(ip))
> > > > > + first_unmap_block = xfs_inode_roundup_alloc_unit(ip,
> > > > > first_unmap_block);
> > > > > error = xfs_bunmapi_range(&tp, ip, flags, first_unmap_block,
> > > > Yes, it would be something like that, except it would have to be
> > > > done before first_unmap_block is verified.
> > > >
> > > ok, and are you still of the opinion that this does not apply to rtvol?
> > The rtvol is*not* force-aligned. It -may- have some aligned
> > allocation requirements that are similar (i.e. sb_rextsize > 1 fsb)
> > but it does*not* force-align extents, written or unwritten.
> >
> > The moment we add force-align support to RT files (as is the plan),
> > then the force-aligned inodes on the rtvol will need to behave as
> > force aligned inodes, not "rtvol" inodes.
>
> ok, fine
>
> Thanks,
> John
>
>
>
Powered by blists - more mailing lists