Message-ID: <Zu2PEXtKMAUN5CyV@li-bb2b2a4c-3307-11b2-a85c-8fa5c3a69313.ibm.com>
Date: Fri, 20 Sep 2024 20:34:49 +0530
From: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
To: Dave Chinner <david@...morbit.com>
Cc: linux-ext4@...r.kernel.org, "Theodore Ts'o" <tytso@....edu>,
Ritesh Harjani <ritesh.list@...il.com>, linux-kernel@...r.kernel.org,
"Darrick J . Wong" <djwong@...nel.org>, linux-fsdevel@...r.kernel.org,
John Garry <john.g.garry@...cle.com>, dchinner@...hat.com
Subject: Re: [RFC 0/5] ext4: Implement support for extsize hints
On Fri, Sep 20, 2024 at 08:34:14AM +1000, Dave Chinner wrote:
> On Thu, Sep 19, 2024 at 12:43:17PM +0530, Ojaswin Mujoo wrote:
> > On Wed, Sep 18, 2024 at 07:54:27PM +1000, Dave Chinner wrote:
> > > On Wed, Sep 11, 2024 at 02:31:04PM +0530, Ojaswin Mujoo wrote:
> > > Behaviour such as extent size hinting *should* be the same across
> > > all filesystems that provide this functionality. This makes using
> > > extent size hints much easier for users, admins and application
> > > developers. The last thing I want to hear is application devs tell
> > > me at conferences that "we don't use extent size hints anymore
> > > because ext4..."
> >
> > Yes, makes sense :)
> >
> > Nothing to worry about here though, as ext4 also treats the extsize
> > value as a hint, exactly like XFS. We have tried to keep the behavior
> > as similar to XFS as possible for the exact reasons you mentioned.
>
> It is worth explicitly stating this (i.e. all the behaviours that
> are the same) in the design documentation rather than just the
> corner cases where it is different. It was certainly not clear how
> failures were treated.
Got it, Dave. I did mention it in the actual commit 5/5, but I agree. I
will update the cover letter to be clearer about the design in future
revisions.
>
> > And yes, we do plan to add a forcealign (or similar) feature for ext4 as
> > well for atomic writes, which would change the hint to a mandate.
>
> Ok. That should be stated, too.
>
> FWIW, it would be a good idea to document this all in the kernel
> documentation itself, so there is a guideline for other filesystems
> to implement the same behaviour. e.g. in
> Documentation/filesystems/extent-size-hints.rst
Okay, makes sense. I can look into this as a next step.
>
> > > > 2. eof allocation on XFS trims the blocks allocated beyond eof with extsize
> > > > hint. That means on XFS for eof allocations (with extsize hint) only logical
> > > > start gets aligned.
> > >
> > > I'm not sure I understand what you are saying here. XFS does extsize
> > > alignment of both the start and end of post-eof extents the same as
> > > it does for extents within EOF. For example:
> > >
> > > # xfs_io -fdc "truncate 0" -c "extsize 16k" -c "pwrite 0 4k" -c "bmap -vvp" foo
> > > wrote 4096/4096 bytes at offset 0
> > > 4 KiB, 1 ops; 0.0308 sec (129.815 KiB/sec and 32.4538 ops/sec)
> > > foo:
> > > EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
> > > 0: [0..7]: 256504..256511 0 (256504..256511) 8 000000
> > > 1: [8..31]: 256512..256535 0 (256512..256535) 24 010000
> > > FLAG Values:
> > > 0100000 Shared extent
> > > 0010000 Unwritten preallocated extent
> > >
> > > There's a 4k written extent at 0, and a 12k unwritten extent
> > > beyond EOF at 4k. I.e. we have an extent of 16kB as the hint
> > > required that is correctly aligned beyond EOF.
> > >
> > > If I then write another 4k at 20k (beyond both EOF and the unwritten
> > > extent beyond EOF):
> > >
> > > # xfs_io -fdc "truncate 0" -c "extsize 16k" -c "pwrite 0 4k" -c "pwrite 20k 4k" -c "bmap -vvp" foo
> > > wrote 4096/4096 bytes at offset 0
> > > 4 KiB, 1 ops; 0.0210 sec (190.195 KiB/sec and 47.5489 ops/sec)
> > > wrote 4096/4096 bytes at offset 20480
> > > 4 KiB, 1 ops; 0.0001 sec (21.701 MiB/sec and 5555.5556 ops/sec)
> > > foo:
> > > EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL FLAGS
> > > 0: [0..7]: 180000..180007 0 (180000..180007) 8 000000
> > > 1: [8..39]: 180008..180039 0 (180008..180039) 32 010000
> > > 2: [40..47]: 180040..180047 0 (180040..180047) 8 000000
> > > 3: [48..63]: 180048..180063 0 (180048..180063) 16 010000
> > > FLAG Values:
> > > 0100000 Shared extent
> > > 0010000 Unwritten preallocated extent
> > >
> > > You can see we did contiguous allocation of another 16kB at offset
> > > 16kB, and then wrote to 20k for 4kB.. i.e. the new extent was
> > > correctly aligned at both sides as the extsize hint says it should
> > > be....
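
A sketch of the arithmetic behind the two bmap listings above. This is
an illustration only, not XFS or ext4 allocator code. The helper name
extsize_aligned_range is made up here, and it assumes the hint is
honoured, which a hint does not guarantee.

```python
KIB = 1024

def extsize_aligned_range(offset, length, extsize):
    """Round a write's byte range out to extsize boundaries.

    Hypothetical helper: models only the logical (file offset) alignment
    that an extent size hint asks for, assuming the hint is honoured.
    """
    start = (offset // extsize) * extsize               # round start down
    end = -(-(offset + length) // extsize) * extsize    # round end up
    return start, end

# 4k write at offset 0 with a 16k hint
print(extsize_aligned_range(0, 4 * KIB, 16 * KIB))
# 4k write at offset 20k with a 16k hint
print(extsize_aligned_range(20 * KIB, 4 * KIB, 16 * KIB))
```

The two calls give the byte ranges 0..16k and 16k..32k. In the 512-byte
units that bmap prints, those are [0..31] and [32..63], which together
cover the 64 sectors shown in the second listing.
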
> >
> > Sorry for the confusion Dave. What was meant is that XFS would indeed
> > respect extsize hint for EOF allocations but if we close the file, since
> > we trim the blocks past EOF upon close, we would only see that the
> > lstart is aligned but the end would not.
>
> Right, but that is desired behaviour, especially when extsize is
> large. i.e. when the file is closed it is an indication that the
> file will not be written again, so we don't need to keep post-eof
> blocks around for fragmentation prevention reasons.
>
> Removing post-EOF extents on close prevents large extsize hints from
> consuming lots of unused space on files that are never going to be
> written to again(*). That's user visible, and because it can cause
> premature ENOSPC, users will report this excessive space usage
> behaviour as a bug (and they are right). Hence removing post-eof
> extents on file close when extent size hints are in use comes under
> the guise of Good Behaviour To Have.
>
> (*) think about how much space is wasted if you clone a kernel git
> tree under a 1MB extent size hint directory. All those tiny header
> files now take up 1MB of space on disk....
>
> Keep in mind that when the file is opened for write again, the
> extent size hint still gets applied to the new extents. If the
> extending write starts beyond the EOF extsize range, then the new
> extent after the hole at EOF will be fully extsize aligned, as
> expected.
>
> If the new write is exactly extending the file, then the new extents
> will not be extsize aligned - the start will be at the EOF block,
> and they will be extsize -length-. IOWs, the extent size is
> maintained, just the logical alignment is not exactly extsize
> aligned. This could be considered a bug, but it's never been an
> issue for anyone because, in XFS, physical extent alignment is
> separate (and maintained regardless of logical alignment) for extent
> size hint based allocations.
>
> Adding force-align will prevent this behaviour from occurring, as
> post-eof trimming will be done to extsize alignment, not to the EOF
> block. Hence open/close/open will not affect logical or physical
> alignment of force-align extents (and hence won't affect atomic
> writes).
Thanks for the context, I will try to keep this behavior similar to XFS
once we implement the EOF support for extsize hints in the next revision.
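
A rough model of the close/reopen behaviour described above, as a sketch
only. The function names, the forcealign parameter and the 4k block size
are assumptions for illustration, not taken from XFS or ext4 code. Only
the two reopen cases spelled out in the thread are modelled, and writes
are assumed to be no longer than extsize.

```python
KIB = 1024

def round_down(x, a):
    return (x // a) * a

def round_up(x, a):
    return -(-x // a) * a

def trim_on_close(eof, blocksize, extsize, forcealign=False):
    """Logical end of the allocated range after post-EOF trimming.

    Without forcealign the trim goes back to the EOF block; with it,
    the trim stops at the extsize boundary instead.
    """
    unit = extsize if forcealign else blocksize
    return round_up(eof, unit)

def next_extent(write_off, alloc_end, extsize):
    """Logical range of the new extent for a write after reopening."""
    aligned_start = round_down(write_off, extsize)
    if aligned_start >= alloc_end:
        # Write starts beyond the EOF extsize range: fully aligned extent.
        return aligned_start, aligned_start + extsize
    if write_off == alloc_end:
        # Write exactly extends the file: starts at the EOF block,
        # extsize in length, but not extsize aligned.
        return alloc_end, alloc_end + extsize
    raise NotImplementedError("case not described in the thread")

# File with 4k of data, 4k blocks, 16k extsize hint, then closed.
end = trim_on_close(4 * KIB, 4 * KIB, 16 * KIB)
print(end, next_extent(4 * KIB, end, 16 * KIB), next_extent(20 * KIB, end, 16 * KIB))

# Same file under a force-align style trim.
end_fa = trim_on_close(4 * KIB, 4 * KIB, 16 * KIB, forcealign=True)
print(end_fa, next_extent(16 * KIB, end_fa, 16 * KIB))
```

In the first case the extending write at 4k gets a 16k extent starting
at 4k (unaligned start), while the write at 20k gets the aligned range
16k..32k. With the force-align style trim the allocation already ends at
16k, so the next extending write stays aligned.
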
Regards,
Ojaswin
>
> -Dave.
> --
> Dave Chinner
> david@...morbit.com