linux-kernel - Re: [RFC 0/5] ext4: Implement support for extsize hints

Message-ID: <Zu2PEXtKMAUN5CyV@li-bb2b2a4c-3307-11b2-a85c-8fa5c3a69313.ibm.com>
Date: Fri, 20 Sep 2024 20:34:49 +0530
From: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
To: Dave Chinner <david@...morbit.com>
Cc: linux-ext4@...r.kernel.org, "Theodore Ts'o" <tytso@....edu>,
        Ritesh Harjani <ritesh.list@...il.com>, linux-kernel@...r.kernel.org,
        "Darrick J . Wong" <djwong@...nel.org>, linux-fsdevel@...r.kernel.org,
        John Garry <john.g.garry@...cle.com>, dchinner@...hat.com
Subject: Re: [RFC 0/5] ext4: Implement support for extsize hints

On Fri, Sep 20, 2024 at 08:34:14AM +1000, Dave Chinner wrote:
> On Thu, Sep 19, 2024 at 12:43:17PM +0530, Ojaswin Mujoo wrote:
> > On Wed, Sep 18, 2024 at 07:54:27PM +1000, Dave Chinner wrote:
> > > On Wed, Sep 11, 2024 at 02:31:04PM +0530, Ojaswin Mujoo wrote:
> > > Behaviour such as extent size hinting *should* be the same across
> > > all filesystems that provide this functionality.  This makes using
> > > extent size hints much easier for users, admins and application
> > > developers. The last thing I want to hear is application devs tell
> > > me at conferences that "we don't use extent size hints anymore
> > > because ext4..."
> > 
> > Yes, makes sense :)
> > 
> > Nothing to worry about here, though, as ext4 also treats the extsize
> > value as a hint, exactly like XFS. We have tried to keep the behavior
> > as similar to XFS as possible for the exact reasons you mentioned.
> 
> It is worth explicitly stating this (i.e. all the behaviours that
> are the same) in the design documentation rather than just the
> corner cases where it is different. It was certainly not clear how
> failures were treated.

Got it, Dave. I did mention it in the actual commit (5/5), but I agree:
I will update the cover letter to describe the design more clearly in
future revisions.

> 
> > And yes, we do plan to add a forcealign (or similar) feature for ext4 as
> > well for atomic writes, which would change the hint to a mandate.
> 
> Ok. That should be stated, too.
> 
> FWIW, it would be a good idea to document this all in the kernel
> documentation itself, so there is a guideline for other filesystems
> to implement the same behaviour. e.g. in
> Documentation/filesystems/extent-size-hints.rst

Okay, makes sense. I can look into this as a next step.

> 
> > > > 2. eof allocation on XFS trims the blocks allocated beyond eof with extsize
> > > >    hint. That means on XFS for eof allocations (with extsize hint) only logical
> > > >    start gets aligned.
> > > 
> > > I'm not sure I understand what you are saying here. XFS does extsize
> > > alignment of both the start and end of post-eof extents the same as
> > > it does for extents within EOF. For example:
> > > 
> > > # xfs_io -fdc "truncate 0" -c "extsize 16k" -c "pwrite 0 4k" -c "bmap -vvp" foo
> > > wrote 4096/4096 bytes at offset 0
> > > 4 KiB, 1 ops; 0.0308 sec (129.815 KiB/sec and 32.4538 ops/sec)
> > > foo:
> > >  EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
> > >    0: [0..7]:          256504..256511    0 (256504..256511)     8 000000
> > >    1: [8..31]:         256512..256535    0 (256512..256535)    24 010000
> > >  FLAG Values:
> > >     0100000 Shared extent
> > >     0010000 Unwritten preallocated extent
> > > 
> > > There's a 4k written extent at 0, and a 12k unwritten extent
> > > beyond EOF at 4k. I.e. we have a 16kB extent, as the hint
> > > requires, that is correctly aligned beyond EOF.
> > > 
> > > If I then write another 4k at 20k (beyond both EOF and the unwritten
> > > extent beyond EOF):
> > > 
> > > # xfs_io -fdc "truncate 0" -c "extsize 16k" -c "pwrite 0 4k" -c "pwrite 20k 4k" -c "bmap -vvp" foo
> > > wrote 4096/4096 bytes at offset 0
> > > 4 KiB, 1 ops; 0.0210 sec (190.195 KiB/sec and 47.5489 ops/sec)
> > > wrote 4096/4096 bytes at offset 20480
> > > 4 KiB, 1 ops; 0.0001 sec (21.701 MiB/sec and 5555.5556 ops/sec)
> > > foo:
> > >  EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL FLAGS
> > >    0: [0..7]:          180000..180007    0 (180000..180007)     8 000000
> > >    1: [8..39]:         180008..180039    0 (180008..180039)    32 010000
> > >    2: [40..47]:        180040..180047    0 (180040..180047)     8 000000
> > >    3: [48..63]:        180048..180063    0 (180048..180063)    16 010000
> > >  FLAG Values:
> > >     0100000 Shared extent
> > >     0010000 Unwritten preallocated extent
> > > 
> > > You can see we did contiguous allocation of another 16kB at offset
> > > 16kB, and then wrote to 20k for 4kB, i.e. the new extent was
> > > correctly aligned at both sides, as the extsize hint says it should
> > > be.
> > 
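
The alignment arithmetic behind the two bmap listings above can be
checked with a small sketch. It only models rounding a write range
outward to extsize boundaries; the function name is hypothetical and this
is not how XFS or ext4 implement allocation. The only outside fact it
relies on is that bmap reports offsets in 512-byte units.

```python
KIB = 1024
SECTOR = 512  # bmap FILE-OFFSET columns are in 512-byte units


def extsize_aligned_range(offset, length, extsize):
    """Round the byte range [offset, offset + length) outward to extsize.

    Hypothetical helper for illustration only; returns (start, end) in
    bytes, with end exclusive.
    """
    start = (offset // extsize) * extsize
    end = -(-(offset + length) // extsize) * extsize  # ceiling division
    return start, end


if __name__ == "__main__":
    for off, length in ((0, 4 * KIB), (20 * KIB, 4 * KIB)):
        start, end = extsize_aligned_range(off, length, 16 * KIB)
        # Print the range the way bmap shows it: inclusive 512-byte units.
        print(f"write at {off // KIB}k -> "
              f"[{start // SECTOR}..{end // SECTOR - 1}]")
```

For the 4k write at 20k with a 16k hint this gives bytes 16k to 32k,
i.e. units 32..63, which is the range newly covered by extents 1 to 3 in
the second listing.
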
> > Sorry for the confusion, Dave. What I meant is that XFS does indeed
> > respect the extsize hint for EOF allocations, but since the blocks
> > past EOF are trimmed when the file is closed, after close we would
> > only see the lstart aligned, not the end.
> 
> Right, but that is desired behaviour, especially when extsize is
> large.  i.e. when the file is closed it is an indication that the
> file will not be written again, so we don't need to keep post-eof
> blocks around for fragmentation prevention reasons.
> 
> Removing post-EOF extents on close prevents large extsize hints from
> consuming lots of unused space on files that are never going to be
> written to again(*).  That's user visible, and because it can cause
> premature ENOSPC, users will report this excessive space usage
> behaviour as a bug (and they are right).  Hence removing post-eof
> extents on file close when extent size hints are in use comes under
> the guise of Good Behaviour To Have.
> 
> (*) think about how much space is wasted if you clone a kernel git
> tree under a 1MB extent size hint directory. All those tiny header
> files now take up 1MB of space on disk....
> 
> Keep in mind that when the file is opened for write again, the
> extent size hint still gets applied to the new extents.  If the
> extending write starts beyond the EOF extsize range, then the new
> extent after the hole at EOF will be fully extsize aligned, as
> expected.
> 
> If the new write is exactly extending the file, then the new extents
> will not be extsize aligned - the start will be at the EOF block,
> and they will be extsize -length-.  IOWs, the extent size is
> maintained, just the logical alignment is not exactly extsize
> aligned. This could be considered a bug, but it's never been an
> issue for anyone because, in XFS, physical extent alignment is
> separate (and maintained regardless of logical alignment) for extent
> size hint based allocations.
> 
> Adding force-align will prevent this behaviour from occurring, as
> post-eof trimming will be done to extsize alignment, not to the EOF
> block.  Hence open/close/open will not affect logical or physical
> alignment of force-align extents (and hence won't affect atomic
> writes).

Thanks for the context, I will try to keep this behavior similar to XFS
once we implement EOF support for extsize hints in the next revision.
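
A rough model of the close/reopen behaviour described above, as a sketch
only. The function names, the 4k block size and the forcealign flag are
assumptions made for illustration; none of this is actual XFS or ext4
code.

```python
def round_up(value, unit):
    """Round value up to the next multiple of unit."""
    return -(-value // unit) * unit


def trim_on_close(eof, alloc_end, extsize, blocksize=4096, forcealign=False):
    """Return the end of allocated space after post-EOF trimming on close.

    Without forcealign, blocks are trimmed back to the EOF block.
    With forcealign, trimming stops at the extsize boundary instead.
    """
    unit = extsize if forcealign else blocksize
    return min(alloc_end, round_up(eof, unit))


def extending_alloc(alloc_end, extsize):
    """Extent for a write that exactly extends the file after a trim.

    It starts at the current end of allocation and is extsize long, so
    the length is kept even when the logical start is not extsize aligned.
    """
    return alloc_end, alloc_end + extsize


if __name__ == "__main__":
    extsize = 16 * 1024
    # 4k written, 16k allocated because of the hint, then the file is closed.
    end = trim_on_close(4096, extsize, extsize)
    print(end, extending_alloc(end, extsize))
    end = trim_on_close(4096, extsize, extsize, forcealign=True)
    print(end, extending_alloc(end, extsize))
```

In the plain hint case the reopened file gets a 16k extent starting at
4k, which has the right length but is not logically aligned. With the
assumed forcealign trimming the allocation stays at 16k, so the next
extent starts on an extsize boundary.
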

Regards,
Ojaswin
> 
> -Dave.
> -- 
> Dave Chinner
> david@...morbit.com