[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <yfekmxz7biiuvairgen2pw6laccs4qvblt56uxmqenyckt2pp6@rfagttgqpdfr>
Date: Thu, 27 Nov 2025 13:24:29 +0100
From: Jan Kara <jack@...e.cz>
To: Zhang Yi <yi.zhang@...weicloud.com>
Cc: Ojaswin Mujoo <ojaswin@...ux.ibm.com>, linux-ext4@...r.kernel.org,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org, tytso@....edu,
adilger.kernel@...ger.ca, jack@...e.cz, yi.zhang@...wei.com, yizhang089@...il.com,
libaokun1@...wei.com, yangerkun@...wei.com
Subject: Re: [PATCH v2 00/13] ext4: replace ext4_es_insert_extent() when
caching on-disk extents
Hi Yi!
On Mon 24-11-25 13:04:04, Zhang Yi wrote:
> On 11/23/2025 6:55 PM, Ojaswin Mujoo wrote:
> > On Fri, Nov 21, 2025 at 02:07:58PM +0800, Zhang Yi wrote:
> >> Changes since v1:
> >> - Rebase the codes based on the latest linux-next 20251120.
> >> - Add patches 01-05, fix two stale data problems caused by
> >
> > Hi Zhang, thanks for the patches.
> >
>
> Thank you for take time to look at this series.
>
> > I've always felt uncomfortable with the ZEROOUT code here because it
> > seems to have many such bugs as you pointed out in the series. Its very
> > fragile and the bugs are easy to miss behind all the data valid and
> > split flags mess.
> >
>
> Yes, I agree with you. The implementation of EXT4_EXT_MAY_ZEROOUT has
> significantly increased the complexity of split extents and the
> potential for bugs.
Yep, that code is complex and prone to bugs.
> > As per my understanding, ZEROOUT logic seems to be a special best-effort
> > try to make the split/convert operation "work" when dealing with
> > transient errors like ENOSPC etc. I was just wondering if it makes sense
> > to just get rid of the whole ZEROOUT logic completely and just reset the
> > extent to orig state if there is any error. This allows us to get rid of
> > DATA_VALID* flags as well and makes the whole ext4_split_convert_extents()
> > slightly less messy.
> >
> > Maybe we can have a retry loop at the top level caller if we want to try
> > again for say ENOSPC or ENOMEM.
> >
> > Would love to hear your thoughts on it.
>
> I think this is a direction worth exploring. However, what I am
> currently considering is that we need to address this scenario of
> splitting extent during the I/O completion. Although the ZEROOUT logic
> is fragile and has many issues recently, it currently serves as a
> fallback solution for handling ENOSPC errors that arise when splitting
> extents during I/O completion. It ensures that I/O operations do not
> fail due to insufficient extent blocks.
Also partial extent zeroout offers a good performance win when the
portion needing zeroout is small (we can save extent splitting). And I
agree it is a good safety net for ENOSPC issues - otherwise there's no
guarantee page writeback can finish without hitting ENOSPC. We do have
reserved blocks for these cases but the pool is limited so you can still
run out of blocks if you try hard enough.
> Please see ext4_convert_unwritten_extents_endio(). Although we have made
> our best effort to tried to split extents using
> EXT4_GET_BLOCKS_IO_CREATE_EXT before issuing I/Os, we still have not
> covered all scenarios. Moreover, after converting the buffered I/O path
> to the iomap infrastructure in the future, we may need to split extents
> during the I/O completion worker[1].
Yes, this might be worth exploring. The advantage of doing extent splitting
in advance is that on IO submission you have the opportunity of restarting
the transaction on ENOSPC to possibly release some blocks. This is not
easily doable e.g. on writeback completion so the pressure on the pool of
reserved blocks is going to be more common (previously you needed reserved
blocks only when writeback was racing with fallocate or similar, now you
may need them each time you write in the middle of unwritten extent). So I
think the change will need some testing whether it isn't too easy to hit
ENOSPC conditions on IO completion without EXT4_GET_BLOCKS_IO_CREATE_EXT.
But otherwise it's always nice to remove code :)
Honza
--
Jan Kara <jack@...e.com>
SUSE Labs, CR
Powered by blists - more mailing lists