Message-ID: <CAK896s4=o9cFFnh0KzhbXSSjWiDFoTqNx0ATzGNH8rxj19+1aw@mail.gmail.com>
Date: Wed, 2 Feb 2022 21:40:28 +0800
From: Xin Yin <yinxin.x@...edance.com>
To: Ritesh Harjani <riteshh@...ux.ibm.com>
Cc: harshadshirwadkar@...il.com, tytso@....edu,
adilger.kernel@...ger.ca, linux-ext4@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [External] Re: [PATCH 1/2] ext4: use ext4_ext_remove_space() for
fast commit replay delete range
On Wed, Feb 2, 2022 at 4:34 AM Ritesh Harjani <riteshh@...ux.ibm.com> wrote:
>
> Hello Xin,
>
> Sorry about revisiting this thread so late :(
> Recently, while I was working on one of the fast_commit issues, I got interested
> in looking into some of the recent fast_commit fixes.
>
> Hence some of these queries.
>
> On 21/12/23 11:23AM, Xin Yin wrote:
> > For now, we use ext4_punch_hole() during the fast commit replay delete range
> > procedure. But it is affected by inode->i_size, which may not be
> > correct during the fast commit replay procedure. The following test
> > fails.
> >
> > -create & write foo (len 1000K)
> > -falloc FALLOC_FL_ZERO_RANGE foo (range 400K - 600K)
> > -create & fsync bar
> ^^^^ do you mean "fsync foo" or is this actually a new file create and fsync
> bar?
bar is a newly created file. It is a sibling of foo in the same
directory, like this:
./foo ./bar
>
>
> > -falloc FALLOC_FL_PUNCH_HOLE foo (range 300K-500K)
> > -fsync foo
> > -crash before a full commit
> >
> > After the fast_commit replay procedure, the range 400K-500K will not be
> > removed. Because in this case, when calling ext4_punch_hole() the
> > inode->i_size is 0, and it just returns without doing anything.
>
> I tried looking into this, but I am not able to get my head around when
> inode->i_size will be 0.
>
> So, what I think should happen is that when you are doing fallocate/fsync foo in your
> above list of operations, the inode's i_disksize will be updated anyway
> using ext4_mark_inode_dirty(), and during the replay phase inode->i_size will hold
> the right value, no?
Yes, inode->i_size ends up holding the right value, because
ext4_fc_replay_inode() updates the inode to its final state. But during
the replay phase ext4_fc_replay_inode() is usually the last step, so
before that point inode->i_size may not be correct.
>
> Could you please help me understand when, where and how inode->i_size will be
> 0?
I didn't check why inode->i_size is 0 in this case. I just think
inode->i_size should not affect the behavior of the replay phase.
Another case: inode->i_size may not cover unwritten blocks, so if a
file has unwritten blocks at its end, we cannot use ext4_punch_hole()
to remove the unwritten blocks beyond i_size during the replay phase.
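
To make the i_size dependency concrete, here is a toy user-space model, not
kernel code. It assumes a punch-hole style call returns early when the offset
is at or beyond i_size and otherwise trims the range to i_size (ignoring the
page rounding the real code does), while a block-range removal works purely on
logical block numbers. The names clamp_to_i_size() and del_range_blocks() are
made up for illustration and do not exist in ext4.

```c
#include <assert.h>
#include <stdint.h>

/* Byte range a punch-hole style call would actually act on. */
struct byte_range {
	uint64_t start;
	uint64_t len;	/* 0 means "nothing is removed" */
};

/* Inclusive logical block range, as passed to a remove-space style call. */
struct blk_range {
	uint32_t first;
	uint32_t last;
};

/*
 * Hypothetical model of the i_size check: a request that starts at or
 * beyond i_size is a no-op, and a request crossing i_size is trimmed.
 * Simplified: the page rounding done by the kernel is left out.
 */
static struct byte_range clamp_to_i_size(uint64_t i_size, uint64_t offset,
					 uint64_t len)
{
	struct byte_range r = { offset, 0 };

	if (offset >= i_size)
		return r;		/* stale i_size == 0 always lands here */
	r.len = (len > i_size - offset) ? i_size - offset : len;
	return r;
}

/*
 * Hypothetical model of the replacement: the range is derived only from
 * the logged start block and length, so a stale i_size cannot shrink it.
 */
static struct blk_range del_range_blocks(uint32_t lblk, uint32_t len)
{
	struct blk_range r;

	assert(len > 0);
	r.first = lblk;
	r.last = lblk + len - 1;	/* inclusive end, as in the patch */
	return r;
}
```

With a stale i_size of 0, clamp_to_i_size(0, 300 * 1024, 200 * 1024) yields a
length of 0, which matches the "returns without doing anything" behavior
described above. Assuming 4K blocks, the same request expressed as
del_range_blocks(75, 50) still covers blocks 75 through 124.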
>
> Also - it would be helpful if you have some easy reproducer of this issue you
> mentioned.
The attached test code reproduces this issue; I hope it helps.
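
For readers without the attachment, the sequence of operations from the test
description can be sketched roughly as below. This is not the attached
del_range_issue.c; it is an independent sketch under stated assumptions. The
helper name run_del_range_steps() is hypothetical, the directory is assumed to
live on an ext4 filesystem with the fast_commit feature enabled, and the
"crash before a full commit" step is not performed by the code and has to be
triggered externally right after the function returns.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define KIB 1024

/*
 * Hypothetical helper: runs the file operations from the test description
 * inside 'dir'. Returns 0 on success or a negative errno on the first
 * failure. It does NOT crash the machine; that step is up to the tester.
 */
static int run_del_range_steps(const char *dir)
{
	char foo[4096], bar[4096];
	size_t len = 1000 * KIB;
	int fd_foo = -1, fd_bar = -1, ret = 0;
	ssize_t n;
	char *buf;

	assert(dir != NULL);
	snprintf(foo, sizeof(foo), "%s/foo", dir);
	snprintf(bar, sizeof(bar), "%s/bar", dir);

	buf = malloc(len);
	if (!buf)
		return -ENOMEM;
	memset(buf, 0xaa, len);

	/* create & write foo (len 1000K) */
	fd_foo = open(foo, O_CREAT | O_TRUNC | O_RDWR, 0644);
	if (fd_foo < 0) { ret = -errno; goto out; }
	n = write(fd_foo, buf, len);
	if (n < 0) { ret = -errno; goto out; }
	if ((size_t)n != len) { ret = -EIO; goto out; }

	/* zero range 400K-600K of foo */
	if (fallocate(fd_foo, FALLOC_FL_ZERO_RANGE, 400 * KIB, 200 * KIB) < 0) {
		ret = -errno; goto out;
	}

	/* create & fsync bar, a sibling of foo */
	fd_bar = open(bar, O_CREAT | O_TRUNC | O_RDWR, 0644);
	if (fd_bar < 0) { ret = -errno; goto out; }
	if (fsync(fd_bar) < 0) { ret = -errno; goto out; }

	/* punch hole 300K-500K in foo (PUNCH_HOLE requires KEEP_SIZE) */
	if (fallocate(fd_foo, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		      300 * KIB, 200 * KIB) < 0) {
		ret = -errno; goto out;
	}

	/* fsync foo; the crash must happen after this, before a full commit */
	if (fsync(fd_foo) < 0)
		ret = -errno;
out:
	if (fd_bar >= 0)
		close(fd_bar);
	if (fd_foo >= 0)
		close(fd_foo);
	free(buf);
	return ret;
}
```

A caller would pass the mount point of the test filesystem, for example
run_del_range_steps("/mnt/test") where the path is only a placeholder, then
force the crash and inspect foo's extents after remount and replay.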
>
> -ritesh
>
> >
> > Change to use ext4_ext_remove_space() instead of ext4_punch_hole()
> > to remove blocks of inode directly.
> >
> > Signed-off-by: Xin Yin <yinxin.x@...edance.com>
> > ---
> > fs/ext4/fast_commit.c | 13 ++++++++-----
> > 1 file changed, 8 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
> > index aa05b23f9c14..3deb97b22ca4 100644
> > --- a/fs/ext4/fast_commit.c
> > +++ b/fs/ext4/fast_commit.c
> > @@ -1708,11 +1708,14 @@ ext4_fc_replay_del_range(struct super_block *sb, struct ext4_fc_tl *tl,
> > }
> > }
> >
> > - ret = ext4_punch_hole(inode,
> > - le32_to_cpu(lrange.fc_lblk) << sb->s_blocksize_bits,
> > - le32_to_cpu(lrange.fc_len) << sb->s_blocksize_bits);
> > - if (ret)
> > - jbd_debug(1, "ext4_punch_hole returned %d", ret);
> > + down_write(&EXT4_I(inode)->i_data_sem);
> > + ret = ext4_ext_remove_space(inode, lrange.fc_lblk,
> > + lrange.fc_lblk + lrange.fc_len - 1);
> > + up_write(&EXT4_I(inode)->i_data_sem);
> > + if (ret) {
> > + iput(inode);
> > + return 0;
> > + }
> > ext4_ext_replay_shrink_inode(inode,
> > i_size_read(inode) >> sb->s_blocksize_bits);
> > ext4_mark_inode_dirty(NULL, inode);
> > --
> > 2.20.1
> >
View attachment "del_range_issue.c" of type "text/x-c-code" (1889 bytes)