[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <YTrxpXZB9JkuLesZ@B-P7TQMD6M-0146.local>
Date: Fri, 10 Sep 2021 13:48:21 +0800
From: Gao Xiang <hsiangkao@...ux.alibaba.com>
To: tytso@....edu
Cc: JeffleXu <jefflexu@...ux.alibaba.com>,
Eric Whitney <enwlinux@...il.com>, adilger.kernel@...ger.ca,
linux-ext4@...r.kernel.org, joseph.qi@...ux.alibaba.com
Subject: Re: [PATCH v2] ext4: fix reserved space counter leakage
Hi Ted,
On Mon, Sep 06, 2021 at 02:46:16PM +0800, JeffleXu wrote:
>
>
> On 8/25/21 9:38 AM, JeffleXu wrote:
> >
> >
> > On 8/24/21 4:30 AM, Eric Whitney wrote:
> >> * Jeffle Xu <jefflexu@...ux.alibaba.com>:
> >>> When ext4_insert_delayed block receives and recovers from an error from
> >>> ext4_es_insert_delayed_block(), e.g., ENOMEM, it does not release the
> >>> space it has reserved for that block insertion as it should. One effect
> >>> of this bug is that s_dirtyclusters_counter is not decremented and
> >>> remains incorrectly elevated until the file system has been unmounted.
> >>> This can result in premature ENOSPC returns and apparent loss of free
> >>> space.
> >>>
> >>> Another effect of this bug is that
> >>> /sys/fs/ext4/<dev>/delayed_allocation_blocks can remain non-zero even
> >>> after syncfs has been executed on the filesystem.
> >>>
> >>> Besides, add check for s_dirtyclusters_counter when inode is going to be
> >>> evicted and freed. s_dirtyclusters_counter can still keep non-zero until
> >>> inode is written back in .evict_inode(), and thus the check is delayed
> >>> to .destroy_inode().
> >>>
> >>> Fixes: 51865fda28e5 ("ext4: let ext4 maintain extent status tree")
> >>> Cc: <stable@...r.kernel.org>
> >>> Suggested-by: Gao Xiang <hsiangkao@...ux.alibaba.com>
> >>> Signed-off-by: Jeffle Xu <jefflexu@...ux.alibaba.com>
> >>> ---
> >>> changes since v1:
> >>> - improve commit log suggested by Eric Whitney
> >>> - update "Suggested-by" title for Gao Xian, who actually found this bug
> >>> code
> >>> - add check for s_dirtyclusters_counter in .destroy_inode()
> >>> ---
> >>> fs/ext4/inode.c | 5 +++++
> >>> fs/ext4/super.c | 6 ++++++
> >>> 2 files changed, 11 insertions(+)
> >>>
> >>> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> >>> index d8de607849df..73daf9443e5e 100644
> >>> --- a/fs/ext4/inode.c
> >>> +++ b/fs/ext4/inode.c
> >>> @@ -1640,6 +1640,7 @@ static int ext4_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk)
> >>> struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
> >>> int ret;
> >>> bool allocated = false;
> >>> + bool reserved = false;
> >>>
> >>> /*
> >>> * If the cluster containing lblk is shared with a delayed,
> >>> @@ -1656,6 +1657,7 @@ static int ext4_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk)
> >>> ret = ext4_da_reserve_space(inode);
> >>> if (ret != 0) /* ENOSPC */
> >>> goto errout;
> >>> + reserved = true;
> >>> } else { /* bigalloc */
> >>> if (!ext4_es_scan_clu(inode, &ext4_es_is_delonly, lblk)) {
> >>> if (!ext4_es_scan_clu(inode,
> >>> @@ -1668,6 +1670,7 @@ static int ext4_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk)
> >>> ret = ext4_da_reserve_space(inode);
> >>> if (ret != 0) /* ENOSPC */
> >>> goto errout;
> >>> + reserved = true;
> >>> } else {
> >>> allocated = true;
> >>> }
> >>> @@ -1678,6 +1681,8 @@ static int ext4_insert_delayed_block(struct inode *inode, ext4_lblk_t lblk)
> >>> }
> >>>
> >>> ret = ext4_es_insert_delayed_block(inode, lblk, allocated);
> >>> + if (ret && reserved)
> >>> + ext4_da_release_space(inode, 1);
> >>>
> >>> errout:
> >>> return ret;
> >>> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> >>> index dfa09a277b56..61bf52b58fca 100644
> >>> --- a/fs/ext4/super.c
> >>> +++ b/fs/ext4/super.c
> >>> @@ -1351,6 +1351,12 @@ static void ext4_destroy_inode(struct inode *inode)
> >>> true);
> >>> dump_stack();
> >>> }
> >>> +
> >>> + if (EXT4_I(inode)->i_reserved_data_blocks)
> >>> + ext4_msg(inode->i_sb, KERN_ERR,
> >>> + "Inode %lu (%p): i_reserved_data_blocks (%u) not cleared!",
> >>> + inode->i_ino, EXT4_I(inode),
> >>> + EXT4_I(inode)->i_reserved_data_blocks);
> >>> }
> >>>
> >>> static void init_once(void *foo)
> >>> --
> >>> 2.27.0
> >>>
> >>
> >> Looks good, passed 4k xfstests-bld regression. Feel free to add:
> >>
> >> Reviewed-by: Eric Whitney <enwlinux@...il.com>
> >
> >
> > Hi tytso, it's a bug fix and it would be great if it could be merged to
> > 5.15.
> >
>
> ping ...
It seems somewhat a real issue and has some impact in our production.
Would you mind giving some hints if it could be resolved in this way?
Or if some other concerns / solution about this?
Thanks,
Gao Xiang
>
> --
> Thanks,
> Jeffle
Powered by blists - more mailing lists