[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <Z72yUqGWHdRyCE25@li-dc0c254c-257c-11b2-a85c-98b6c1322444.ibm.com>
Date: Tue, 25 Feb 2025 17:36:42 +0530
From: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
To: Baokun Li <libaokun1@...wei.com>
Cc: linux-ext4@...r.kernel.org, "Theodore Ts'o" <tytso@....edu>,
Jan Kara <jack@...e.cz>, linux-kernel@...r.kernel.org,
Mahesh Kumar <maheshkumar657g@...il.com>,
Ritesh Harjani <ritesh.list@...il.com>,
Yang Erkun <yangerkun@...wei.com>
Subject: Re: [PATCH 1/2] ext4: only defer sb update on error if SB_ACTIVE
On Tue, Feb 25, 2025 at 09:53:10AM +0800, Baokun Li wrote:
> On 2025/2/22 16:40, Ojaswin Mujoo wrote:
> > Presently we always BUG_ON if trying to start a transaction on a journal
> > marked with JBD2_UNMOUNT, since this should never happen. However while
> > running stress tests it was observed that in case of some error handling
> > paths, it is possible for update_super_work to start a transaction after
> > the journal is destroyed eg:
> >
> > (umount)
> > ext4_kill_sb
> > kill_block_super
> > generic_shutdown_super
> > sync_filesystem /* commits all txns */
> > evict_inodes
> > /* might start a new txn */
> > ext4_put_super
> > flush_work(&sbi->s_sb_upd_work) /* flush the workqueue */
> > jbd2_journal_destroy
> > journal_kill_thread
> > journal->j_flags |= JBD2_UNMOUNT;
> > jbd2_journal_commit_transaction
> > jbd2_journal_get_descriptor_buffer
> > jbd2_journal_bmap
> > ext4_journal_bmap
> > ext4_map_blocks
> > ...
> > ext4_inode_error
> Just curious, since jbd2_journal_bmap() only queries the map and does not
> create it, how does it fail here? Is there more information in dmesg?
> Is s_journal_inum normal after file system corruption?
Hey Baokun,
So I dug a bit more into the vmcore. The error information in sbi looks
like this:
s_add_error_count = 1,
s_first_error_code = 117,
s_first_error_line = 475,
s_first_error_ino = 0,
s_first_error_block = 0,
s_first_error_func = 0xc0080000055300d0 <__func__.6> "ext4_read_block_bitmap_nowait",
s_first_error_time = 1737023235,
s_last_error_code = 117,
s_last_error_line = 609,
s_last_error_ino = 8,
s_last_error_block = 783,
s_last_error_func = 0xc008000005531b10 <__func__.41> "ext4_map_blocks",
s_last_error_time = 1737023236,
The first error is here:
if ((bitmap_blk <= le32_to_cpu(sbi->s_es->s_first_data_block)) ||
474 (bitmap_blk >= ext4_blocks_count(sbi->s_es))) {
* 475 ext4_error(sb, "Invalid block bitmap block %llu in "
476 "block_group %u", bitmap_blk, block_group);
477 ext4_mark_group_bitmap_corrupted(sb, block_group,
478 EXT4_GROUP_INFO_BBITMAP_CORRUPT);
479 return ERR_PTR(-EFSCORRUPTED);
480 }
and the last error is here:
608 if (retval > 0 && map->m_flags & EXT4_MAP_MAPPED) {
* 609 ret = check_block_validity(inode, map);
610 if (ret != 0)
611 return ret;
612 }
And indeed we have the traces of the first error in dmesg:
[75284.713463] EXT4-fs error (device loop36): ext4_read_block_bitmap_nowait:475: comm proc01: Invalid block bitmap block 0 in block_group 0
[75284.713470] EXT4-fs error (device loop36): ext4_read_block_bitmap_nowait:475: comm proc01: Invalid block bitmap block 0 in block_group 0
[75284.713476] EXT4-fs error (device loop36): ext4_read_block_bitmap_nowait:475: comm proc01: Invalid block bitmap block 0 in block_group 0
However, the last error seems strange. It seems like check_block_validity
should ideally never fail for a journal inode. Unfortunately, sbi->s_es page is
not recorded in the crash dump for some reason so idk the exact value at the
time of the check, but looking in journal->j_inode->i_ino, the inode num is 8,
which seems fine to me. So yeah, I'm a bit unsure what caused the corruption.
I'll look a bit more into the proc01 ltp to see if we can recreate the failure
to get more info.
>
> Thanks,
> Baokun
> > ext4_handle_error
> > schedule_work(&sbi->s_sb_upd_work)
> >
> > /* work queue kicks in */
> > update_super_work
> > jbd2_journal_start
> > start_this_handle
> > BUG_ON(journal->j_flags &
> > JBD2_UNMOUNT)
> >
> > Hence, make sure we only defer the update of ext4 sb if the sb is still
> > active. Otherwise, just fallback to an un-journaled commit.
> >
> > The important thing to note here is that we must only defer sb update if
> > we have not yet flushed the s_sb_update_work queue in umount path else
> > this race can be hit (point 1 below). Since we don't have a direct way
> > to check for that we use SB_ACTIVE instead. The SB_ACTIVE check is a bit
> > subtle so adding some notes below for future reference:
> >
> > 1. Ideally we would want to have a something like (flags & JBD2_UNMOUNT
> > == 0) however this is not correct since we could end up scheduling work
> > after it has been flushed:
> >
> > ext4_put_super
> > flush_work(&sbi->s_sb_upd_work)
> >
> > **kjournald2**
> > jbd2_journal_commit_transaction
> > ...
> > ext4_inode_error
> > /* JBD2_UNMOUNT not set */
> > schedule_work(s_sb_upd_work)
> >
> > jbd2_journal_destroy
> > journal->j_flags |= JBD2_UNMOUNT;
> >
> > **workqueue**
> > update_super_work
> > jbd2_journal_start
> > start_this_handle
> > BUG_ON(JBD2_UNMOUNT)
> >
> > Something like the above doesn't happen with SB_ACTIVE check because we
> > are sure that the workqueue would be flushed at a later point if we are
> > in the umount path.
> >
> > 2. We don't need a similar check in ext4_grp_locked_error since it is
> > only called from mballoc and AFAICT it would be always valid to schedule
> > work here.
> >
> > Fixes: 2d01ddc86606 ("ext4: save error info to sb through journal if available")
> > Reported-by: Mahesh Kumar <maheshkumar657g@...il.com>
> > Suggested-by: Ritesh Harjani <ritesh.list@...il.com>
> > Signed-off-by: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
> > ---
> > fs/ext4/super.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> > index a963ffda692a..b7341e9acf62 100644
> > --- a/fs/ext4/super.c
> > +++ b/fs/ext4/super.c
> > @@ -706,7 +706,7 @@ static void ext4_handle_error(struct super_block *sb, bool force_ro, int error,
> > * constraints, it may not be safe to do it right here so we
> > * defer superblock flushing to a workqueue.
> > */
> > - if (continue_fs && journal)
> > + if (continue_fs && journal && (sb->s_flags & SB_ACTIVE))
> > schedule_work(&EXT4_SB(sb)->s_sb_upd_work);
> > else
> > ext4_commit_super(sb);
>
>
Powered by blists - more mailing lists