linux-kernel - Re: [PATCH 1/2] ext4: only defer sb update on error if SB

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1a1ff87b-6f75-4d43-ad7b-54883bea13f8@huawei.com>
Date: Wed, 26 Feb 2025 09:55:52 +0800
From: Baokun Li <libaokun1@...wei.com>
To: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
CC: <linux-ext4@...r.kernel.org>, Theodore Ts'o <tytso@....edu>, Jan Kara
	<jack@...e.cz>, <linux-kernel@...r.kernel.org>, Mahesh Kumar
	<maheshkumar657g@...il.com>, Ritesh Harjani <ritesh.list@...il.com>, Yang
 Erkun <yangerkun@...wei.com>
Subject: Re: [PATCH 1/2] ext4: only defer sb update on error if SB_ACTIVE

On 2025/2/25 20:06, Ojaswin Mujoo wrote:
> On Tue, Feb 25, 2025 at 09:53:10AM +0800, Baokun Li wrote:
>> On 2025/2/22 16:40, Ojaswin Mujoo wrote:
>>> Presently we always BUG_ON if trying to start a transaction on a journal
>>> marked with JBD2_UNMOUNT, since this should never happen. However while
>>> running stress tests it was observed that in case of some error handling
>>> paths, it is possible for update_super_work to start a transaction after
>>> the journal is destroyed eg:
>>>
>>> (umount)
>>> ext4_kill_sb
>>>     kill_block_super
>>>       generic_shutdown_super
>>>         sync_filesystem /* commits all txns */
>>>         evict_inodes
>>>           /* might start a new txn */
>>>         ext4_put_super
>>> 	flush_work(&sbi->s_sb_upd_work) /* flush the workqueue */
>>>           jbd2_journal_destroy
>>>             journal_kill_thread
>>>               journal->j_flags |= JBD2_UNMOUNT;
>>>             jbd2_journal_commit_transaction
>>>               jbd2_journal_get_descriptor_buffer
>>>                 jbd2_journal_bmap
>>>                   ext4_journal_bmap
>>>                     ext4_map_blocks
>>>                       ...
>>>                       ext4_inode_error
>> Just curious, since jbd2_journal_bmap() only queries the map and does not
>> create it, how does it fail here? Is there more information in dmesg?
>> Is s_journal_inum normal after file system corruption?
> Hey Baokun,
Hello Ojaswin,

Thanks for your detailed explanation!
>
> So I dug a bit more into the vmcore. The error information in sbi looks
> like this:
>
>    s_add_error_count = 1,
>    s_first_error_code = 117,
>    s_first_error_line = 475,
>    s_first_error_ino = 0,
>    s_first_error_block = 0,
>    s_first_error_func = 0xc0080000055300d0 <__func__.6> "ext4_read_block_bitmap_nowait",
>    s_first_error_time = 1737023235,
>
>    s_last_error_code = 117,
>    s_last_error_line = 609,
>    s_last_error_ino = 8,
>    s_last_error_block = 783,
>    s_last_error_func = 0xc008000005531b10 <__func__.41> "ext4_map_blocks",
>    s_last_error_time = 1737023236,
>
>    The first error is here:
>
>        if ((bitmap_blk <= le32_to_cpu(sbi->s_es->s_first_data_block)) ||
>       474               (bitmap_blk >= ext4_blocks_count(sbi->s_es))) {
>    *  475                   ext4_error(sb, "Invalid block bitmap block %llu in "
>       476                              "block_group %u", bitmap_blk, block_group);
>       477                   ext4_mark_group_bitmap_corrupted(sb, block_group,
>       478                                           EXT4_GROUP_INFO_BBITMAP_CORRUPT);
>       479                   return ERR_PTR(-EFSCORRUPTED);
>       480           }
>
> and the last error is here:
>
>      608           if (retval > 0 && map->m_flags & EXT4_MAP_MAPPED) {
>   *  609                   ret = check_block_validity(inode, map);
>      610                   if (ret != 0)
>      611                           return ret;
>      612           }
>
>
> And indeed we have the traces of the first error in dmesg:
>
> [75284.713463] EXT4-fs error (device loop36): ext4_read_block_bitmap_nowait:475: comm proc01: Invalid block bitmap block 0 in block_group 0
> [75284.713470] EXT4-fs error (device loop36): ext4_read_block_bitmap_nowait:475: comm proc01: Invalid block bitmap block 0 in block_group 0
> [75284.713476] EXT4-fs error (device loop36): ext4_read_block_bitmap_nowait:475: comm proc01: Invalid block bitmap block 0 in block_group 0
>
> However, the last error seems strange. It seems like check_block_validity
> should ideally never fail for a journal inode.Unfortunately, sbi->s_es page is
> not recorded in the crash dump for some reason so idk the exact value at the
> time of the check, but looking in journal->j_inode->i_ino, the inode num is 8,
> which seems fine to me. So yeah, I'm a bit unsure what caused the corruption.
> I'll look a bit more into the proc01 ltp to see if we can recreate the failure
> to get more info.
Right, check_block_validity() skips the journal inode check. If
the journal inode check fails, that shows s_es->s_journal_inum and
journal->j_inode->i_ino are different. The file system doesn't modify
s_journal_inum, so it should be modified by some other writer bypassing
the file system (i.e. writing to bare disk).

If that's how it is, we could avoid this issue by using EXT4_JOURNAL_INO
directly or saving s_journal_inum to ext4_sb_info (which offers better
compatibility).


Cheers,
Baokun
>> Thanks,
>> Baokun
>>>                         ext4_handle_error
>>>                           schedule_work(&sbi->s_sb_upd_work)
>>>
>>>                                                  /* work queue kicks in */
>>>                                                  update_super_work
>>>                                                    jbd2_journal_start
>>>                                                      start_this_handle
>>>                                                        BUG_ON(journal->j_flags &
>>>                                                               JBD2_UNMOUNT)
>>>
>>> Hence, make sure we only defer the update of ext4 sb if the sb is still
>>> active.  Otherwise, just fallback to an un-journaled commit.
>>>
>>> The important thing to note here is that we must only defer sb update if
>>> we have not yet flushed the s_sb_update_work queue in umount path else
>>> this race can be hit (point 1 below). Since we don't have a direct way
>>> to check for that we use SB_ACTIVE instead. The SB_ACTIVE check is a bit
>>> subtle so adding some notes below for future reference:
>>>
>>> 1. Ideally we would want to have a something like (flags & JBD2_UNMOUNT
>>> == 0) however this is not correct since we could end up scheduling work
>>> after it has been flushed:
>>>
>>>    ext4_put_super
>>>     flush_work(&sbi->s_sb_upd_work)
>>>
>>>                              **kjournald2**
>>>                              jbd2_journal_commit_transaction
>>>                              ...
>>>                              ext4_inode_error
>>>                                /* JBD2_UNMOUNT not set */
>>>                                schedule_work(s_sb_upd_work)
>>>
>>>      jbd2_journal_destroy
>>>       journal->j_flags |= JBD2_UNMOUNT;
>>>
>>>                                         **workqueue**
>>>                                         update_super_work
>>>                                          jbd2_journal_start
>>>                                           start_this_handle
>>>                                             BUG_ON(JBD2_UNMOUNT)
>>>
>>> Something like the above doesn't happen with SB_ACTIVE check because we
>>> are sure that the workqueue would be flushed at a later point if we are
>>> in the umount path.
>>>
>>> 2. We don't need a similar check in ext4_grp_locked_error since it is
>>> only called from mballoc and AFAICT it would be always valid to schedule
>>> work here.
>>>
>>> Fixes: 2d01ddc86606 ("ext4: save error info to sb through journal if available")
>>> Reported-by: Mahesh Kumar <maheshkumar657g@...il.com>
>>> Suggested-by: Ritesh Harjani <ritesh.list@...il.com>
>>> Signed-off-by: Ojaswin Mujoo <ojaswin@...ux.ibm.com>
>>> ---
>>>    fs/ext4/super.c | 2 +-
>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
>>> index a963ffda692a..b7341e9acf62 100644
>>> --- a/fs/ext4/super.c
>>> +++ b/fs/ext4/super.c
>>> @@ -706,7 +706,7 @@ static void ext4_handle_error(struct super_block *sb, bool force_ro, int error,
>>>    		 * constraints, it may not be safe to do it right here so we
>>>    		 * defer superblock flushing to a workqueue.
>>>    		 */
>>> -		if (continue_fs && journal)
>>> +		if (continue_fs && journal && (sb->s_flags & SB_ACTIVE))
>>>    			schedule_work(&EXT4_SB(sb)->s_sb_upd_work);
>>>    		else
>>>    			ext4_commit_super(sb);
>>