Date: Sat, 9 Sep 2023 11:41:11 +0800
From: Zhang Yi <yi.zhang@...wei.com>
To: Zhihao Cheng <chengzhihao1@...wei.com>, <tytso@....edu>,
<jack@...e.com>
CC: <linux-ext4@...r.kernel.org>
Subject: Re: [PATCH v2] ext4: Fix potential data lost in recovering journal
raced with synchronizing fs bdev
Hello!
On 2023/9/8 20:43, Zhihao Cheng wrote:
> JBD2 ensures that the replayed journal data has reached the fs device via
> sync_blockdev(). However, another process can consume the EIO recorded in
> the bdev's mapping first, so journal recovery appears to succeed even
> though an EIO occurred while the data was being written back to the fs
> device.
>
> We found this problem in our product, where iSCSI + multipath is used as
> the block device for ext4. An unstable network may trigger kpartx to rescan
> partitions in the device-mapper layer. The detailed process is as follows:
>
> mount kpartx irq
> jbd2_journal_recover
> do_one_pass
> memcpy(nbh->b_data, obh->b_data) // copy data to fs dev from journal
> mark_buffer_dirty // mark bh dirty
> vfs_read
> generic_file_read_iter // dio
> filemap_write_and_wait_range
> __filemap_fdatawrite_range
> do_writepages
> block_write_full_folio
> submit_bh_wbc
> >> EIO occurs in disk <<
> end_buffer_async_write
> mark_buffer_write_io_error
> mapping_set_error
> set_bit(AS_EIO, &mapping->flags) // set!
> filemap_check_errors
> test_and_clear_bit(AS_EIO, &mapping->flags) // clear!
> err2 = sync_blockdev
> filemap_write_and_wait
> filemap_check_errors
> test_and_clear_bit(AS_EIO, &mapping->flags) // false
> err2 = 0
>
> The filesystem is mounted successfully even though data from the journal
> failed to be written to disk, and ext4 can become corrupted.
>
> Fix it by comparing 'sbi->s_bdev_wb_err' before and after loading the
> journal.
>
> A reproducer is available at [Link].
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217888
> Cc: stable@...r.kernel.org
> Signed-off-by: Zhihao Cheng <chengzhihao1@...wei.com>
> Signed-off-by: Zhang Yi <yi.zhang@...wei.com>
> ---
> v1->v2: Checks wb_err from block device only in ext4.
> fs/ext4/super.c | 22 +++++++++++++++-------
> 1 file changed, 15 insertions(+), 7 deletions(-)
>
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 38217422f938..4dcaad2403be 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -4907,6 +4907,14 @@ static int ext4_load_and_init_journal(struct super_block *sb,
> if (err)
> return err;
>
> + err = errseq_check_and_advance(&sb->s_bdev->bd_inode->i_mapping->wb_err,
> + &sbi->s_bdev_wb_err);
> + if (err) {
> + ext4_msg(sb, KERN_ERR, "Background error %d when loading journal",
> + err);
> + goto out;
> + }
> +
This solution cannot solve the problem: the journal tail is still
updated in journal_reset() even if we detect the writeback error and
refuse to mount the ext4 filesystem here. So I suppose we have to
check the I/O error in the jbd2 module itself, as v1 does.
Thanks,
Yi.
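
To make the ordering concrete, here is a from-memory sketch of the jbd2 mount-time call chain (abridged pseudocode, not exact source): by the time ext4_load_and_init_journal() reaches the new errseq check, jbd2_journal_load() has already replayed the log and advanced the on-disk tail, discarding the journal's copy of the data.

```c
/* Abridged sketch of the jbd2 call order at mount time */
int jbd2_journal_load(journal_t *journal)
{
	/* ... */
	jbd2_journal_recover(journal);	/* replay + sync_blockdev()     */
	/* ... */
	return journal_reset(journal);	/* advances the on-disk journal
					 * tail: from here on, the
					 * replayed blocks on the fs
					 * device are the only copy     */
}

/*
 * ext4_load_and_init_journal():
 *	jbd2_journal_load(journal);
 *	errseq_check_and_advance(...);	// too late: tail already moved
 */
```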
> if (ext4_has_feature_64bit(sb) &&
> !jbd2_journal_set_features(EXT4_SB(sb)->s_journal, 0, 0,
> JBD2_FEATURE_INCOMPAT_64BIT)) {
> @@ -5365,6 +5373,13 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
> goto failed_mount3a;
> }
>
> + /*
> + * Save the original bdev mapping's wb_err value which could be
> + * used to detect the metadata async write error.
> + */
> + spin_lock_init(&sbi->s_bdev_wb_lock);
> + errseq_check_and_advance(&sb->s_bdev->bd_inode->i_mapping->wb_err,
> + &sbi->s_bdev_wb_err);
> err = -EINVAL;
> /*
> * The first inode we look at is the journal inode. Don't try
> @@ -5571,13 +5586,6 @@ static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
> }
> #endif /* CONFIG_QUOTA */
>
> - /*
> - * Save the original bdev mapping's wb_err value which could be
> - * used to detect the metadata async write error.
> - */
> - spin_lock_init(&sbi->s_bdev_wb_lock);
> - errseq_check_and_advance(&sb->s_bdev->bd_inode->i_mapping->wb_err,
> - &sbi->s_bdev_wb_err);
> EXT4_SB(sb)->s_mount_state |= EXT4_ORPHAN_FS;
> ext4_orphan_cleanup(sb, es);
> EXT4_SB(sb)->s_mount_state &= ~EXT4_ORPHAN_FS;
>