[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <e74a7e44-9055-31a7-0124-812a6ae42a41@huawei.com>
Date: Tue, 12 Sep 2023 11:52:37 +0800
From: Zhihao Cheng <chengzhihao1@...wei.com>
To: Jan Kara <jack@...e.cz>, Zhang Yi <yi.zhang@...wei.com>
CC: <tytso@....edu>, <jack@...e.com>, <linux-ext4@...r.kernel.org>
Subject: Re: [PATCH v2] ext4: Fix potential data lost in recovering journal
raced with synchronizing fs bdev
在 2023/9/12 0:18, Jan Kara 写道:
> Hello!
>
> On Sat 09-09-23 11:41:11, Zhang Yi wrote:
>> On 2023/9/8 20:43, Zhihao Cheng wrote:
>>> JBD2 makes sure journal data is fallen on fs device by sync_blockdev(),
>>> however, other process could intercept the EIO information from bdev's
>>> mapping, which leads journal recovering successful even EIO occurs during
>>> data written back to fs device.
>>>
>>> We found this problem in our product, iscsi + multipath is chosen for block
>>> device of ext4. Unstable network may trigger kpartx to rescan partitions in
>>> device mapper layer. Detailed process is shown as following:
>>>
>>> mount kpartx irq
>>> jbd2_journal_recover
>>> do_one_pass
>>> memcpy(nbh->b_data, obh->b_data) // copy data to fs dev from journal
>>> mark_buffer_dirty // mark bh dirty
>>> vfs_read
>>> generic_file_read_iter // dio
>>> filemap_write_and_wait_range
>>> __filemap_fdatawrite_range
>>> do_writepages
>>> block_write_full_folio
>>> submit_bh_wbc
>>> >> EIO occurs in disk <<
>>> end_buffer_async_write
>>> mark_buffer_write_io_error
>>> mapping_set_error
>>> set_bit(AS_EIO, &mapping->flags) // set!
>>> filemap_check_errors
>>> test_and_clear_bit(AS_EIO, &mapping->flags) // clear!
>>> err2 = sync_blockdev
>>> filemap_write_and_wait
>>> filemap_check_errors
>>> test_and_clear_bit(AS_EIO, &mapping->flags) // false
>>> err2 = 0
>>>
>>> Filesystem is mounted successfully even data from journal is failed written
>>> into disk, and ext4 could become corrupted.
>>>
>>> Fix it by comparing 'sbi->s_bdev_wb_err' before loading journal and after
>>> loading journal.
>>>
>>> Fetch a reproducer in [Link].
>>>
>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217888
>>> Cc: stable@...r.kernel.org
>>> Signed-off-by: Zhihao Cheng <chengzhihao1@...wei.com>
>>> Signed-off-by: Zhang Yi <yi.zhang@...wei.com>
>>> ---
>>> v1->v2: Checks wb_err from block device only in ext4.
>>> fs/ext4/super.c | 22 +++++++++++++++-------
>>> 1 file changed, 15 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
>>> index 38217422f938..4dcaad2403be 100644
>>> --- a/fs/ext4/super.c
>>> +++ b/fs/ext4/super.c
>>> @@ -4907,6 +4907,14 @@ static int ext4_load_and_init_journal(struct super_block *sb,
>>> if (err)
>>> return err;
>>>
>>> + err = errseq_check_and_advance(&sb->s_bdev->bd_inode->i_mapping->wb_err,
>>> + &sbi->s_bdev_wb_err);
>>> + if (err) {
>>> + ext4_msg(sb, KERN_ERR, "Background error %d when loading journal",
>>> + err);
>>> + goto out;
>>> + }
>>> +
>>
>> This solution cannot solve the problem, because the journal tail is
>> still updated in journal_reset() even if we detect the writeback error
>> and refuse to mount the ext4 filesystem here. So I suppose we have to
>> check the I/O error by jbd2 module itself like v1 does.
>
> Hum, that's a good point because next time we will try to mount the fs we
> will not try to replay the journal anymore. So let's return to v1 and I'm
> sorry for misguiding you Zhihao.
>
> But when we are doing background IO error detection in jbd2 during journal
> replay, I'm wondering whether we shouldn't be doing something similar in
> checkpointing code - like when we are about to remove a transaction from
> the journal. And as I'm checking we already do that using
> JBD2_CHECKPOINT_IO_ERROR bit handling - maybe we could replace that with a
> more standard errseq mechanism that is available these days as a cleanup?
> > And the ext4 handling in ext4_check_bdev_write_error() is useful only in
> nojournal mode as otherwise jbd2 is taking care of all writeback errors
> including the background ones. So maybe we can guard the
> ext4_check_bdev_write_error() by a !ext4_handle_valid(handle) check to make
> that obvious (and comment about that). >
> What do you think?
>
> Honza
>
Ok, I will try it. Thanks for suggestions.
Powered by blists - more mailing lists