[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <c0c8619d-3d9b-a184-3cd1-0cd88447fdcd@huawei.com>
Date: Fri, 23 Jul 2021 21:11:08 +0800
From: yangerkun <yangerkun@...wei.com>
To: Theodore Ts'o <tytso@....edu>
CC: <jack@...e.cz>, <linux-ext4@...r.kernel.org>, <yukuai3@...wei.com>
Subject: Re: [PATCH] ext4: flush s_error_work before journal destroy in
ext4_fill_super
在 2021/7/23 20:11, Theodore Ts'o 写道:
> On Tue, Jul 20, 2021 at 02:24:09PM +0800, yangerkun wrote:
>> 'commit c92dc856848f ("ext4: defer saving error info from atomic
>> context")' and '2d01ddc86606 ("ext4: save error info to sb through journal
>> if available")' add s_error_work to fix checksum error problem. But the
>> error path in ext4_fill_super can lead the follow BUG_ON.
>
> Can you share with me your test case? Your patch will result in the
> shrinker potentially not getting released in some error paths (which
> will cause other kernel panics), and in any case, once the journal is
> destroyed here:
>
>> @@ -5173,15 +5173,15 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
>>
>> ext4_xattr_destroy_cache(sbi->s_ea_block_cache);
>> sbi->s_ea_block_cache = NULL;
>> +failed_mount3a:
>> + ext4_es_unregister_shrinker(sbi);
>> +failed_mount3:
>> + flush_work(&sbi->s_error_work);
>>
>> if (sbi->s_journal) {
>> jbd2_journal_destroy(sbi->s_journal);
>> sbi->s_journal = NULL;
>> }
>> -failed_mount3a:
>> - ext4_es_unregister_shrinker(sbi);
>> -failed_mount3:
>> - flush_work(&sbi->s_error_work);
>
> sbi->s_journal is set to NULL, which means that in
> flush_stashed_error_work(), journal will be NULL, which means we won't
> call start_this_handle and so this change will not make a difference
> given the kernel stack trace in the commit description.
For example, before wo goto failed_mount_wq, we may meet some error and
will goto ext4_handle_error which can call
schedule_work(&EXT4_SB(sb)->s_error_work). So the work may start
concurrent with ext4_fill_super goto failed_mount_wq. There does not
have any lock to protect the concurrent read and modifies for
sbi->s_journal.
Injection fault some delay between jbd2_journal_destory and
sbi->s_journal and
We can injection fault while we do mount and add some delay like follow.
We will get some panic report easily...
failed_mount_wq:
ext4_xattr_destroy_cache(sbi->s_ea_inode_cache);
sbi->s_ea_inode_cache = NULL;
ext4_xattr_destroy_cache(sbi->s_ea_block_cache);
sbi->s_ea_block_cache = NULL;
if (sbi->s_journal) {
jbd2_journal_destroy(sbi->s_journal);
msleep(300); <---- add some delay
sbi->s_journal = NULL;
}
So we need to make sure to work will exists while we destroy the
journal. It maybe better to fix it by move the flush_work before destory
journal.
>
> Thanks,
>
> - Ted
> .
>
Powered by blists - more mailing lists