Message-ID: <897e53a5-8fc5-1c0a-4734-60ab4fb0fc65@huawei.com>
Date: Tue, 3 Aug 2021 19:13:07 +0800
From: yangerkun <yangerkun@...wei.com>
To: Jan Kara <jack@...e.cz>
CC: Theodore Ts'o <tytso@....edu>, <linux-ext4@...r.kernel.org>,
<yukuai3@...wei.com>
Subject: Re: [PATCH] ext4: flush s_error_work before journal destroy in
ext4_fill_super
On 2021/7/26 21:26, Jan Kara wrote:
> On Mon 26-07-21 15:13:34, yangerkun wrote:
>>
>>
>> On 2021/7/24 3:11, Theodore Ts'o wrote:
>>> On Fri, Jul 23, 2021 at 09:11:08PM +0800, yangerkun wrote:
>>>>
>>>> For example, before we go to failed_mount_wq, we may hit some error and
>>>> go to ext4_handle_error, which can call
>>>> schedule_work(&EXT4_SB(sb)->s_error_work). So the work may run concurrently
>>>> with ext4_fill_super going to failed_mount_wq, and there is no lock to
>>>> protect the concurrent reads and modifications of sbi->s_journal.
>>>
>>> Yes, and I'm asking *how* is this actually happening in practice?
>>> I've been going through the code paths and I don't see any place where
>>> ext4_error*() would be called. That's why I wanted to see your test
>>> case which was reproducing it. (Not just where you added the msleep,
>>> but how the error was getting triggered in the first place.)
>>
>> Hi Ted,
>>
>>
>> The problem only happened once, early on, with a parallel LTP test case
>> (but we cannot reproduce it again with the same case...). The dmesg output
>> was as follows:
>>
>>
>> [32031.739678] EXT4-fs error (device loop66): ext4_fill_super:4672: comm
>> chdir01: inode #2: comm chdir01: iget: illegal inode #
>> [32031.740193] EXT4-fs (loop66): get root inode failed
>> [32031.740484] EXT4-fs (loop66): mount failed
>
> Oh, OK. I guess s_inodes_count was <= 1 which made a check in __ext4_iget()
> trip. That is theoretically possible if s_inodes_per_block == 1,
> s_inodes_per_group == 1 and s_groups_count == 1. I guess ext4_fill_super()
> needs to check s_inodes_count is large enough at least for all reserved
> inodes to fit. But it's still unclear to me how we could have succeeded in
> loading the journal which apparently has inode number 8 from the error
> message below.
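
(For illustration only, the check Jan suggests might look roughly like the
untested sketch below in ext4_fill_super(); the exact bound, placement and
message are my assumptions, not from any posted patch:)

	if (le32_to_cpu(es->s_inodes_count) < EXT4_FIRST_INO(sb)) {
		/* Sketch: the reserved inodes (root #2, journal #8, ...)
		 * must at least fit into s_inodes_count. */
		ext4_msg(sb, KERN_ERR, "too few inodes (%u) for reserved inodes",
			 le32_to_cpu(es->s_inodes_count));
		goto failed_mount;
	}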
>
> In parallel LTP checks I've sometimes noticed strange errors in the past
> that looked very much like filesystem being modified through the block
> device while being in use by the kernel (maybe some bug in the test
> framework). And this is something we just don't support and the kernel can
> crash in this case.
>
> In either case I agree it makes sense to make sure error work cannot race
> with journal destruction either by flushing it before destroying the
> journal or some other means...
Hi Ted,
Should we apply this patch first, or do you have a better solution...
Thanks,
Kun.
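
For reference, the core of the patch is simply to flush the error work before
the journal is torn down in the failed_mount_wq path of ext4_fill_super(),
roughly along these lines (an illustrative, untested sketch, not the exact
hunk from the patch; the real patch may place the flush differently):

failed_mount_wq:
	/*
	 * Make sure an already scheduled s_error_work cannot run
	 * concurrently with (or after) the journal teardown below.
	 */
	flush_work(&sbi->s_error_work);
	if (sbi->s_journal) {
		jbd2_journal_destroy(sbi->s_journal);
		sbi->s_journal = NULL;
	}

flush_work() returns only after any already queued flush_stashed_error_work()
has finished, so the work can no longer dereference the journal that
jbd2_journal_destroy() is about to free.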
>
> Honza
>
>> [32031.758811] EXT4-fs error (device loop66): ext4_map_blocks:595: inode #8:
>> block 532: comm chdir01: lblock 1 mapped to illegal pblock 532 (length 1)
>> [32031.759293] jbd2_journal_bmap: journal block not found at offset 1 on
>> loop66-8
>> [32031.759805] ------------[ cut here ]------------
>> [32031.759807] kernel BUG at fs/jbd2/transaction.c:373!
>>
>>
>> ext4_fill_super
>> ext4_load_journal
>> EXT4_SB(sb)->s_journal = journal
>> root = ext4_iget(sb, EXT4_ROOT_INO, EXT4_IGET_SPECIAL)
>> // will fail and goto failed_mount4
>> __ext4_iget
>> __ext4_error
>> ext4_handle_error
>> schedule_work(&EXT4_SB(sb)->s_error_work)
>>
>>
>> And this triggers the concurrent read and modification of sbi->s_journal...
>>
>> Thanks,
>> Kun.
>>
>>
>>>
>>>
>>> On Fri, Jul 23, 2021 at 09:25:12PM +0800, yangerkun wrote:
>>>>
>>>>> Can you share with me your test case? Your patch will result in the
>>>>> shrinker potentially not getting released in some error paths (which
>>>>> will cause other kernel panics), and in any case, once the journal is
>>>>
>>>> The only logic we have changed is that we move the flush_work before we call
>>>> jbd2_journal_destroy. I have not seen the problem you describe... Can you
>>>> help explain it in more detail...
>>>
>>> Sorry, I was mistaken. I thought you were moving the
>>> ext4_es_unregister_shrinker() and flush_work() before the label for
>>> failed_mount_wq; that was a misreading of your patch.
>>>
>>> The other way we could fix this might be something like this:
>>>
>>> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
>>> index dfa09a277b56..d663d11fa0de 100644
>>> --- a/fs/ext4/super.c
>>> +++ b/fs/ext4/super.c
>>> @@ -693,7 +693,7 @@ static void flush_stashed_error_work(struct work_struct *work)
>>> {
>>> struct ext4_sb_info *sbi = container_of(work, struct ext4_sb_info,
>>> s_error_work);
>>> - journal_t *journal = sbi->s_journal;
>>> + journal_t *journal = READ_ONCE(sbi->s_journal);
>>> handle_t *handle;
>>> /*
>>> @@ -1184,9 +1184,11 @@ static void ext4_put_super(struct super_block *sb)
>>> ext4_unregister_sysfs(sb);
>>> if (sbi->s_journal) {
>>> - aborted = is_journal_aborted(sbi->s_journal);
>>> - err = jbd2_journal_destroy(sbi->s_journal);
>>> - sbi->s_journal = NULL;
>>> + journal_t *journal = sbi->s_journal;
>>> +
>>> + WRITE_ONCE(sbi->s_journal, NULL);
>>> + aborted = is_journal_aborted(journal);
>>> + err = jbd2_journal_destroy(journal);
>>> if ((err < 0) && !aborted) {
>>> ext4_abort(sb, -err, "Couldn't clean up the journal");
>>> }
>>> @@ -5175,8 +5177,10 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
>>> sbi->s_ea_block_cache = NULL;
>>> if (sbi->s_journal) {
>>> - jbd2_journal_destroy(sbi->s_journal);
>>> - sbi->s_journal = NULL;
>>> + journal_t *journal = sbi->s_journal;
>>> +
>>> + WRITE_ONCE(sbi->s_journal, NULL);
>>> + jbd2_journal_destroy(journal);
>>> }
>>> failed_mount3a:
>>> ext4_es_unregister_shrinker(sbi);
>>> @@ -5487,7 +5491,7 @@ static int ext4_load_journal(struct super_block *sb,
>>> EXT4_SB(sb)->s_journal = journal;
>>> err = ext4_clear_journal_err(sb, es);
>>> if (err) {
>>> - EXT4_SB(sb)->s_journal = NULL;
>>> + WRITE_ONCE(EXT4_SB(sb)->s_journal, NULL);
>>> jbd2_journal_destroy(journal);
>>> return err;
>>> }
>>>
>>> ... and here's another possible fix:
>>>
>>> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
>>> index dfa09a277b56..e9e122e52ce8 100644
>>> --- a/fs/ext4/super.c
>>> +++ b/fs/ext4/super.c
>>> @@ -704,7 +704,8 @@ static void flush_stashed_error_work(struct work_struct *work)
>>> * We use directly jbd2 functions here to avoid recursing back into
>>> * ext4 error handling code during handling of previous errors.
>>> */
>>> - if (!sb_rdonly(sbi->s_sb) && journal) {
>>> + if (!sb_rdonly(sbi->s_sb) && journal &&
>>> + !(journal->j_flags & JBD2_UNMOUNT)) {
>>> struct buffer_head *sbh = sbi->s_sbh;
>>> handle = jbd2_journal_start(journal, 1);
>>> if (IS_ERR(handle))
>>>
>>>
>>>
>>> But I would be interested in understanding how we could be triggering
>>> this problem in the first place before deciding what's the best fix.
>>>
>>> Cheers,
>>>
>>> - Ted
>>> .
>>>