linux-ext4 - Re: [PATCH] ext4: flush s_error_work before journal destroy in ext4_fill_super


Message-ID: <c0c8619d-3d9b-a184-3cd1-0cd88447fdcd@huawei.com>
Date:   Fri, 23 Jul 2021 21:11:08 +0800
From:   yangerkun <yangerkun@...wei.com>
To:     Theodore Ts'o <tytso@....edu>
CC:     <jack@...e.cz>, <linux-ext4@...r.kernel.org>, <yukuai3@...wei.com>
Subject: Re: [PATCH] ext4: flush s_error_work before journal destroy in
 ext4_fill_super



On 2021/7/23 20:11, Theodore Ts'o wrote:
> On Tue, Jul 20, 2021 at 02:24:09PM +0800, yangerkun wrote:
>> 'commit c92dc856848f ("ext4: defer saving error info from atomic
>> context")' and '2d01ddc86606 ("ext4: save error info to sb through journal
>> if available")' add s_error_work to fix checksum error problem. But the
>> error path in ext4_fill_super can lead the follow BUG_ON.
> 
> Can you share with me your test case?  Your patch will result in the
> shrinker potentially not getting released in some error paths (which
> will cause other kernel panics), and in any case, once the journal is
> destroyed here:
> 
>> @@ -5173,15 +5173,15 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
>>   
>>   	ext4_xattr_destroy_cache(sbi->s_ea_block_cache);
>>   	sbi->s_ea_block_cache = NULL;
>> +failed_mount3a:
>> +	ext4_es_unregister_shrinker(sbi);
>> +failed_mount3:
>> +	flush_work(&sbi->s_error_work);
>>   
>>   	if (sbi->s_journal) {
>>   		jbd2_journal_destroy(sbi->s_journal);
>>   		sbi->s_journal = NULL;
>>   	}
>> -failed_mount3a:
>> -	ext4_es_unregister_shrinker(sbi);
>> -failed_mount3:
>> -	flush_work(&sbi->s_error_work);
> 
> sbi->s_journal is set to NULL, which means that in
> flush_stashed_error_work(), journal will be NULL, which means we won't
> call start_this_handle and so this change will not make a difference
> given the kernel stack trace in the commit description.


For example, before we goto failed_mount_wq, we may hit an error and
call ext4_handle_error(), which can call
schedule_work(&EXT4_SB(sb)->s_error_work). So the work may run
concurrently with ext4_fill_super() jumping to failed_mount_wq. There is
no lock protecting the concurrent reads and modifications of
sbi->s_journal.


We can inject a fault during mount and add a delay between
jbd2_journal_destroy() and the assignment sbi->s_journal = NULL, as
follows. With that, a panic is easy to reproduce:


  failed_mount_wq:
          ext4_xattr_destroy_cache(sbi->s_ea_inode_cache);
          sbi->s_ea_inode_cache = NULL;

          ext4_xattr_destroy_cache(sbi->s_ea_block_cache);
          sbi->s_ea_block_cache = NULL;

          if (sbi->s_journal) {
                  jbd2_journal_destroy(sbi->s_journal);
                  msleep(300); <---- add some delay
                  sbi->s_journal = NULL;
          }


So we need to make sure the work has finished before we destroy the
journal. It may be better to fix this by moving flush_work() before the
journal destroy.
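
As a rough user-space analogy of that ordering (this is not kernel code:
the fake_* structures, error_work_fn(), teardown() and run_demo() are
made-up names for illustration, and pthread_join() merely stands in for
flush_work()), a sketch that waits for the worker before freeing the
journal could look like this:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the journal and for struct ext4_sb_info. */
struct fake_journal {
	int commits;
};

struct fake_sbi {
	struct fake_journal *journal;	/* plays the role of sbi->s_journal */
	pthread_t error_work;		/* plays the role of sbi->s_error_work */
};

/* Plays the role of the error work: touches the journal if it is set. */
static void *error_work_fn(void *arg)
{
	struct fake_sbi *sbi = arg;
	struct fake_journal *j = sbi->journal;

	if (j)
		j->commits++;
	return NULL;
}

/*
 * Error-path teardown in the proposed order: first wait for the worker
 * (analogue of flush_work()), and only then destroy the journal and
 * clear the pointer. Returns the commit count seen, or -1 on failure.
 */
static int teardown(struct fake_sbi *sbi)
{
	int commits;

	if (pthread_join(sbi->error_work, NULL) != 0)
		return -1;

	commits = sbi->journal ? sbi->journal->commits : 0;
	free(sbi->journal);		/* analogue of jbd2_journal_destroy() */
	sbi->journal = NULL;
	return commits;
}

/* Sets up a journal, queues one piece of "error work", then tears down. */
static int run_demo(void)
{
	struct fake_sbi sbi = { 0 };
	int commits;

	sbi.journal = calloc(1, sizeof(*sbi.journal));
	if (!sbi.journal)
		return -1;
	if (pthread_create(&sbi.error_work, NULL, error_work_fn, &sbi) != 0) {
		free(sbi.journal);
		return -1;
	}

	commits = teardown(&sbi);
	assert(sbi.journal == NULL);
	return commits;
}
```

Because the worker is joined before free(), it can never see a journal
that is already destroyed but whose pointer is not yet NULL, which is
the window the msleep() above widens. Built with -pthread, run_demo()
can be called from a small main() to try it.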


> 
> Thanks,
> 
> 						- Ted
> .
>