lists.openwall.net - Open Source and information security mailing list archives
Date: Tue, 3 Aug 2021 19:13:07 +0800
From: yangerkun <yangerkun@...wei.com>
To: Jan Kara <jack@...e.cz>
CC: Theodore Ts'o <tytso@....edu>, <linux-ext4@...r.kernel.org>, <yukuai3@...wei.com>
Subject: Re: [PATCH] ext4: flush s_error_work before journal destroy in ext4_fill_super

On 2021/7/26 21:26, Jan Kara wrote:
> On Mon 26-07-21 15:13:34, yangerkun wrote:
>>
>> On 2021/7/24 3:11, Theodore Ts'o wrote:
>>> On Fri, Jul 23, 2021 at 09:11:08PM +0800, yangerkun wrote:
>>>>
>>>> For example, before we go to failed_mount_wq, we may hit some error and
>>>> go to ext4_handle_error, which can call
>>>> schedule_work(&EXT4_SB(sb)->s_error_work). So the work may run
>>>> concurrently with ext4_fill_super going to failed_mount_wq. There is no
>>>> lock protecting the concurrent reads and modifications of sbi->s_journal.
>>>
>>> Yes, and I'm asking *how* is this actually happening in practice?
>>> I've been going through the code paths and I don't see any place where
>>> ext4_error*() would be called. That's why I wanted to see your test
>>> case which was reproducing it. (Not just where you added the msleep,
>>> but how the error was getting triggered in the first place.)
>>
>> Hi Ted,
>>
>> The problem only happened once, early on, with a parallel LTP testcase
>> (but we cannot reproduce it again with the same case...). The dmesg was:
>>
>> [32031.739678] EXT4-fs error (device loop66): ext4_fill_super:4672: comm
>> chdir01: inode #2: comm chdir01: iget: illegal inode #
>> [32031.740193] EXT4-fs (loop66): get root inode failed
>> [32031.740484] EXT4-fs (loop66): mount failed
>
> Oh, OK. I guess s_inodes_count was <= 1 which made a check in __ext4_iget()
> trip. That is theoretically possible if s_inodes_per_block == 1,
> s_inodes_per_group == 1 and s_groups_count == 1. I guess ext4_fill_super()
> needs to check that s_inodes_count is large enough for at least all
> reserved inodes to fit.
> But it's still unclear to me how we could have succeeded in
> loading the journal, which apparently has inode number 8, from the error
> message below.
>
> In parallel LTP checks I've sometimes noticed strange errors in the past
> that looked very much like the filesystem being modified through the block
> device while in use by the kernel (maybe some bug in the test
> framework). And this is something we just don't support, and the kernel
> can crash in this case.
>
> In either case I agree it makes sense to make sure the error work cannot
> race with journal destruction, either by flushing it before destroying
> the journal or by some other means...

Hi Ted,

Should we apply this patch first, or do you have some better solution...

Thanks,
Kun.

>
> Honza
>
>> [32031.758811] EXT4-fs error (device loop66): ext4_map_blocks:595: inode #8:
>> block 532: comm chdir01: lblock 1 mapped to illegal pblock 532 (length 1)
>> [32031.759293] jbd2_journal_bmap: journal block not found at offset 1 on
>> loop66-8
>> [32031.759805] ------------[ cut here ]------------
>> [32031.759807] kernel BUG at fs/jbd2/transaction.c:373!
>>
>> ext4_fill_super
>>   ext4_load_journal
>>     EXT4_SB(sb)->s_journal = journal
>>   root = ext4_iget(sb, EXT4_ROOT_INO, EXT4_IGET_SPECIAL)
>>   // will fail and goto failed_mount4
>>     __ext4_iget
>>       __ext4_error
>>         ext4_handle_error
>>           schedule_work(&EXT4_SB(sb)->s_error_work)
>>
>> And this triggers the concurrent read and modification of sbi->s_journal...
>>
>> Thanks,
>> Kun.
>>
>>>
>>> On Fri, Jul 23, 2021 at 09:25:12PM +0800, yangerkun wrote:
>>>>
>>>>> Can you share with me your test case? Your patch will result in the
>>>>> shrinker potentially not getting released in some error paths (which
>>>>> will cause other kernel panics), and in any case, once the journal is
>>>>
>>>> The only logic we have changed is that we move the flush_work before we
>>>> call jbd2_journal_destroy. I have not seen the problem you describe...
>>>> Can you help explain more...
>>>
>>> Sorry, I was mistaken. I thought you were moving the
>>> ext4_es_unregister_shrinker() and flush_work() before the label for
>>> failed_mount_wq; that was a misreading of your patch.
>>>
>>> The other way we could fix this might be something like this:
>>>
>>> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
>>> index dfa09a277b56..d663d11fa0de 100644
>>> --- a/fs/ext4/super.c
>>> +++ b/fs/ext4/super.c
>>> @@ -693,7 +693,7 @@ static void flush_stashed_error_work(struct work_struct *work)
>>>  {
>>>  	struct ext4_sb_info *sbi = container_of(work, struct ext4_sb_info,
>>>  						s_error_work);
>>> -	journal_t *journal = sbi->s_journal;
>>> +	journal_t *journal = READ_ONCE(sbi->s_journal);
>>>  	handle_t *handle;
>>>  	/*
>>> @@ -1184,9 +1184,11 @@ static void ext4_put_super(struct super_block *sb)
>>>  	ext4_unregister_sysfs(sb);
>>>  	if (sbi->s_journal) {
>>> -		aborted = is_journal_aborted(sbi->s_journal);
>>> -		err = jbd2_journal_destroy(sbi->s_journal);
>>> -		sbi->s_journal = NULL;
>>> +		journal_t *journal = sbi->s_journal;
>>> +
>>> +		WRITE_ONCE(sbi->s_journal, NULL);
>>> +		aborted = is_journal_aborted(journal);
>>> +		err = jbd2_journal_destroy(journal);
>>>  		if ((err < 0) && !aborted) {
>>>  			ext4_abort(sb, -err, "Couldn't clean up the journal");
>>>  		}
>>> @@ -5175,8 +5177,10 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
>>>  	sbi->s_ea_block_cache = NULL;
>>>  	if (sbi->s_journal) {
>>> -		jbd2_journal_destroy(sbi->s_journal);
>>> -		sbi->s_journal = NULL;
>>> +		journal_t *journal = sbi->s_journal;
>>> +
>>> +		WRITE_ONCE(sbi->s_journal, NULL);
>>> +		jbd2_journal_destroy(journal);
>>>  	}
>>>  failed_mount3a:
>>>  	ext4_es_unregister_shrinker(sbi);
>>> @@ -5487,7 +5491,7 @@ static int ext4_load_journal(struct super_block *sb,
>>>  	EXT4_SB(sb)->s_journal = journal;
>>>  	err = ext4_clear_journal_err(sb, es);
>>>  	if (err) {
>>> -		EXT4_SB(sb)->s_journal = NULL;
>>> +		WRITE_ONCE(EXT4_SB(sb)->s_journal, NULL);
>>>  		jbd2_journal_destroy(journal);
>>>  		return err;
>>>  	}
>>>
>>> ... and here's another possible fix:
>>>
>>> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
>>> index dfa09a277b56..e9e122e52ce8 100644
>>> --- a/fs/ext4/super.c
>>> +++ b/fs/ext4/super.c
>>> @@ -704,7 +704,8 @@ static void flush_stashed_error_work(struct work_struct *work)
>>>  	 * We use directly jbd2 functions here to avoid recursing back into
>>>  	 * ext4 error handling code during handling of previous errors.
>>>  	 */
>>> -	if (!sb_rdonly(sbi->s_sb) && journal) {
>>> +	if (!sb_rdonly(sbi->s_sb) && journal &&
>>> +	    !(journal->j_flags & JBD2_UNMOUNT)) {
>>>  		struct buffer_head *sbh = sbi->s_sbh;
>>>  		handle = jbd2_journal_start(journal, 1);
>>>  		if (IS_ERR(handle))
>>>
>>> But I would be interested in understanding how we could be triggering
>>> this problem in the first place before deciding what's the best fix.
>>>
>>> Cheers,
>>>
>>> 					- Ted