linux-ext4 - 答复: [PATCH] jbd2: fix descriptor block checksum failed after format with lazy_journal

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <322cfa8f62184d81a9521e87c1336428@hikvision.com>
Date:   Fri, 14 Aug 2020 08:31:36 +0000
From:   常凤楠 <changfengnan@...vision.com>
To:     Jan Kara <jack@...e.cz>
CC:     "tytso@....edu" <tytso@....edu>, "jack@...e.com" <jack@...e.com>,
        "linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>
Subject: 答复: [PATCH] jbd2: fix descriptor block checksum failed after format with lazy_journal_init=1

> Change mke2fs so that even with lazy_journal_init=1, it would zero-out the first journal block after the superblock.

This method cannot solve the problem, because there is not always only one valid transactio after superblock, there may be several transactions,and each transaction may have problems.
There is a problem log, the invalid transaction's sequence 3.

[    2.802334] fs/ext4/super.c: (ext4_get_journal_inode, 4658): Journal inode found at (ptrval): 33554432 bytes
[    2.812199] fs/jbd2/journal.c: (jbd2_journal_init_inode, 1272): JBD2: inode mmcblk2p23/8, size 33554432, bits 12, blksize 4096
[    2.824139] fs/jbd2/recovery.c: (do_one_pass, 451): Starting recovery pass 0
[    2.831213] fs/jbd2/recovery.c: (do_one_pass, 478): Scanning for sequence ID 2 at 1/8192
[    2.839302] fs/jbd2/recovery.c: (do_one_pass, 484): JBD2: checking block 1
[    2.858923] fs/jbd2/recovery.c: (do_one_pass, 508): Found magic 1, sequence 2
[    2.886457] fs/jbd2/recovery.c: (do_one_pass, 478): Scanning for sequence ID 2 at 3/8192
[    2.894554] fs/jbd2/recovery.c: (do_one_pass, 484): JBD2: checking block 3
[    2.905453] fs/jbd2/recovery.c: (do_one_pass, 508): Found magic 2, sequence 2
[    2.912614] fs/jbd2/recovery.c: (do_one_pass, 478): Scanning for sequence ID 3 at 4/8192
[    2.920704] fs/jbd2/recovery.c: (do_one_pass, 484): JBD2: checking block 4
[    2.931593] fs/jbd2/recovery.c: (do_one_pass, 508): Found magic 1, sequence 3
[    2.959279] JBD2: Invalid checksum recovering block 5 in log
[    2.964939] fs/jbd2/recovery.c: (jbd2_journal_recover, 280): JBD2: recovery, exit status -74, recovered transactions 2 to 0
[    2.976058] fs/jbd2/recovery.c: (jbd2_journal_recover, 282): JBD2: Replayed 0 and revoked 0/0 blocks
[    2.985200] JBD2: recovery failed
[    2.988527] EXT4-fs (mmcblk2p23): error loading journal

Then you say the solution of comparing commit times is fragile, but I do not think so, maybe a little rough, but it always work in test. This is my consideration:
1. In a typical environment, the problematic transaction is written before formatting, so the commit time of the problematic transaction must be less than normal.
2. We check commit time only when checksum failed, if checksum failed is caused by disk error or something else, not caused by the previous transaction which belong to last journal, next transaction's commit time maybe equal to last transaction's commit time, but will not little than last.
3. Consider of time calibration, this may let next commit time little than last commit time, but we only check when checksum failed, in normal recovery process we won't check commit time, the possibility of a problem is very small.

And I'm thinking, Can We just add a member of superblock, indicates the number of transactions ? This way may need change a lot code.

What do people think, is there other solution?

-----邮件原件-----
发件人: Jan Kara <jack@...e.cz>
发送时间: 2020年8月6日 22:20
收件人: 常凤楠 <changfengnan@...vision.com>
抄送: tytso@....edu; jack@...e.com; linux-ext4@...r.kernel.org
主题: Re: [PATCH] jbd2: fix descriptor block checksum failed after format with lazy_journal_init=1

On Mon 03-08-20 02:25:45, 常凤楠 wrote:
> We encountered a file system crash, it's quite easy to reproduces when format with lazy_journal_init=1, the dmesg follwed:
> [    2.366962] JBD2: Invalid checksum recovering block 5 in log
> [    2.372680] JBD2: recovery failed
> [    2.376015] EXT4-fs (mmcblk2p1): error loading journal
>
> It's because of descriptor block checksum failed, We tested on linux-4.19.200 and linux-5.7.9 with e2fsprogs-1.45.6, and both failed.
> We found it only happened in format with lazy_journal_init=1, and the first mount time is a short time.
> After the analysis of jbd2, We think it's due to jbd2's super block just show first transaction id, but didn't show the nums of transaction.
>
> if you format with lazy_journal_init=1 first time, after mount a short time, you reboot machine, the layout of jbd2 may be like this:
>
> journal Superblock |descriptor_block | data_blocks | commmit_block | descriptor_block | data_blocks | commmit_block |  [more transactions...
> ---------------------|--------------------- transaction 1   --------------|---------------   transaction 2   -----------------|
>
> and after reboot, you format with lazy_journal_init=1 second time, after mount a short time, you reboot machine again, the layout of jbd2 may be like this:
>
> journal Superblock |descriptor_block | data_blocks|  commmit_block | descriptor_block |  data_blocks| commmit_block| [more transactions...
> ---------------------|----------------   transaction 1   -----------------|
>
> and then you mount filesystem again, when jbd2 start recovery, it will
> think there is more transactions after transaction 1, and when check transaction 2, since the journal csum seed has changed, it will definitely fail.

Thanks for the report and the detailed analysis of the problem! I was thinking how we could cleanly fix this since I don't quite like your solution of comparing commit times - that seems rather fragile. In the end I'd probably fix the problem in e2fsprogs:

Change mke2fs so that even with lazy_journal_init=1, it would zero-out the first journal block after the superblock. That will be enough to stop confusing journal recovery code with stale journal data but it will keep reporting errors if the journal checksum mismatches.

What do people think?

Honza

> You can reproduce with this simple script:
>
>     #!/bin/sh
>     mount /dev/nvme0n1 /tmp
>     if [ "$?" != "0" ] ; then
>        echo "mount model failed"
>         exit 1
>     fi
>     file=/tmp/flag_file
>     if [ ! -f $file ]; then
>         echo "do not format............."
>         echo $file
>         #you may need to modify this time range, must be less than the time used by the ext4lazyinit thread
>         t=$((( RANDOM % 8 ) + 5 ))
>         echo $t
>         touch $file
>         dd if=/dev/zero of=/tmp/test bs=1M count=$t
>         sleep $t
>         reboot -n -f
>         exit 0
>     else
>         echo "do format............."
>         umount /tmp
>         mkfs.ext4 -F /dev/nvme0n1 -E lazy_journal_init=1
>         sync
>         reboot -n -f
>         exit 0
>     fi
>
> We have an idea,  When jbd2 encountered a descriptor block checksum
> error during recovery, We check the commit time and compare it with the commit time of the previous transaction, if it's smaller than previous, We think it's not belong to same journal, then end this recovery, and return with success, otherwise return with failed just like before.
>
> This is the patch, diff from linux-5.7.9:
>
>
> --- fs/jbd2/recovery.c.orig      2020-07-16 14:13:36.000000000 +0800
> +++ fs/jbd2/recovery.c   2020-07-21 10:00:46.828449500 +0800
> @@ -412,7 +412,27 @@ static int jbd2_block_tag_csum_verify(jo
>      else
>             return tag->t_checksum == cpu_to_be16(csum32); }
> +/*
> + * We check the commit time and compare it with the commit time of
> +the previous transaction,
> + * if it's smaller than previous, We think it's not belong to same journal.
> + */
> +static int is_same_journal(journal_t *journal,struct buffer_head *bh,
> +unsigned long blocknr, __u64 last_commit_sec) {
> +     int commit_block_nr = blocknr + count_tags(journal, bh) + 1;
> +     struct buffer_head *  nbh;
> +
> +     int err = jread(&nbh, journal, commit_block_nr);
> +     if (err)
> +            return 1;
> +     struct commit_header *cbh = (struct commit_header *)nbh->b_data;
> +     __u64 commit_sec = be64_to_cpu(cbh->h_commit_sec);
> +
> +     if(commit_sec < last_commit_sec)
> +            return 0;
> +     else
> +            return 1;
> +}
> static int do_one_pass(journal_t *journal,
>                    struct recovery_info *info, enum passtype pass) {
> @@ -426,6 +446,7 @@ static int do_one_pass(journal_t *journa
>      int                 blocktype;
>      int                 tag_bytes = journal_tag_bytes(journal);
>      __u32                   crc32_sum = ~0; /* Transactional Checksums */
> +     __u64            last_commit_sec = 0;
>      int                 descr_csum_size = 0;
>      int                 block_error = 0;
> @@ -520,12 +541,22 @@ static int do_one_pass(journal_t *journa
>                    if (descr_csum_size > 0 &&
>                        !jbd2_descriptor_block_csum_verify(journal,
>                                                       bh->b_data)) {
> -                           printk(KERN_ERR "JBD2: Invalid checksum "
> +                          if(is_same_journal(journal,bh,next_log_block-1,last_commit_sec)) {
> +                                 printk(KERN_ERR "JBD2: Invalid checksum "
>                                  "recovering block %lu in log\n",
>                                  next_log_block);
> -                           err = -EFSBADCRC;
> -                           brelse(bh);
> -                           goto failed;
> +                                 err = -EFSBADCRC;
> +                                 brelse(bh);
> +                                 goto failed;
> +                          } else {
> +                                 /*if it's not belong to same journal, just end this recovery, return with success*/
> +                                 printk(KERN_ERR "JBD2: Invalid checksum "
> +                                 "found in block %lu in log, but not same journal %d\n",
> +                                 next_log_block,next_commit_ID);
> +                                 err = 0;
> +                                 brelse(bh);
> +                                 goto done;
> +                          }
>                    }
>                     /* If it is a valid descriptor block, replay it @@
> -688,11 +719,15 @@ static int do_one_pass(journal_t *journa
>                     * are present verify them in PASS_SCAN; else not
>                     * much to do other than move on to the next sequence
>                     * number. */
> +                   if(pass == PASS_SCAN) {
> +                          struct commit_header *cbh =
> +                                 (struct commit_header *)bh->b_data;
> +                          last_commit_sec = be64_to_cpu(cbh->h_commit_sec);
> +                   }
>                    if (pass == PASS_SCAN &&
>                        jbd2_has_feature_checksum(journal)) {
>                           int chksum_err, chksum_seen;
> -                           struct commit_header *cbh =
> -                                  (struct commit_header *)bh->b_data;
> +
>                           unsigned found_chksum =
>                                  be32_to_cpu(cbh->h_chksum[0]);
>
>
> ________________________________
>
> CONFIDENTIALITY NOTICE: This electronic message is intended to be
> viewed only by the individual or entity to whom it is addressed. It
> may contain information that is privileged, confidential and exempt
> from disclosure under applicable law. Any dissemination, distribution
> or copying of this communication is strictly prohibited without our
> prior permission. If the reader of this message is not the intended
> recipient, or the employee or agent responsible for delivering the
> message to the intended recipient, or if you have received this
> communication in error, please notify us immediately by return e-mail
> and delete the original message and any copies of it from your
> computer system. For further information about Hikvision company.
> please see our website at www.hikvision.com<http://www.hikvision.com>
--
Jan Kara <jack@...e.com>
SUSE Labs, CR

________________________________

CONFIDENTIALITY NOTICE: This electronic message is intended to be viewed only by the individual or entity to whom it is addressed. It may contain information that is privileged, confidential and exempt from disclosure under applicable law. Any dissemination, distribution or copying of this communication is strictly prohibited without our prior permission. If the reader of this message is not the intended recipient, or the employee or agent responsible for delivering the message to the intended recipient, or if you have received this communication in error, please notify us immediately by return e-mail and delete the original message and any copies of it from your computer system. For further information about Hikvision company. please see our website at www.hikvision.com<http://www.hikvision.com>