linux-ext4 - Re: [PATCH] e2fsck: Avoid changes on recovery flags when jbd2_journal


Message-ID: <CAD+ocbxAyyFqoD6AYQVjQyqFzZde3+QOnUhC-VikAq4A3_t8JA@mail.gmail.com>
Date:   Fri, 11 Dec 2020 14:07:10 -0800
From:   harshad shirwadkar <harshadshirwadkar@...il.com>
To:     Haotian Li <lihaotian9@...wei.com>
Cc:     Ext4 Developers List <linux-ext4@...r.kernel.org>,
        "Theodore Y. Ts'o" <tytso@....edu>,
        "liuzhiqiang (I)" <liuzhiqiang26@...wei.com>,
        linfeilong <linfeilong@...wei.com>, tytso@...m.mit.edu
Subject: Re: [PATCH] e2fsck: Avoid changes on recovery flags when
 jbd2_journal_recover() failed

Hi Haotian,

Thanks for your patch. I noticed that the following test fails:

$ make -j 64
...
365 tests succeeded     1 tests failed
Tests failed: j_corrupt_revoke_rcount
make: *** [Makefile:397: test_post] Error 1

This test fails because it expects e2fsck to continue even if the
journal superblock is corrupt, and with your patch e2fsck exits
immediately. This brings up a higher-level question - if we abort
when journal recovery fails during fsck, how would that problem ever
get fixed if fsck never gets to run? In this particular example, the
journal superblock is corrupt, and that is an unrecoverable error. I
wonder if we should instead check for certain specific transient
errors such as -ENOMEM and exit only then? I suspect that even in
those cases we should probably ask the user whether they would like
to continue or not. What do you think?
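
To make the idea concrete, here is a rough sketch of such a check.
It is only an illustration under stated assumptions: the enum and both
helper functions are hypothetical names that do not exist in
e2fsprogs, and treating ENOMEM as the only transient error is an
assumption taken from the example above. The error value is a positive
errno, matching retval = -jbd2_journal_recover(journal) in the patch.

```c
#include <errno.h>

/* Hypothetical policy outcomes; none of these names exist in e2fsprogs. */
enum recovery_action {
	RECOVERY_OK,		/* replay succeeded, nothing to decide */
	RECOVERY_ASK_USER,	/* transient failure: keep the recovery flags
				 * and ask whether to continue */
	RECOVERY_CONTINUE	/* persistent failure: let fsck go on and
				 * try to repair the file system */
};

/*
 * Assumption: only ENOMEM is considered transient here. Which other
 * error codes belong in this set is an open question.
 */
static int is_transient_error(int err)
{
	return err == ENOMEM;
}

/* Map a positive errno from journal recovery to a policy outcome. */
static enum recovery_action classify_recovery_error(int err)
{
	if (err == 0)
		return RECOVERY_OK;
	if (is_transient_error(err))
		return RECOVERY_ASK_USER;
	return RECOVERY_CONTINUE;
}
```

With a split like this, a corrupt journal superblock would still fall
through to the normal fsck path, so j_corrupt_revoke_rcount would
presumably keep passing.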

Thanks,
Harshad


On Fri, Dec 11, 2020 at 4:19 AM Haotian Li <lihaotian9@...wei.com> wrote:
>
> jbd2_journal_recover() may fail when an error such as ENOMEM
> occurs. However, jsb->s_start is still cleared by
> e2fsck_journal_release(). This may break consistency between
> metadata and data on disk. Sometimes the failure in
> jbd2_journal_recover() is temporary, but a retried e2fsck will
> skip journal recovery even after the temporary problem is fixed.
>
> To fix this case, we use "fatal_error" instead of "goto errout"
> when journal recovery fails. We think that if journal recovery
> fails, we need to send an error message to the user and preserve
> the recovery flags so that the journal is recovered when e2fsck
> is run again.
>
> Reported-by: Liangyun <liangyun2@...wei.com>
> Signed-off-by: Haotian Li <lihaotian9@...wei.com>
> Signed-off-by: Zhiqiang Liu <liuzhiqiang26@...wei.com>
> ---
>  e2fsck/journal.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/e2fsck/journal.c b/e2fsck/journal.c
> index 7d9f1b40..546beafd 100644
> --- a/e2fsck/journal.c
> +++ b/e2fsck/journal.c
> @@ -952,8 +952,13 @@ static errcode_t recover_ext3_journal(e2fsck_t ctx)
>                 goto errout;
>
>         retval = -jbd2_journal_recover(journal);
> -       if (retval)
> -               goto errout;
> +       if (retval && retval != EFSBADCRC && retval != EFSCORRUPTED) {
> +               ctx->fs->flags &= ~EXT2_FLAG_VALID;
> +               com_err(ctx->program_name, 0,
> +                                       _("Journal recovery failed "
> +                                         "on %s\n"), ctx->device_name);
> +               fatal_error(ctx, 0);
> +       }
>
>         if (journal->j_failed_commit) {
>                 pctx.ino = journal->j_failed_commit;
> --
> 2.19.1
>
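
The failure mode described in the quoted commit message can be modeled
with a toy example. This is a simplified sketch, not e2fsck code: the
struct and all toy_* functions are hypothetical, and the only detail
taken from the real journal format is that a journal superblock with
s_start == 0 is treated as needing no recovery.

```c
#include <errno.h>

/* Toy stand-in for the journal superblock; only s_start is modeled. */
struct toy_journal {
	unsigned int s_start;	/* non-zero: log has transactions to replay */
};

/* A later run replays the journal only if s_start is non-zero. */
static int toy_needs_recovery(const struct toy_journal *j)
{
	return j->s_start != 0;
}

/*
 * Behavior described in the commit message: s_start is cleared on
 * release even when recovery failed with recover_err.
 */
static int toy_run_clearing(struct toy_journal *j, int recover_err)
{
	j->s_start = 0;		/* recovery state lost regardless of result */
	return recover_err;
}

/*
 * Behavior the patch aims for: on failure, leave s_start alone so
 * that a later run still sees a journal that needs recovery.
 */
static int toy_run_preserving(struct toy_journal *j, int recover_err)
{
	if (recover_err)
		return recover_err;	/* keep s_start for the retry */
	j->s_start = 0;
	return 0;
}
```

In this model, a run that fails with ENOMEM through toy_run_clearing()
leaves toy_needs_recovery() false, so a retry skips the replay, while
toy_run_preserving() leaves it true.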