linux-ext4 - Re: [PATCH] ext4, jbd2: ensure panic when there is no need to record errno in the jbd2 sb

Message-ID: <a63cb7ea-8b39-df86-583b-e5af03a157fe@huawei.com>
Date:   Sat, 30 Nov 2019 22:50:08 +0800
From:   "zhangyi (F)" <yi.zhang@...wei.com>
To:     Jan Kara <jack@...e.cz>, <tytso@....edu>
CC:     <linux-ext4@...r.kernel.org>, <jack@...e.com>,
        <adilger.kernel@...ger.ca>, <liangyun2@...wei.com>
Subject: Re: [PATCH] ext4, jbd2: ensure panic when there is no need to record
 errno in the jbd2 sb

On 2019/11/30 11:24, zhangyi (F) wrote:
> On 2019/11/29 22:46, Jan Kara wrote:
>> On Tue 26-11-19 22:45:37, zhangyi (F) wrote:
>>> The JBD2_REC_ERR flag is used to indicate that the errno has been updated
>>> when jbd2 aborted, so that __ext4_abort() and ext4_handle_error() can
>>> invoke panic if ERRORS_PANIC is specified. But there is one exception: if
>>> the jbd2 thread fails to submit the commit record, it aborts the journal
>>> by invoking __jbd2_journal_abort_hard() without setting this flag, so we
>>> can no longer panic. Fix this by setting the flag even if there is no need
>>> to record the errno in the jbd2 super block.
>>>
>>> Fixes: 4327ba52afd03 ("ext4, jbd2: ensure entering into panic after recording an error in superblock")
>>> Signed-off-by: zhangyi (F) <yi.zhang@...wei.com>
>>> Cc: <stable@...r.kernel.org>
>>
>> Thanks for the patch. This indeed looks like a bug. I was trying hard to
>> understand why are we actually using __jbd2_journal_abort_hard() in
>> fs/jbd2/commit.c in the first place. And after some digging, I think it is
>> an oversight and we should just use jbd2_journal_abort(). The calls have been
>> introduced by commit 818d276ceb83a "ext4: Add the journal checksum
>> feature". Before that commit, we were just using jbd2_journal_abort() when
>> writing commit block failed. And when we use jbd2_journal_abort() from
>> everywhere, that will also deal with the problem you've found.
>>
>> Also as a nice cleanup we could then just drop __jbd2_journal_abort_hard(),
>> __jbd2_journal_abort_soft() and have all the functionality in a single
>> function jbd2_journal_abort().
>>
> 
> Indeed, it seems that we also need to record the errno if we fail to
> submit the commit block. I will remove __jbd2_journal_abort_hard() and
> combine them in my next iteration.
> 

Hi Ted and Jan,
I am confused about commit fb7c02445c49 ("ext4: pass -ESHUTDOWN code
to jbd2 layer"), which I ran into while trying to clean up
__journal_abort_soft() and __jbd2_journal_abort_hard().

Before this commit, we did not record the errno when shutting down the
filesystem, whether or not the journal had already been aborted, so the
recorded errno did not change.
After this commit, we record 0 in "sb->s_errno" for the first
jbd2_journal_abort(-ESHUTDOWN), and we still do not update the errno
if the journal was already aborted with a non-zero errno recorded,
because of the following check.

+       if (journal->j_flags & JBD2_ABORT) {
+               write_unlock(&journal->j_state_lock);
+               if (!old_errno && old_errno != -ESHUTDOWN &&
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+                   errno == -ESHUTDOWN)
+                       jbd2_journal_update_sb_errno(journal);
+               return;
+       }

So the only effective changes in this patch are:
1) fix the locking;
2) set journal->j_errno = -ESHUTDOWN and the JBD2_REC_ERR flag when we
   invoke jbd2_journal_abort(-ESHUTDOWN). These two changes do not
   match what the commit log describes.

I guess you meant
  if (old_errno && old_errno != -ESHUTDOWN && errno == -ESHUTDOWN) ?

If so, why do we need to overwrite the last abort errno with 0? If the
filesystem was already aborted for some reason, won't that cover up
the original issue? Am I missing something?
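
To make the question concrete, here is a minimal userspace C sketch (not
kernel code) of the two conditions. The helper names update_as_written()
and update_as_guessed() are hypothetical and exist only for this
illustration, and the ESHUTDOWN fallback value is an assumption for
platforms whose <errno.h> does not define it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#ifndef ESHUTDOWN
#define ESHUTDOWN 108	/* assumed fallback; this is the Linux value */
#endif

/* Condition exactly as written in the quoted hunk. */
static bool update_as_written(int old_errno, int err)
{
	return !old_errno && old_errno != -ESHUTDOWN && err == -ESHUTDOWN;
}

/* Condition as guessed above (hypothetical, without the '!'). */
static bool update_as_guessed(int old_errno, int err)
{
	return old_errno && old_errno != -ESHUTDOWN && err == -ESHUTDOWN;
}

/* Walk the three interesting old_errno values for a shutdown abort. */
static void compare_conditions(void)
{
	/* No error recorded before: only the written form updates. */
	assert(update_as_written(0, -ESHUTDOWN));
	assert(!update_as_guessed(0, -ESHUTDOWN));

	/* A real error (-EIO) recorded before: only the guessed form updates. */
	assert(!update_as_written(-EIO, -ESHUTDOWN));
	assert(update_as_guessed(-EIO, -ESHUTDOWN));

	/* Already shut down: neither form updates. */
	assert(!update_as_written(-ESHUTDOWN, -ESHUTDOWN));
	assert(!update_as_guessed(-ESHUTDOWN, -ESHUTDOWN));
}
```

In the written form, "!old_errno" already implies
"old_errno != -ESHUTDOWN", so the middle test is redundant and the
update only happens when no error was recorded earlier. The guessed
form is the one that would fire after a real error had been recorded.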

Thanks,
Yi.