Message-ID: <ecd3d264-c602-e781-1d0b-86d4c4541352@huawei.com>
Date: Fri, 5 Mar 2021 17:48:28 +0800
From: Haotian Li <lihaotian9@...wei.com>
To: harshad shirwadkar <harshadshirwadkar@...il.com>
CC: "Theodore Y. Ts'o" <tytso@....edu>,
Ext4 Developers List <linux-ext4@...r.kernel.org>,
linfeilong <linfeilong@...wei.com>, <liangyun2@...wei.com>,
Zhiqiang Liu <liuzhiqiang26@...wei.com>
Subject: Re: [PATCH] e2fsck: Avoid changes on recovery flags when
jbd2_journal_recover() failed
I'm very sorry for the delay. Thanks for your suggestion. Just as you said,
we will add an e2fsck.conf option "recovery_error_behavior" to help users
choose different behavior in this situation. The new v2 patch will be sent.
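
For reference, such an option might look roughly like the snippet below in
e2fsck.conf. The option name comes from this thread; the stanza and the
exact set of values are only assumptions until the v2 patch is posted:

	[options]
		# What e2fsck should do when journal recovery hits an I/O
		# or memory error:
		#   continue - keep the current behavior (default)
		#   exit     - stop and report, so the check can be retried later
		#   retry    - retry the journal recovery
		recovery_error_behavior = exit

e2fsck.conf uses the same profile syntax as mke2fs.conf, so a relation like
this would presumably be read from the [options] stanza at startup.
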
On 2021/1/6 7:06, harshad shirwadkar wrote:
> Sorry for the delay. Thanks for providing more information, Haotian.
> So this is happening because of IO errors caused by a flaky
> network connection. I can imagine that this is perhaps a situation
> which is recoverable but I guess when running on physical hardware,
> it's less likely for such IO errors to be recoverable. I wonder if
> this means we need an e2fsck.conf option - something like
> "recovery_error_behavior" with default value of "continue". For
> use cases such as this, we can set it to "exit" or perhaps "retry"?
>
> On Thu, Dec 24, 2020 at 5:49 PM Zhiqiang Liu <liuzhiqiang26@...wei.com> wrote:
>>
>> friendly ping...
>>
>> On 2020/12/15 15:43, Haotian Li wrote:
>>> Thanks for your review. I agree with you that it's more important
>>> to understand the errors found by e2fsck. We'll describe the case
>>> below.
>>>
>>> The problem we found is actually in a remote storage case, meaning that
>>> e2fsck's reads or writes may fail because of network packet loss.
>>> The first time, some packet-loss errors happened during e2fsck's journal
>>> recovery (using fsck -a), and the recovery failed. The second time, we
>>> fixed the network problem and ran e2fsck again, but there were still errors
>>> when we tried to mount. Then we set the jsb->s_start journal flag and reran
>>> e2fsck, and the problem was fixed. So we suspect something is wrong in
>>> e2fsck's journal recovery, probably the bug we've described in the patch.
>>>
>>> Certainly, exiting directly is not a good way to fix this problem.
>>> Just as Harshad said, we need to tell the user what happened and let the
>>> user decide whether to continue e2fsck or not. If we want to safely use
>>> e2fsck without human intervention (using fsck -a), I wonder if we need to
>>> provide a safe mechanism that completes the fast check but avoids changes
>>> to the journal or anything else that may be fixed in the future (such
>>> as the jsb->s_start flag)?
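
To make "avoid changes on recovery flags" concrete, here is a minimal,
self-contained sketch of the intended ordering. The names are illustrative
stand-ins, not e2fsprogs internals: the point is that the journal's "needs
recovery" state is cleared only when recovery actually succeeded, so a
failed run leaves jsb->s_start in place and a later e2fsck can retry.

/* Illustrative sketch only; stand-in names, not e2fsprogs code. */
#include <stdio.h>

struct fake_journal_sb {
	unsigned int s_start;	/* non-zero: journal still needs recovery */
};

/* stand-in for jbd2_journal_recover(): 0 on success, negative on error */
static int fake_journal_recover(void)
{
	return -5;		/* pretend the replay hit an I/O error */
}

static void release_journal(struct fake_journal_sb *jsb, int reset)
{
	if (reset)
		jsb->s_start = 0;	/* mark the journal clean */
}

int main(void)
{
	struct fake_journal_sb jsb = { .s_start = 1 };
	int err = fake_journal_recover();

	/* Reset the recovery flags only on success; on failure leave
	 * s_start alone so the next e2fsck run retries the recovery. */
	release_journal(&jsb, err == 0);

	printf("recover err=%d, s_start=%u\n", err, jsb.s_start);
	return 0;
}

In the real code this decision presumably sits in e2fsck's journal release
path; the sketch only shows where the success check has to happen.
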
>>>
>>> Thanks
>>> Haotian
>>>
>>> On 2020/12/15 4:27, Theodore Y. Ts'o wrote:
>>>> On Mon, Dec 14, 2020 at 10:44:29AM -0800, harshad shirwadkar wrote:
>>>>> Hi Haotian,
>>>>>
>>>>> Yeah perhaps these are the only recoverable errors. I also think that
>>>>> we can't say for sure that these errors are always recoverable. That's
>>>>> because in some setups, these errors may still be unrecoverable (for
>>>>> example, if the machine is running under low memory). I still feel
>>>>> that we should ask the user about whether they want to continue or
>>>>> not. The reason is that, firstly, if we don't allow running e2fsck in
>>>>> these cases, I wonder what the user would do with their file system -
>>>>> they can't mount / can't run fsck, right? Secondly, not doing that
>>>>> would be a regression. I wonder if some setups would have chosen to
>>>>> ignore journal recovery when it hits errors, and with this fix they
>>>>> may start seeing that their file systems aren't getting repaired.
>>>>
>>>> It may very well be that there are corrupted file system structures
>>>> that could lead to ENOMEM. If so, I'd consider that something we should
>>>> be explicitly checking for in e2fsck, and it's actually relatively
>>>> unlikely in the jbd2 recovery code, since that's fairly straightforward
>>>> --- except I'd be concerned about potential cases in your Fast
>>>> Commit code, since there's quite a bit more complexity when parsing
>>>> the fast commit journal.
>>>>
>>>> This isn't a new concern; we've already talked about the fact that
>>>> fast commit needs to have a lot more sanity checks to look for
>>>> maliciously --- or syzbot generated, which may be the same thing :-)
>>>> --- inconsistent fields causing the e2fsck replay code to behave in
>>>> unexpected ways, which might include trying to allocate insane amounts
>>>> of memory, array buffer overruns, etc.
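
As a rough illustration of the kind of sanity check this implies (not
ext4's actual fast-commit parser; the struct, block size, and function
below are made up for the example): treat every on-disk length field as
untrusted and bound it against the containing block before allocating or
copying anything.

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define FC_BLOCK_SIZE 4096	/* illustrative fast-commit block size */

struct fc_tl {			/* illustrative tag/length header */
	uint16_t tag;
	uint16_t len;
};

/* Copy a tag's payload out of a block, or return NULL if the header
 * claims more bytes than actually remain; never trust tl.len. */
void *fc_copy_payload(const uint8_t *blk, size_t off)
{
	struct fc_tl tl;
	void *val;

	if (off > FC_BLOCK_SIZE - sizeof(tl))
		return NULL;	/* header itself runs past the block */
	memcpy(&tl, blk + off, sizeof(tl));

	if (tl.len > FC_BLOCK_SIZE - off - sizeof(tl))
		return NULL;	/* claimed length is impossible */

	val = malloc(tl.len ? tl.len : 1);	/* bounded, never "insane" */
	if (val)
		memcpy(val, blk + off + sizeof(tl), tl.len);
	return val;
}

The same bound also rules out the array overruns mentioned above, since
the copy can never read past the end of the block.
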
>>>>
>>>> But assuming that ENOMEM is always due to operational concerns, as
>>>> opposed to file system corruption, may not always be a safe
>>>> assumption.
>>>>
>>>> Something else to consider is from the perspective of a naive system
>>>> administrator, if there is a bad media sector in the journal, simply
>>>> always aborting the e2fsck run may not allow them an easy way to
>>>> recover. Simply ignoring the journal and allowing the next write to
>>>> occur, at which point the HDD or SSD will redirect the write to a bad
>>>> sector spare pool, will allow for an automatic recovery. Simply
>>>> always causing e2fsck to fail would actually result in a worse
>>>> outcome in this particular case.
>>>>
>>>> (This is especially true for a mobile device, where the owner is not
>>>> likely to have access to the serial console to manually run e2fsck,
>>>> and where if they can't automatically recover, they will have to take
>>>> their phone to the local cell phone carrier store for repairs ---
>>>> which is *not* something that a cellular provider will enjoy, and they
>>>> will tend to choose other cell phone models to feature as
>>>> supported/featured devices. So an increased number of failures which
>>>> can't be automatically recovered could cause the carrier to choose to
>>>> feature, say, a Xiaomi phone over a ZTE phone.)
>>>>
>>>>> I'm wondering if you saw any situation in your setup where exiting
>>>>> e2fsck helped? If possible, could you share what kind of errors were
>>>>> seen in journal recovery and what was the expected behavior? Maybe
>>>>> that would help us decide on the right behavior.
>>>>
>>>> Seconded; I think we should try to understand why it is that e2fsck is
>>>> failing with these sorts of errors. It may be that there are better
>>>> ways of solving the high-level problem.
>>>>
>>>> For example, the new libext2fs bitmap backends were something that I
>>>> added because running a large number of e2fsck processes in
>>>> parallel on a server machine with dozens of HDD spindles was causing
>>>> e2fsck processes to run slowly due to memory contention. We fixed it
>>>> by making e2fsck more memory efficient, by improving the bitmap
>>>> implementations --- but if that hadn't been sufficient, I had also
>>>> considered adding support to make /sbin/fsck "smarter" by limiting the
>>>> number of fsck.XXX processes that would get started simultaneously,
>>>> since that could actually cause the file system check to run faster by
>>>> reducing memory thrashing. (The trick would have been how to make
>>>> fsck smart enough to automatically tune the number of parallel fsck
>>>> processes to allow, since asking the system administrator to manually
>>>> tune the max number of processes would be annoying to the sysadmin,
>>>> and would mean that the feature would never get used outside of $WORK
>>>> in practice.)
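
For what it's worth, the process-limiting part of that idea is simple to
sketch (a toy, not the real /sbin/fsck: the device list comes from argv,
the fsck.ext4 path and the fixed limit are placeholders, and the auto-tuning
that would make it actually useful is exactly what is not shown):

#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_PARALLEL 4			/* placeholder; tuning this is the hard part */

static void run_one_fsck(const char *dev)
{
	execlp("/sbin/fsck.ext4", "fsck.ext4", "-p", dev, (char *)NULL);
	_exit(127);			/* exec failed */
}

int main(int argc, char **argv)
{
	int running = 0;

	for (int i = 1; i < argc; i++) {	/* each argument is a device */
		if (running >= MAX_PARALLEL) {
			wait(NULL);		/* block until a child finishes */
			running--;
		}
		pid_t pid = fork();
		if (pid == 0)
			run_one_fsck(argv[i]);
		else if (pid > 0)
			running++;
		else
			perror("fork");
	}
	while (running-- > 0)			/* reap the remaining children */
		wait(NULL);
	return 0;
}
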
>>>>
>>>> So is the actual underlying problem that e2fsck is running out of
>>>> memory? If so, is it because there simply isn't enough physical
>>>> memory available? Is it being run in a cgroup container which is too
>>>> small? Or is it because too many file systems are being checked in
>>>> parallel at the same time?
>>>>
>>>> Or is it I/O errors that you are concerned with? And how do you know
>>>> that they are not permanent errors; is this caused by something like
>>>> fibre channel connections being flaky?
>>>>
>>>> Or is this a hypothetical worry, as opposed to something which is
>>>> causing operational problems right now?
>>>>
>>>> Cheers,
>>>>
>>>> - Ted