linux-ext4 - Re: ext4_clear_journal_err: Filesystem error recorded from previous mount: IO failure


Message-ID: <4CC57E0A.9070502@redhat.com>
Date:	Mon, 25 Oct 2010 08:54:34 -0400
From:	Ric Wheeler <rwheeler@...hat.com>
To:	Ric Wheeler <rwheeler@...hat.com>
CC:	Andreas Dilger <andreas.dilger@...cle.com>,
	Bernd Schubert <bschubert@....com>, "Ted Ts'o" <tytso@....edu>,
	Amir Goldstein <amir73il@...il.com>,
	Bernd Schubert <bs_lists@...ef.fastmail.fm>,
	Ext4 Developers List <linux-ext4@...r.kernel.org>
Subject: Re: ext4_clear_journal_err: Filesystem error recorded from previous
 mount: IO failure

  On 10/25/2010 07:45 AM, Ric Wheeler wrote:
>  On 10/25/2010 06:14 AM, Andreas Dilger wrote:
>> On 2010-10-25, at 00:43, Ric Wheeler wrote:
>>> On 10/24/2010 12:16 PM, Bernd Schubert wrote:
>>>> ... sometimes the error state is only set *after* mounting the filesystem,
>>>> so difficult to script it.  And as I also wrote, running e2fsck from that
>>>> script and to do a complete fs check is not appropriate, as that might
>>>> simply time out.  Again not Lustre specific. So after some discussion,
>>>> the proposed solution is to add a "journal recovery only" option to e2fsck
>>>> and to do that before the mount. I will add that to the 'lustre_server'
>>>> agent (which is part of Lustre now), but leave it to someone else to do that
>>>> for the 'Filesystem' agent script (I'm not using that script myself and
>>>> IMHO it is already too complex, as it tries to support all filesystems -
>>>>   shell code is not ideal anymore then).
>>> Why not simply have your script attempt to mount the file system? If it 
>>> succeeds, it will replay the journal. If it fails, you will need to fall 
>>> back to the long fsck which is unavoidable.
>> I don't really agree with this.  The whole reason for having the error flag 
>> in the superblock and ALWAYS running e2fsck at mount time to replay the 
>> journal is that e2fsck should be done before mounting the filesystem.
>>
>> I really dislike the reiserfs/XFS model where a filesystem is mounted and 
>> fsck is not run in advance, and then if there is a serious error in the 
>> filesystem this needs to be detected by the kernel, the filesystem unmounted, 
>> e2fsck started, and the filesystem remounted...  That's just backward.
>>
>
> Even if you disagree with the model, that would seem to solve the issue for 
> Bernd without having to make a change in the utilities.
>
> Thanks!
>
> Ric
>
>>> We spend a lot of time and testing to make sure that ext* can be shot at any 
>>> point and come back after a storage outage and still mount.
>> Sure, it can still mount, but the only thing it might be able to do is detect 
>> the error and remount the filesystem read-only or panic...  That's why e2fsck 
>> should ALWAYS be run BEFORE the filesystem is mounted.
>>
>> Bernd's issue (the part that I agree with) is that the error may only be 
>> recorded in the journal, not in the ext3 superblock, and there is no easy way 
>> to detect this from userspace.  Allowing e2fsck to only replay the journal is 
>> useful for this problem.  Another similar issue is that if tune2fs is run on an 
>> unmounted filesystem that hasn't had a journal replay, then it may modify the 
>> superblock, but journal replay will clobber this.  There are other similar 
>> issues.
>>
>> Cheers, Andreas
>> -- 
>> Andreas Dilger
>> Lustre Technical Lead
>> Oracle Corporation Canada Inc.
>>
>

One more thought: the xfs model of mounting before fsck is effectively just 
doing the journal replay. If the file system needs repair, the mount fails. If 
not, you are done.

For HA fail over, what Bernd is proposing is effectively equivalent:

(1) Replay the journal without doing a full fsck, which is the same as the mount 
for XFS

(2) See if the journal replay failed (i.e., the error flag is set), which is the 
same as seeing if the mount succeeded

(3) If there was an error, you need to do a full, time-consuming fsck for either 
file system

(4) If no error in (2), you need to mount the file system for ext4 (xfs is 
already done at this stage)

Aside from putting the journal replay into a magic fsck flag, I really do not 
see that you are saving any complexity.  In fact, for this case, you add step (4).
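
To make the comparison concrete, here is a minimal sketch of the two fail over 
sequences. It is only an illustration: the function names are hypothetical, the 
actual commands are passed in as callables rather than spelled out (the "journal 
recovery only" e2fsck option is only a proposal at this point), and the mount 
attempt after a successful full fsck is my assumption about what a script would 
do next.

```python
from typing import Callable, List

# Each step is a callable that returns True on success. The real commands
# (journal-only replay, full fsck/repair, mount) are deliberately not
# spelled out here; they are supplied by the caller.
Step = Callable[[], bool]


def failover_ext4(replay_journal: Step, full_fsck: Step, mount: Step) -> List[str]:
    """ext4 with a hypothetical journal-only replay before the mount."""
    log = ["replay"]                  # (1) replay the journal only
    if not replay_journal():          # (2) replay reported an error
        log.append("fsck")            # (3) full, time-consuming fsck
        if not full_fsck():
            return log + ["give-up"]
    log.append("mount")               # (4) extra step: still have to mount
    if not mount():
        log.append("give-up")
    return log


def failover_xfs(mount: Step, full_repair: Step) -> List[str]:
    """Mount first: the mount itself replays the journal."""
    log = ["mount"]                   # (1) + (2) in a single step
    if mount():
        return log                    # already mounted, nothing left to do
    log.append("repair")              # (3) full, time-consuming repair
    if not full_repair():
        return log + ["give-up"]
    log.append("mount")               # assumed: retry the mount after repair
    if not mount():
        log.append("give-up")
    return log


ok = lambda: True
print(failover_ext4(ok, ok, ok))      # ['replay', 'mount']
print(failover_xfs(ok, ok))           # ['mount']
```

In the clean case the ext4 sequence takes two steps and the mount-first 
sequence takes one, which is the extra step (4) mentioned above.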

Regards,

Ric
