linux-ext4 - Re: ext4_clear_journal_err: Filesystem error recorded from previous mount: IO failure

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20101023222605.GC24650@thunk.org>
Date:	Sat, 23 Oct 2010 18:26:05 -0400
From:	Ted Ts'o <tytso@....edu>
To:	Bernd Schubert <bs_lists@...ef.fastmail.fm>
Cc:	Amir Goldstein <amir73il@...il.com>, linux-ext4@...r.kernel.org,
	Bernd Schubert <bschubert@....com>
Subject: Re: ext4_clear_journal_err: Filesystem error recorded from
 previous mount: IO failure

On Sat, Oct 23, 2010 at 07:46:56PM +0200, Bernd Schubert wrote:
> I'm really looking for something to abort the mount if an error comes up. 
> However, I just have an idea to do that without an additional mount flag:
> 
> Let e2fsck play back the journal only. That way e2fsck could set the
> error flag, if it detects a problem in the journal and our pacemaker
> script would refuse to mount. That option also would be quite useful
> for our other scripts, as we usually first run a read-only fsck,
> check the log files (presently by size, as e2fsck always returns an
> error code even for journal recoveries...)  and only if we don't see
> serious corruption we run e2fsck. Otherwise we sometimes create
> device or e2image backups.  Would a patch introducing "-J recover
> journal only" accepted?

So I'm confused, and partially it's because I don't know the
capabilities of pacemaker.

If you have a pacemaker script, why aren't you willing to just run
e2fsck on the journal and be done with it?  Earlier you talked about
"man months of effort" to rewrite pacemaker.  Huh?  If the file system
is fine, it will recover the journal, and then see that the file
system is clean, and then exit.

As far as the exit codes, it sounds like you haven't read the man
page.  The exit codes are documented in both the fsck and e2fsck man
page, and are standardized across all file systems:

            0    - No errors
            1    - File system errors corrected
            2    - System should be rebooted
            4    - File system errors left uncorrected
            8    - Operational error
            16   - Usage or syntax error
            32   - Fsck canceled by user request
            128  - Shared library error

(These status codes are boolean OR'ed together.)

An exit code has the '1' bit set, that means that the file system had
some errors, but they have since been fixed.  And exit code where the
'2' bit is will only occur in the case of a mounted read-only file
system, and instructs the init script to reboot before continuing,
because while the file system may have had errors fixed, there may be
invalid information cached in memory due to the root file system being
mounted, so the only safe way to make sure that invalid information
won't be written back to disk is to reboot.  If you are not checking
the root filesystem, you will never see the '2' bit being set.

So if you are looking at the size of the fsck log files, I'm guessing
it's because no one has bothered to read and understand how the exit
codes for fsck works.

And I really don't understand why you need or want to do a read-only
fsck first....

						- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html