linux-ext4 - Re: ext4_clear_journal_err: Filesystem error recorded from previous mount: IO failure




Message-ID: <4CC43AC9.8000409@redhat.com>
Date:	Sun, 24 Oct 2010 09:55:21 -0400
From:	Ric Wheeler <rwheeler@...hat.com>
To:	"Ted Ts'o" <tytso@....edu>
CC:	Amir Goldstein <amir73il@...il.com>,
	Bernd Schubert <bs_lists@...ef.fastmail.fm>,
	linux-ext4@...r.kernel.org, Bernd Schubert <bschubert@....com>
Subject: Re: ext4_clear_journal_err: Filesystem error recorded from previous
 mount: IO failure

On 10/23/2010 06:17 PM, Ted Ts'o wrote:
> On Sat, Oct 23, 2010 at 06:00:05PM +0200, Amir Goldstein wrote:
>> IMHO, and I've said it before, the mount flag which Bernd requests
>> already exists, namely 'errors=', both as mount option and as
>> persistent default, but it is not enforced correctly on mount time.
>> If an administrator decides that the correct behavior when error is
>> detected is abort or remount-ro, what's the sense in letting the
>> filesystem mount read-write without fixing the problem?
> Again, consider the case of the root filesystem containing an error.
> When the error is first discovered during the course of the system's
> operation, and it's set to errors=panic, you want to immediately
> reboot the system.  But then, when the root file system is mounted, it
> would be bad to have the system immediately panic again.  Instead,
> what you want to have happen is to allow e2fsck to run, correct the
> file system errors, and then the system can go back to normal operation.
>
> So the current behavior was deliberately designed to be the way that
> it is, and the difference is between "what do you do when you come
> across a file system error", which is what the errors= mount option is
> all about, and "this file system has some kind of error associated
> with it".  Just because it has an error associated with it does not
> mean that immediately rebooting is the right thing to do, even if the
> file system is set to "errors=panic".  In fact, in the case of a root
> file system, it is manifestly the wrong thing to do.  If we did what
> you suggested, then the system would be trapped in a reboot loop
> forever.
>
> 							- Ted
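
The distinction Ted draws, between the errors= policy applied when an error is detected at run time and a merely recorded error found at mount time, can be modeled in a few lines. This is a minimal sketch under a toy in-memory model: FsState, on_runtime_error, on_mount, run_fsck and boot are hypothetical names, not ext4 kernel code, and the model collapses the real boot sequence (read-only root mount, e2fsck, remount) into a single mount-then-fsck step.

```python
from dataclasses import dataclass
from enum import Enum


class ErrorsPolicy(Enum):
    # Values mirror the errors= mount option settings discussed in the thread.
    CONTINUE = "continue"
    REMOUNT_RO = "remount-ro"
    PANIC = "panic"


@dataclass
class FsState:
    # Hypothetical model: the "error recorded" flag persists until fsck clears it.
    error_recorded: bool = False
    policy: ErrorsPolicy = ErrorsPolicy.PANIC


def on_runtime_error(fs: FsState) -> str:
    """Error detected during normal operation: the errors= policy applies."""
    fs.error_recorded = True
    if fs.policy is ErrorsPolicy.PANIC:
        return "panic"
    if fs.policy is ErrorsPolicy.REMOUNT_RO:
        return "remount-ro"
    return "continue"


def on_mount(fs: FsState, enforce_policy_at_mount: bool) -> str:
    """Mount a file system that may carry a previously recorded error.

    Only the errors=panic case of the proposed enforcement is modeled here.
    """
    if fs.error_recorded and enforce_policy_at_mount and fs.policy is ErrorsPolicy.PANIC:
        # The proposal taken to its conclusion: panic again at mount time.
        return "panic"
    # The behavior Ted describes: mount anyway so e2fsck can repair the error.
    return "mounted"


def run_fsck(fs: FsState) -> None:
    """Hypothetical stand-in for e2fsck: repairs and clears the recorded error."""
    fs.error_recorded = False


def boot(fs: FsState, enforce_policy_at_mount: bool, max_boots: int = 5) -> int:
    """Return the boot attempt on which the root fs is repaired, or -1 if it never is."""
    for attempt in range(1, max_boots + 1):
        if on_mount(fs, enforce_policy_at_mount) == "panic":
            continue  # the system reboots before fsck ever gets a chance to run
        run_fsck(fs)
        return attempt
    return -1  # stuck in a reboot loop
```

With enforce_policy_at_mount=True, a root file system that has a recorded error and errors=panic never reaches fsck, so boot() returns -1, which is the reboot loop Ted describes; with the current behavior it is mounted and repaired on the first boot.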

I am still fuzzy on the use case here.

In any shared ext* file system (pacemaker or other), you have some basic rules:

* you cannot have the file system mounted on more than one node
* failover must fence out any other nodes before starting recovery
* failover (once the node is assured that it is uniquely mounting the file
  system) must do any recovery required to clean up the state
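
The three rules above impose an ordering on failover: fence first, confirm exclusive access, and only then recover and mount. Below is a minimal sketch of that ordering under a toy in-memory model; Cluster, fence_others and fail_over are hypothetical names and do not reflect Pacemaker's actual API or fencing agents.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    # Hypothetical model of an active/passive cluster sharing one ext* volume.
    nodes: set[str]
    mounted_on: set[str] = field(default_factory=set)
    fenced: set[str] = field(default_factory=set)
    needs_recovery: bool = True


def fence_others(cluster: Cluster, survivor: str) -> None:
    """Rule 2: fence every other node before recovery starts."""
    for node in cluster.nodes - {survivor}:
        cluster.fenced.add(node)
        cluster.mounted_on.discard(node)  # a fenced node can no longer write


def fail_over(cluster: Cluster, survivor: str) -> None:
    fence_others(cluster, survivor)
    # Rule 1: refuse to proceed if anyone else still appears to have it mounted.
    if cluster.mounted_on - {survivor}:
        raise RuntimeError("refusing to mount: another node still has the file system")
    # Rule 3: recovery (journal replay, fsck) runs only once exclusivity is assured.
    cluster.needs_recovery = False
    cluster.mounted_on = {survivor}
```

For example, with nodes {"a", "b"} and the file system mounted on "a", fail_over(cluster, "b") fences "a" before marking recovery done and mounting on "b", so the volume is never mounted on two nodes at once.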

Using ext* (or xfs) in an active/passive cluster with fail over rules that 
follow the above is really common today.

I don't see what the use case here is - are we trying to pretend that pacemaker 
+ ext* allows us to have a single, shared file system in a cluster mounted on 
multiple nodes?

Why not use ocfs2 or gfs2 for that?

Thanks!

Ric
