Message-ID: <4CC43AC9.8000409@redhat.com>
Date: Sun, 24 Oct 2010 09:55:21 -0400
From: Ric Wheeler <rwheeler@...hat.com>
To: "Ted Ts'o" <tytso@....edu>
CC: Amir Goldstein <amir73il@...il.com>,
Bernd Schubert <bs_lists@...ef.fastmail.fm>,
linux-ext4@...r.kernel.org, Bernd Schubert <bschubert@....com>
Subject: Re: ext4_clear_journal_err: Filesystem error recorded from previous
mount: IO failure
On 10/23/2010 06:17 PM, Ted Ts'o wrote:
> On Sat, Oct 23, 2010 at 06:00:05PM +0200, Amir Goldstein wrote:
>> IMHO, and I've said it before, the mount flag which Bernd requests
>> already exists, namely 'errors=', both as mount option and as
>> persistent default, but it is not enforced correctly on mount time.
>> If an administrator decides that the correct behavior when error is
>> detected is abort or remount-ro, what's the sense in letting the
>> filesystem mount read-write without fixing the problem?
> Again, consider the case of the root filesystem containing an error.
> When the error is first discovered during the course of the system's
> operation, and it's set to errors=panic, you want to immediately
> reboot the system. But then, when the root file system is mounted, it
> would be bad to have the system immediately panic again. Instead,
> what you want to have happen is to allow e2fsck to run, correct the
> file system errors, and then the system can go back to normal operation.
>
> So the current behavior was deliberately designed to be the way that
> it is, and the difference is between "what do you do when you come
> across a file system error", which is what the errors= mount option is
> all about, and "this file system has some kind of error associated
> with it". Just because it has an error associated with it does not
> mean that immediately rebooting is the right thing to do, even if the
> file system is set to "errors=panic". In fact, in the case of a root
> file system, it is manifestly the wrong thing to do. If we did what
> you suggested, then the system would be trapped in a reboot loop
> forever.
>
> - Ted

I am still fuzzy on the use case here.

In any shared ext* file system (pacemaker or other), you have some basic rules:

* you cannot have the file system mounted on more than one node
* failover must fence out any other nodes before starting recovery
* failover (once the node is assured that it is uniquely mounting the file
  system) must do any recovery required to clean up the state

Using ext* (or xfs) in an active/passive cluster with failover rules that
follow the above is really common today.
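
A minimal sketch of the ordering those rules imply: fence first, recover second, mount last. The function name take_over and the fence_peers callback are hypothetical; a real cluster manager supplies its own fencing. The sketch assumes an ext4 device, uses "e2fsck -p" for the recovery step, and treats e2fsck exit codes 0 (no errors) and 1 (errors corrected) as safe to mount.

```python
import subprocess
from typing import Callable, Sequence

# e2fsck exit codes that leave the file system safe to mount:
# 0 = no errors, 1 = file system errors corrected.
FSCK_OK = {0, 1}


def run_command(argv: Sequence[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.call(list(argv))


def take_over(
    device: str,
    mountpoint: str,
    fence_peers: Callable[[], bool],
    run: Callable[[Sequence[str]], int] = run_command,
) -> bool:
    """Hypothetical failover sequence for an active/passive node.

    fence_peers is an assumed callback that returns True only when
    every other node is confirmed fenced out.
    """
    # Rule 2: fence out the other nodes before touching the device.
    if not fence_peers():
        return False

    # Rule 3: this node now has exclusive access, so run recovery.
    if run(["e2fsck", "-p", device]) not in FSCK_OK:
        return False

    # Rule 1 holds because of the fencing above: mount on this node only.
    return run(["mount", "-t", "ext4", device, mountpoint]) == 0
```

Passing the command runner in as a parameter keeps the ordering testable without touching a real block device. If fencing fails or e2fsck leaves errors uncorrected, the sketch refuses to mount and leaves the decision to the administrator.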

I don't see what the use case here is - are we trying to pretend that pacemaker
+ ext* allows us to have a single, shared file system in a cluster mounted on
multiple nodes?

Why not use ocfs2 or gfs2 for that?

Thanks!

Ric