Date:	Fri, 22 Oct 2010 20:54:41 +0200
From:	Bernd Schubert <bs_lists@...ef.fastmail.fm>
To:	"Ted Ts'o" <tytso@....edu>
Cc:	linux-ext4@...r.kernel.org, Bernd Schubert <bschubert@....com>
Subject: Re: ext4_clear_journal_err: Filesystem error recorded from previous mount: IO failure

On Friday, October 22, 2010, Ted Ts'o wrote:
> On Fri, Oct 22, 2010 at 07:42:49PM +0200, Bernd Schubert wrote:
> > No, it is far more difficult than that. The devices are managed by
> > pacemaker.  Which means: I/O errors come up -> Lustre complains
> > about that in its proc file. Pacemaker monitoring fails, so
> > pacemaker stops the device and starts it again.
> 
> I'm not sure what errors you're referring to, but if the errors are

There are multiple ways for Lustre to tell you that there is a problem.
Errors related to the underlying filesystem are just one of many.

> related to file system inconsistencies, by definition umounting and
> re-mounting isn't going to fix things, and could result in more
> damage.  For certain errors, you really do need to run e2fsck before
> remounting the device.

Yes, and that is exactly why I'm asking for another mount option that refuses
the mount when the filesystem itself knows it should not be mounted.
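
As a rough userspace illustration of that idea (this is not the proposed ext4
patch, and the helper names are made up for the example), a monitoring script
could look at the s_state field of the primary superblock before mounting and
refuse if the "errors detected" bit is set. Note the limitation: an error that
is so far recorded only in the journal superblock, as in the subject of this
thread, would not be visible to this check.

```python
import struct

SUPERBLOCK_OFFSET = 1024  # primary superblock starts 1024 bytes into the device
MAGIC_OFFSET = 0x38       # s_magic, little-endian 16-bit
EXT_MAGIC = 0xEF53        # magic shared by ext2/ext3/ext4
ERROR_FS = 0x0002         # "errors detected" bit in s_state (at offset 0x3A)


def read_superblock(device_path: str) -> bytes:
    """Read the 1024-byte primary superblock (hypothetical helper)."""
    with open(device_path, "rb") as dev:
        dev.seek(SUPERBLOCK_OFFSET)
        return dev.read(1024)


def has_recorded_error(superblock: bytes) -> bool:
    """Return True if s_state in the given superblock has the error bit set."""
    # s_magic (0x38) and s_state (0x3A) are adjacent 16-bit fields.
    magic, state = struct.unpack_from("<HH", superblock, MAGIC_OFFSET)
    if magic != EXT_MAGIC:
        raise ValueError("not an ext2/ext3/ext4 superblock")
    return bool(state & ERROR_FS)
```

A resource agent could call has_recorded_error(read_superblock(dev)) and fail
the start operation instead of mounting when it returns True.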

> 
> Can you not change pacemaker to stop the device, run e2fsck, and then
> remount the file system?

I am sure I could spend the next 4 weeks writing code that would allow that
with Lustre and pacemaker. But at the same time, it seems far easier to add
another mount flag to ext4...
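
For reference, the decision step of the "stop, e2fsck, remount" sequence
suggested above is small; the work is in wiring it into Lustre and pacemaker.
A minimal sketch of that step only, assuming e2fsck is run in preen mode (-p)
and that only exit statuses 0 (no errors) and 1 (errors corrected) should allow
a remount; the function names are hypothetical:

```python
import subprocess

# Exit statuses documented in e2fsck(8); the value is a sum of these bits.
FSCK_OK = 0
FSCK_ERRORS_CORRECTED = 1


def safe_to_mount(exit_status: int) -> bool:
    """Allow a mount only if e2fsck found nothing or fixed everything.

    Any other status (reboot needed, uncorrected errors, operational or
    usage error, cancellation, library error) is treated as blocking.
    """
    return exit_status in (FSCK_OK, FSCK_ERRORS_CORRECTED)


def check_device(device_path: str) -> bool:
    """Run e2fsck in preen mode on an unmounted device and report the verdict."""
    result = subprocess.run(["e2fsck", "-p", device_path])
    return safe_to_mount(result.returncode)
```

Treating status 2 (reboot needed) as blocking is a conservative choice made
for this sketch, not something the e2fsck documentation requires.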

I also cannot simply set max_failcount=1 in pacemaker, as that would go
completely against the HA concept. There are so many ways to increase the
failcount, for example Lustre bugs (ext4 unrelated), pacemaker bugs, human
errors (something missing on one node, but available on another), etc. A few
failures (ext4 unrelated) are absolutely 'normal' over a couple of months and
there is no reason not to allow that.

I'm not asking you to implement another feature; I'm asking whether a patch to
add a new option would be accepted. I also cannot promise to implement it
any time soon, given that I will leave DDN at the end of November. But it seems
to be a useful option for everyone, including on my desktop. So I will either
do it over the next 4 weeks when I find a minute, or during x-mas or so.

Thanks,
Bernd

-- 
Bernd Schubert
DataDirect Networks