Date:	Sun, 24 Oct 2010 11:49:36 -0400
From:	Ric Wheeler <rwheeler@...hat.com>
To:	Bernd Schubert <bschubert@....com>
CC:	Ric Wheeler <rwheeler@...hat.com>, "Ted Ts'o" <tytso@....edu>,
	Amir Goldstein <amir73il@...il.com>,
	Bernd Schubert <bs_lists@...ef.fastmail.fm>,
	"linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>,
	Andreas Dilger <adilger@....com>
Subject: Re: ext4_clear_journal_err: Filesystem error recorded from previous
 mount: IO failure

  On 10/24/2010 11:39 AM, Bernd Schubert wrote:
> On 10/24/2010 05:20 PM, Ric Wheeler wrote:
>> This still sounds more like a Lustre issue than an ext4 one; Andreas can fill in
>> the technical details.
> The underlying device handling is unrelated to Lustre. In that sense it
> is just a local filesystem.
>
>> Whatever shared storage sits under ext4 is irrelevant to the failover case.
>>
>> Unless Lustre does other magic, they still need to obey the basic cluster rule:
>> one mount per cluster.
> Yes, one mount per cluster.
>
>> If Lustre is doing the same trick you would do with active/passive failover
>> clusters that export ext4 via NFS, you would still need to clean up the file
>> system before being able to re-export it from a failover node.
> What exactly is your question here? We use pacemaker/stonith to do the
> fencing job.
> What exactly do you want to clean up? The device is recovered by its
> journal, Lustre goes into recovery mode, clients reconnect, locks are
> updated, and incomplete transactions are resent.
>
>
> Cheers,
> Bernd
>

What I don't get (it certainly might just be me) is why this is a unique issue
when ext4 is used by Lustre. Normally, any similar type of failover will clean
up the local file system before trying to re-export it from the second node.
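
To make that "clean up" step concrete: as far as I understand it, the warning in
the subject line comes from ext4_clear_journal_err() copying an error code left
in the journal into the superblock's state field at mount time. A failover agent
would normally look at that recorded state and run e2fsck before re-mounting on
the second node. Here is a rough, hypothetical user-space sketch of that check
(assuming the standard ext4 on-disk layout: superblock at byte 1024, s_magic
0xEF53 at offset 56, s_state at offset 58, error bit 0x0002) -- a sketch for
illustration, not anything from the kernel or e2fsprogs:

/* sb_error_check.c - report whether an ext2/3/4 superblock has an error
 * recorded from a previous mount (the same state bit that
 * ext4_clear_journal_err leaves behind after draining the journal error).
 */
#include <endian.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define SB_OFFSET     1024      /* superblock starts 1 KiB into the device */
#define SB_SIZE       1024
#define MAGIC_OFFSET  56        /* __le16 s_magic */
#define STATE_OFFSET  58        /* __le16 s_state */
#define EXT4_MAGIC    0xEF53
#define EXT4_ERROR_FS 0x0002    /* errors detected on a previous mount */

int main(int argc, char **argv)
{
    unsigned char sb[SB_SIZE];
    uint16_t magic, state;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <block-device>\n", argv[0]);
        return 2;
    }
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 2; }
    if (pread(fd, sb, SB_SIZE, SB_OFFSET) != SB_SIZE) {
        perror("pread"); close(fd); return 2;
    }
    close(fd);

    memcpy(&magic, sb + MAGIC_OFFSET, 2);
    memcpy(&state, sb + STATE_OFFSET, 2);
    magic = le16toh(magic);
    state = le16toh(state);

    if (magic != EXT4_MAGIC) {
        fprintf(stderr, "%s: not an ext2/3/4 filesystem\n", argv[1]);
        return 2;
    }
    if (state & EXT4_ERROR_FS) {
        printf("%s: error recorded from previous mount - run e2fsck first\n",
               argv[1]);
        return 1;
    }
    printf("%s: no recorded errors (state 0x%04x)\n", argv[1], state);
    return 0;
}

(tune2fs -l or dumpe2fs -h show the same "Filesystem state" field, so the C is
only to spell out where the flag lives.) In the NFS failover case this is
exactly the point where e2fsck gets run before the re-export.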

Why exactly can't you use the same type of recovery here? Is it the fencing
agent killing nodes when it detects file system errors?

Ric

