lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 20 Feb 2020 10:50:22 -0500
From:   "Theodore Y. Ts'o" <tytso@....edu>
To:     Jean-Louis Dupond <jean-louis@...ond.be>
Cc:     linux-ext4@...r.kernel.org
Subject: Re: Filesystem corruption after unreachable storage

On Thu, Feb 20, 2020 at 10:08:44AM +0100, Jean-Louis Dupond wrote:
> dumpe2fs -> see attachment

Looking at the dumpe2fs output, it's interesting that it was "clean
with errors", without any error information being logged in the
superblock.  What version of the kernel are you using?  I'm guessing
it's a fairly old one?

> Fsck:
> # e2fsck -fy /dev/mapper/vg01-root
> e2fsck 1.44.5 (15-Dec-2018)

And that's a old version of e2fsck as well.  Is this some kind of
stable/enterprise linux distro?

> Pass 1: Checking inodes, blocks, and sizes
> Inodes that were part of a corrupted orphan linked list found.  Fix? yes
> 
> Inode 165708 was part of the orphaned inode list.  FIXED.

OK, this and the rest looks like it's relating to a file truncation or
deletion at the time of the disconnection.

 > > > On KVM for example there is a unlimited timeout (afaik) until the
> > > storage is
> > > back, and the VM just continues running after storage recovery.
> > Well, you can adjust the SCSI timeout, if you want to give that a try....
> It has some other disadvantages? Or is it quite safe to increment the SCSI
> timeout?

It should be pretty safe.

Can you reliably reproduce the problem by disconnecting the machine
from the SAN?

						- Ted

Powered by blists - more mailing lists