linux-ext4 - Re: Filesystem corruption after unreachable storage


Message-ID: <20200220155022.GA532518@mit.edu>
Date:   Thu, 20 Feb 2020 10:50:22 -0500
From:   "Theodore Y. Ts'o" <tytso@....edu>
To:     Jean-Louis Dupond <jean-louis@...ond.be>
Cc:     linux-ext4@...r.kernel.org
Subject: Re: Filesystem corruption after unreachable storage

On Thu, Feb 20, 2020 at 10:08:44AM +0100, Jean-Louis Dupond wrote:
> dumpe2fs -> see attachment

Looking at the dumpe2fs output, it's interesting that it was "clean
with errors", without any error information being logged in the
superblock.  What version of the kernel are you using?  I'm guessing
it's a fairly old one?

> Fsck:
> # e2fsck -fy /dev/mapper/vg01-root
> e2fsck 1.44.5 (15-Dec-2018)

And that's an old version of e2fsck as well.  Is this some kind of
stable/enterprise Linux distro?

> Pass 1: Checking inodes, blocks, and sizes
> Inodes that were part of a corrupted orphan linked list found.  Fix? yes
> 
> Inode 165708 was part of the orphaned inode list.  FIXED.

OK, this and the rest look like they relate to a file truncation or
deletion that was in progress at the time of the disconnection.

> > > On KVM for example there is a unlimited timeout (afaik) until the
> > > storage is back, and the VM just continues running after storage recovery.
> > Well, you can adjust the SCSI timeout, if you want to give that a try....
> It has some other disadvantages? Or is it quite safe to increment the SCSI
> timeout?

It should be pretty safe.
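
As a minimal sketch of what adjusting the timeout could look like: on
Linux the per-device SCSI command timeout is exposed in sysfs as
/sys/block/<dev>/device/timeout, in seconds.  The device name "sda",
the value 180, the helper function names, and the SYSFS_ROOT override
are illustrative assumptions, not values taken from this thread.

```shell
# Sketch only: read and change the per-device SCSI command timeout.
# SYSFS_ROOT defaults to /sys; overriding it is a hypothetical
# convenience for trying the helpers against a scratch directory.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print the sysfs path of the timeout attribute for a block device.
scsi_timeout_path() {
    printf '%s/block/%s/device/timeout\n' "$SYSFS_ROOT" "$1"
}

# Print the current timeout (seconds) for the given device.
get_scsi_timeout() {
    cat "$(scsi_timeout_path "$1")"
}

# Set the timeout (seconds) for the given device.
# On a real system this needs root and does not persist across reboots.
set_scsi_timeout() {
    printf '%s\n' "$2" > "$(scsi_timeout_path "$1")"
}

# Example (assumed device and value):
#   get_scsi_timeout sda
#   set_scsi_timeout sda 180
```

Because the setting is lost on reboot, a udev rule or boot script would
be needed to make it permanent; which of those fits depends on the
distro.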

Can you reliably reproduce the problem by disconnecting the machine
from the SAN?

						- Ted