linux-ext4 - Re: Filesystem corruption after unreachable storage

Message-ID: <20200309151838.GA4852@mit.edu>
Date:   Mon, 9 Mar 2020 11:18:38 -0400
From:   "Theodore Y. Ts'o" <tytso@....edu>
To:     Jean-Louis Dupond <jean-louis@...ond.be>
Cc:     linux-ext4@...r.kernel.org
Subject: Re: Filesystem corruption after unreachable storage

On Mon, Mar 09, 2020 at 02:52:38PM +0100, Jean-Louis Dupond wrote:
> Did some more tests today.
> 
> Setting the SCSI timeout higher seems to be the most reliable solution.
> When the storage recovers, the VM just recovers and we can continue :)
> 
> Also did test setting the filesystem option 'errors=panic'.
> When the storage recovers, the VM freezes. So a hard reset is needed. But on
> boot a manual fsck is also needed like in the default situation.
> So it seems like it still writes data to the FS before doing the panic?
> You would expect it to not touch the fs anymore.
> 
> Would be nice if this situation could be a bit more error-proof :)

Did the panic happen immediately, or did things hang until the storage
recovered, and *then* it rebooted?  Or did the hard reset and reboot
happen before the storage network connection was restored?

Fundamentally I think what's going on is that even though an I/O
error is reported back to the OS, in some cases the outstanding I/O
actually happens.  So in the errors=panic case, we do update the
superblock saying that the file system contains inconsistencies.  And
then we reboot.  But it appears that even though the host rebooted,
the storage area network *did* manage to send the I/O to the device.
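
To see whether that superblock update actually reached the device, one
can read the state field straight out of the on-disk superblock.  What
follows is only a rough sketch, assuming the standard ext4 layout
(primary superblock at byte 1024, s_magic at offset 0x38, s_state at
0x3A, s_errors at 0x3C, all little-endian).  The function name
read_ext4_error_state is made up for illustration.  Note that s_errors
is the default stored in the superblock; an errors= mount option
overrides it at mount time.

```python
import struct

SUPERBLOCK_OFFSET = 1024      # primary superblock starts 1024 bytes in
EXT4_MAGIC = 0xEF53           # expected value of s_magic
EXT4_ERROR_FS = 0x0002        # s_state bit: errors detected
ERRORS_BEHAVIOR = {1: "continue", 2: "remount-ro", 3: "panic"}


def read_ext4_error_state(path):
    """Report the error flag and default error policy from a superblock.

    `path` may be a block device or an image file (hypothetical helper,
    not part of e2fsprogs).
    """
    with open(path, "rb") as f:
        # s_magic, s_state and s_errors are three consecutive 16-bit fields.
        f.seek(SUPERBLOCK_OFFSET + 0x38)
        raw = f.read(6)
    if len(raw) != 6:
        raise ValueError("short read: no superblock at the expected offset")
    magic, state, errors = struct.unpack("<HHH", raw)
    if magic != EXT4_MAGIC:
        raise ValueError("bad magic 0x%04x: not an ext2/3/4 superblock" % magic)
    return {
        "has_errors": bool(state & EXT4_ERROR_FS),
        "on_error": ERRORS_BEHAVIOR.get(errors, "unknown"),
    }
```

If has_errors comes back true after the reboot, the "file system has
problems" write did land on the device, and the forced fsck is the
expected consequence.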

I'm not sure what we can really do here, other than simply making the
SCSI timeout infinite.  The problem is that storage area networks are
flaky.  Sometimes I/O's make it through, and even though we get an
error, it's an error from the local SCSI layer --- and it's possible
that I/O will make it through.  In other cases, even though the
storage area network was disconnected at the time we sent the I/O
saying the file system has problems, and then rebooted, the I/O
actually makes it through.  Given that, if we're not sure, forcing a
full file system check is the better part of valor.
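
For reference, the per-device SCSI command timeout is exposed in sysfs
as /sys/block/<dev>/device/timeout, in seconds.  A minimal sketch of
raising it follows, assuming a SCSI disk such as "sda" and root
privileges.  The helper name set_scsi_timeout and the sysfs_root
parameter are hypothetical (the latter is only there so the function
can be exercised against a scratch directory).  There is no literal
"infinite" value, so in practice this means picking a very large
number, and the setting does not survive a reboot on its own.

```python
from pathlib import Path


def set_scsi_timeout(device, seconds, sysfs_root="/sys/block"):
    """Set the SCSI command timeout for `device`; return the old value.

    `device` is a kernel name such as "sda" (an assumption for this
    sketch), `seconds` the new timeout in seconds.
    """
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    timeout_file = Path(sysfs_root) / device / "device" / "timeout"
    # Remember the current value so the caller can restore it later.
    previous = int(timeout_file.read_text().strip())
    timeout_file.write_text("%d\n" % seconds)
    return previous
```

For example, set_scsi_timeout("sda", 3600) would let commands to sda
wait up to an hour before the SCSI layer gives up and reports an error.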

And if it hangs forever, and we do a hard reset reboot, I don't know
*what* to trust from the storage area network.  Ideally, there would
be some way to do a hard reset of the storage area network so that all
outstanding I/O's from the host that we are about to reset will get
forgotten before we actually do the hard reset.

						- Ted