Message-ID: <3a7bc899-31d9-51f2-1ea9-b3bef2a98913@dupond.be>
Date: Thu, 20 Feb 2020 10:08:44 +0100
From: Jean-Louis Dupond <jean-louis@...ond.be>
To: "Theodore Y. Ts'o" <tytso@....edu>
Cc: linux-ext4@...r.kernel.org
Subject: Re: Filesystem corruption after unreachable storage
As the mail seems to have been trashed somewhere, I'll retry :)
Thanks
Jean-Louis
On 24/01/2020 21:37, Theodore Y. Ts'o wrote:
> On Fri, Jan 24, 2020 at 11:57:10AM +0100, Jean-Louis Dupond wrote:
>> There was a short disruption of the SAN, which caused it to be
>> unavailable
>> for 20-25 minutes for the ESXi.
> 20-25 minutes is "short"? I guess it depends on your definition / POV. :-)
Well, the recovery (due to the manual fsck) caused more downtime than the
time the storage was actually down :)
>
>> What worries me is that almost all of the VM's (out of 500) were
>> showing the
>> same error.
> So that's a bit surprising...
Indeed, that's where I thought something went wrong!
I've tried to reproduce it, and I was able to trigger the same error when
the SAN recovers BEFORE the VM is shut down.
If I power off the VM first and then recover the SAN, the automatic fsck
runs without problems.
So it really seems to break the moment the VM can write to the SAN again.
>
>> And even some (+-10) were completely corrupt.
> What do you mean by "completely corrupt"? Can you send an e2fsck
> transcript of file systems that were "completely corrupt"?
Well, e2fsck was moving tons of files to lost+found etc., so those were really
broken.
I'll see if I can recover a backup of one in the broken state.
Anyway, this was only a very small percentage, so it worries me less than
the rest :)
>
>> Is there for example a chance that the filesystem gets corrupted the
>> moment
>> the SAN storage was back accessible?
> Hmm... the one possibility I can think of off the top of my head is
> that in order to mark the file system as containing an error, we need
> to write to the superblock. The head of the linked list of orphan
> inodes is also in the superblock. If that had gotten modified in the
> intervening 20-25 minutes, it's possible that this would result in
> orphaned inodes not on the linked list, causing that error.
>
> It doesn't explain the more severe cases of corruption, though.
If fixing that left us with only ~10 corrupt disks instead of 500, that
would already be a big win :)
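For what it's worth, if I understood the theory correctly, it should be
possible to check on a pre-fsck snapshot whether the superblock still records
an orphan list head; something like this (dumpe2fs -h only prints the
superblock fields) should show whether a "First orphan inode" is still set:

# dumpe2fs -h /dev/mapper/vg01-root | grep -i orphan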
>
>> I also have some snapshot available of a corrupted disk if some
>> additional
>> debugging info is required.
> Before e2fsck was run? Can you send me a copy of the output of
> dumpe2fs run on that disk, and then transcript of e2fsck -fy run on a
> copy of that snapshot?
Sure:
dumpe2fs -> see attachment
Fsck:
# e2fsck -fy /dev/mapper/vg01-root
e2fsck 1.44.5 (15-Dec-2018)
Pass 1: Checking inodes, blocks, and sizes
Inodes that were part of a corrupted orphan linked list found. Fix? yes
Inode 165708 was part of the orphaned inode list. FIXED.
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: -(863328--863355)
Fix? yes
Free blocks count wrong for group #26 (3485, counted=3513).
Fix? yes
Free blocks count wrong (1151169, counted=1151144).
Fix? yes
Inode bitmap differences: -4401 -165708
Fix? yes
Free inodes count wrong for group #0 (2489, counted=2490).
Fix? yes
Free inodes count wrong for group #20 (1298, counted=1299).
Fix? yes
Free inodes count wrong (395115, counted=395098).
Fix? yes
/dev/mapper/vg01-root: ***** FILE SYSTEM WAS MODIFIED *****
/dev/mapper/vg01-root: 113942/509040 files (0.2% non-contiguous), 882520/2033664 blocks
>
>> It would be great to gather some feedback on how to improve the situation
>> (next to of course have no SAN outage :)).
> Something that you could consider is setting up your system to trigger
> a panic/reboot on a hung task timeout, or when ext4 detects an error
> (see the man page of tune2fs and mke2fs and the -e option for those
> programs).
>
> There are tradeoffs with this, but if you've lost the SAN for 15-30
> minutes, the file systems are going to need to be checked anyway, and
> the machine will certainly not be serving. So forcing a reboot might
> be the best thing to do.
Going to look into that! Thanks for the info.
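For my own notes, a rough sketch of what I understand the suggestion to be
(untested here; the sysctl file name and the timeout/reboot values are just
examples I picked, not recommendations):

# tune2fs -e panic /dev/mapper/vg01-root
# echo "kernel.hung_task_panic = 1" >> /etc/sysctl.d/99-storage-failure.conf
# echo "kernel.hung_task_timeout_secs = 300" >> /etc/sysctl.d/99-storage-failure.conf
# echo "kernel.panic = 30" >> /etc/sysctl.d/99-storage-failure.conf
# sysctl --system

The tune2fs line makes ext4 panic instead of continuing when it detects an
error, and the sysctls make the kernel panic on a hung task and then reboot
30 seconds after any panic.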
>> On KVM for example there is a unlimited timeout (afaik) until the
>> storage is
>> back, and the VM just continues running after storage recovery.
> Well, you can adjust the SCSI timeout, if you want to give that a try....
Does it have any other disadvantages? Or is it quite safe to increase the
SCSI timeout?
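In case it helps others reading along: as far as I know the per-device SCSI
command timeout is exposed in sysfs, e.g. (sda and the 900 seconds are just
placeholders):

# cat /sys/block/sda/device/timeout
# echo 900 > /sys/block/sda/device/timeout

That setting doesn't survive a reboot, so it would need a udev rule to make
it permanent.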
>
> Cheers,
>
> - Ted