Message-ID: <20180523154830.v7auwdy5xjhk7ili@quack2.suse.cz>
Date: Wed, 23 May 2018 17:48:30 +0200
From: Jan Kara <jack@...e.cz>
To: Jaco Kroon <jaco@....co.za>
Cc: Jan Kara <jack@...e.cz>, linux-ext4 <linux-ext4@...r.kernel.org>,
Pieter Kruger <pieter@....co.za>
Subject: Re: corrupt filesystem, superblock/journal - fsck
Hi!
On Wed 23-05-18 16:46:25, Jaco Kroon wrote:
> So I tracked down the fsck issue ... with a bit of additional debug
> output in the lib/ext2fs/openfs.c file, the if statement that is failing
> is this one:
>
> if (fs->group_desc_count * EXT2_INODES_PER_GROUP(fs->super) !=
>     fs->super->s_inodes_count) {
>         fprintf(stderr, "\ngroup to inodes problem ... "
>                 "group_desc_count=%u, inodes/group=%u, inode_count=%u\n\n",
>                 fs->group_desc_count, EXT2_INODES_PER_GROUP(fs->super),
>                 fs->super->s_inodes_count);
>         retval = EXT2_ET_CORRUPT_SUPERBLOCK;
>         goto cleanup;
> }
>
> And this gives us:
>
> group to inodes problem ... group_desc_count=524288, inodes/group=8192,
> inode_count=4294967295
>
> 524288 * 8192 = 4294967296
>
> As a result, any value other than 0 in the inode_count field will result
> in fsck refusing to check the filesystem. Mounting with that bit of
> corruption does in fact work, so whilst fsck will not function at the
> moment, at least the filesystem is mounted. This will still need to be
> sorted out somehow.
>
> I suspect this boils down to two things:
>
> 1. The kernel (as well as offline resize) needs to prevent resizes from
> pushing the inode count to >= 2^32 (or, if it hits exactly that, just
> limit it to 2^32-1).
> 2. fsck needs to be made aware of this.
>
> I've now used debugfs to set the inode count to 2^32-1, and the kernel
> is quite happy with this, but none of the userspace tools will currently
> operate on the filesystem.
Yep, it seems we have quite a few things to fix. Thanks for the detailed
debugging :) I'll try to cook up some patches tomorrow.
> Thank you so much for helping me to get the filesystem online again.
> The tools and kernel will need to be fixed however in order to ensure
> that there are not going to be problems going forward.
Well, you've debugged most of the problems yourself! So most of the credit
goes to you. It's good to hear that your fs is up and running again.
Honza
> On 23/05/2018 13:37, Jan Kara wrote:
> > Hi,
> >
> > On Mon 21-05-18 14:21:33, Jaco Kroon wrote:
> >> We had a host starting to fail processing on an ext4 filesystem directly
> >> after extend from 60.5TB to 64TB (lvresize -L64T /dev/lvm/home,
> >> resize2fs /dev/lvm/home).
> >>
> >> We rebooted, and now the filesystem will mount but the problem
> >> persists. We've now umounted the filesystem, and fsck complains as follows:
> >>
> >> crowsnest ~ # fsck.ext4 -f /dev/lvm/home
> >> e2fsck 1.43.6 (29-Aug-2017)
> >> Superblock has an invalid journal (inode 8).
> >> Clear<y>? yes
> >> *** journal has been deleted ***
> >>
> >> Corruption found in superblock. (inodes_count = 0).
> >>
> >> The superblock could not be read or does not describe a valid ext2/ext3/ext4
> >> filesystem. If the device is valid and it really contains an ext2/ext3/ext4
> >> filesystem (and not swap or ufs or something else), then the superblock
> >> is corrupt, and you might try running e2fsck with an alternate superblock:
> >> e2fsck -b 8193 <device>
> >> or
> >> e2fsck -b 32768 <device>
> >>
> >> Corruption found in superblock. (first_ino = 11).
> >>
> >> The superblock could not be read or does not describe a valid ext2/ext3/ext4
> >> filesystem. If the device is valid and it really contains an ext2/ext3/ext4
> >> filesystem (and not swap or ufs or something else), then the superblock
> >> is corrupt, and you might try running e2fsck with an alternate superblock:
> >> e2fsck -b 8193 <device>
> >> or
> >> e2fsck -b 32768 <device>
> >>
> >> Inode count in superblock is 0, should be 4294967295.
> >> Fix<y>? yes
> >>
> >> /dev/lvm/home: ***** FILE SYSTEM WAS MODIFIED *****
> > OK, so the inode count is obviously wrong and the remaining errors are due
> > to that. Apparently the resize process has overflowed the inode count to 0
> > (which is not that surprising since the number of inodes in your filesystem
> > would be 1<<32) - that needs fixing but let's first get your fs up and
> > running. I'm actually surprised that e2fsck did anything with the
> > filesystem because for me both 1.44.2 and 1.42.11 versions just exit after
> > printing the error about the corrupted superblock. Anyway what *could* fix
> > your problem is:
> >
> > debugfs -w -R 'ssv inodes_count 4294967295' /dev/lvm/home
> >
> > and then check with dumpe2fs that inode count indeed got fixed. Hope it
> > helps.
> >
> > Honza
> >
>
--
Jan Kara <jack@...e.com>
SUSE Labs, CR