linux-ext4 - Re: corrupt filesystem, superblock/journal

Message-Id: <C15B5FD1-FAF4-44CA-840B-AD90ADAACFB2@dilger.ca>
Date:   Wed, 23 May 2018 11:06:11 -0600
From:   Andreas Dilger <adilger@...ger.ca>
To:     Jaco Kroon <jaco@....co.za>
Cc:     Jan Kara <jack@...e.cz>, linux-ext4 <linux-ext4@...r.kernel.org>,
        Pieter Kruger <pieter@....co.za>
Subject: Re: corrupt filesystem, superblock/journal - fsck

On May 23, 2018, at 8:46 AM, Jaco Kroon <jaco@....co.za> wrote:
> 
> Hi,
> 
> So I tracked down the fsck issue ... with a bit of additional debug
> output in the lib/ext2fs/openfs.c file, the if statement that is failing
> is this one:
> 
>     if (fs->group_desc_count * EXT2_INODES_PER_GROUP(fs->super) !=
>         fs->super->s_inodes_count) {
>         fprintf(stderr, "\ngroup to inodes problem ... "
>                 "group_desc_count=%u, inodes/group=%u, inode_count=%u\n\n",
>                 fs->group_desc_count, EXT2_INODES_PER_GROUP(fs->super),
>                 fs->super->s_inodes_count);
>         retval = EXT2_ET_CORRUPT_SUPERBLOCK;
>         goto cleanup;
>     }
> 
> And this gives us:
> 
> group to inodes problem ... group_desc_count=524288, inodes/group=8192,
> inode_count=4294967295
> 
> 524288 * 8192 = 4294967296
> 
> As a result, any value other than 0 in the inode_count field will result
> in fsck refusing to fsck the filesystem. However, mounting with that bit
> of corruption does in fact work, so whilst fsck will not function at the
> moment, at least the filesystem is mounted. This will still need to be
> sorted out somehow.
> 
> I suspect this boils down to two things:
> 
> 1.  The kernel (as well as offline resize) needs to prevent resizes
> pushing inode count >= 2^32 (or if it hits exactly that just limit to
> 2^32-1).
> 2.  fsck needs to be made aware of this.
> 
> I've now used debugfs to set the inode count to 2^32-1, and the kernel
> is quite happy with this, but none of the userspace tools will currently
> operate on the filesystem.
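
To make the arithmetic above concrete, here is a minimal C sketch of why the
check can only pass with inode_count == 0.  This is not the libext2fs code:
the helper names are made up for illustration, and it assumes both operands
are 32-bit unsigned values (as the %u formats in the debug output suggest),
so the product wraps modulo 2^32.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the true inode total, computed in 64 bits. */
static uint64_t inode_total_64(uint32_t group_count, uint32_t inodes_per_group)
{
    return (uint64_t)group_count * inodes_per_group;
}

/* Hypothetical helper: the same product truncated to 32 bits,
 * i.e. what a 32-bit multiplication yields after wrapping. */
static uint32_t inode_total_32(uint32_t group_count, uint32_t inodes_per_group)
{
    return (uint32_t)inode_total_64(group_count, inodes_per_group);
}

/* Same shape as the quoted check: passes only when the wrapped
 * product equals the count stored in the superblock. */
static int inode_count_check_passes(uint32_t group_count,
                                    uint32_t inodes_per_group,
                                    uint32_t s_inodes_count)
{
    assert(inodes_per_group != 0);
    return inode_total_32(group_count, inodes_per_group) == s_inodes_count;
}
```

With group_count=524288 and inodes/group=8192 the 64-bit total is 2^32, the
truncated total is 0, and so a stored count of 4294967295 cannot match.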

You may be able to "painlessly" recover from this if you act quickly
(since your filesystem is mounted and in use).

If you use debugfs to shrink the total blocks count and the total inodes
count in the superblock by one full group, then it may be that e2fsck can
open the filesystem and repair it.
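
As a rough sketch of the numbers involved (an illustration only, not a tested
recovery procedure): the helper below computes the counts after dropping one
full group.  The struct and function names are hypothetical, and it assumes
the last group is a complete group.  The 32768 blocks per group used in the
example is an assumption (the usual value for 4 KiB blocks) and should be
checked against the dumpe2fs output.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the superblock counts after the shrink. */
struct shrunk_counts {
    uint64_t blocks_count;
    uint32_t inodes_count;
    uint32_t group_count;
};

/* Drop the last block group: one group's worth of blocks and inodes.
 * Assumes the last group is full and there are at least two groups. */
static struct shrunk_counts drop_last_group(uint64_t blocks_count,
                                            uint32_t blocks_per_group,
                                            uint32_t inodes_per_group,
                                            uint32_t group_count)
{
    struct shrunk_counts out;

    assert(group_count > 1);
    assert(blocks_count > blocks_per_group);

    out.group_count = group_count - 1;
    out.blocks_count = blocks_count - blocks_per_group;
    /* Compute in 64 bits first; the result fits in 32 bits again
     * once the group count is below 2^32 / inodes_per_group. */
    out.inodes_count = (uint32_t)((uint64_t)out.group_count * inodes_per_group);
    return out;
}
```

For a 64TB filesystem with 4 KiB blocks (2^34 blocks, 524288 groups, 8192
inodes per group) this gives 524287 groups, 17179836416 blocks and 4294959104
inodes, and the inode count is then consistent with the group count again.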

The "acting quickly" part is because you want to do this before files start
getting allocated in the last block group, the one that pushes the filesystem
to 2^32 inodes.  You can check this via "dumpe2fs <dev> | tail -20" to see
whether the blocks/inodes in that last group are allocated or not.  If they
are, you can either copy those files out of the filesystem temporarily, or try
to copy them to some other part of the fs (/sys/fs/ext4/<dev>/inode_goal
is your friend here).

Cheers, Andreas

> Thank you so much for helping me to get the filesystem online again.
> The tools and kernel will need to be fixed however in order to ensure
> that there are not going to be problems going forward.
> 
> Kind Regards,
> Jaco
> 
> 
> On 23/05/2018 13:37, Jan Kara wrote:
>> Hi,
>> 
>> On Mon 21-05-18 14:21:33, Jaco Kroon wrote:
>>> We had a host starting to fail processing on an ext4 filesystem directly
>>> after extend from 60.5TB to 64TB (lvresize -L64T /dev/lvm/home,
>>> resize2fs /dev/lvm/home).
>>> 
>>> We rebooted, and now the filesystem will mount but the problem
>>> persists.  We've now umounted the filesystem, and fsck complains as follows:
>>> 
>>> crowsnest ~ # fsck.ext4 -f /dev/lvm/home
>>> e2fsck 1.43.6 (29-Aug-2017)
>>> Superblock has an invalid journal (inode 8).
>>> Clear<y>? yes
>>> *** journal has been deleted ***
>>> 
>>> Corruption found in superblock.  (inodes_count = 0).
>>> 
>>> The superblock could not be read or does not describe a valid ext2/ext3/ext4
>>> filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
>>> filesystem (and not swap or ufs or something else), then the superblock
>>> is corrupt, and you might try running e2fsck with an alternate superblock:
>>>     e2fsck -b 8193 <device>
>>>  or
>>>     e2fsck -b 32768 <device>
>>> 
>>> Corruption found in superblock.  (first_ino = 11).
>>> 
>>> The superblock could not be read or does not describe a valid ext2/ext3/ext4
>>> filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
>>> filesystem (and not swap or ufs or something else), then the superblock
>>> is corrupt, and you might try running e2fsck with an alternate superblock:
>>>     e2fsck -b 8193 <device>
>>>  or
>>>     e2fsck -b 32768 <device>
>>> 
>>> Inode count in superblock is 0, should be 4294967295.
>>> Fix<y>? yes
>>> 
>>> /dev/lvm/home: ***** FILE SYSTEM WAS MODIFIED *****
>> OK, so the inode count is obviously wrong and the remaining errors are due
>> to that. Apparently the resize process has overflowed the inode count to 0
>> (which is not that surprising since the number of inodes in your filesystem
>> would be 1<<32) - that needs fixing but let's first get your fs up and
>> running. I'm actually surprised that e2fsck did anything with the
>> filesystem because for me both 1.44.2 and 1.42.11 versions just exit after
>> printing the error about the corrupted superblock. Anyway what *could* fix
>> your problem is:
>> 
>> debugfs -w -R 'ssv inodes_count 4294967295' /dev/lvm/home
>> 
>> and then check with dumpe2fs that inode count indeed got fixed. Hope it
>> helps.
>> 
>> 								Honza
>> 
> 

