linux-ext4 - corrupt filesystem, superblock/journal

Message-ID: <ac6666bb-dc52-8cc1-5107-103ebf9273e1@uls.co.za>
Date:   Mon, 21 May 2018 14:21:33 +0200
From:   Jaco Kroon <jaco@....co.za>
To:     linux-ext4 <linux-ext4@...r.kernel.org>
Cc:     Pieter Kruger <pieter@....co.za>
Subject: corrupt filesystem, superblock/journal - fsck

Hi All,

We had a host start to fail processing on an ext4 filesystem directly
after extending it from 60.5TB to 64TB (lvresize -L64T /dev/lvm/home,
resize2fs /dev/lvm/home).

We rebooted, and now the filesystem will mount but the problem
persists.  We've now umounted the filesystem, and fsck complains as follows:

crowsnest ~ # fsck.ext4 -f /dev/lvm/home
e2fsck 1.43.6 (29-Aug-2017)
Superblock has an invalid journal (inode 8).
Clear<y>? yes
*** journal has been deleted ***

Corruption found in superblock.  (inodes_count = 0).

The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
 or
    e2fsck -b 32768 <device>

Corruption found in superblock.  (first_ino = 11).

The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
 or
    e2fsck -b 32768 <device>

Inode count in superblock is 0, should be 4294967295.
Fix<y>? yes

/dev/lvm/home: ***** FILE SYSTEM WAS MODIFIED *****

Note that, in spite of supposedly fixing the errors, the exact same errors
repeat if we re-run fsck.  Even if we use -b 32768 (4K blocks), they still
repeat.
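
For reference, the candidate block numbers for -b can be worked out from the
geometry. The following is a minimal sketch, assuming the standard
sparse_super layout (backups in group 1 and in groups that are powers of 3, 5
and 7) and using the block count and blocks-per-group values that dumpe2fs
reports for this filesystem. The function names are made up for illustration
and are not part of e2fsprogs.

```python
def is_power_of(n, base):
    """Return True if n equals base**k for some k >= 0."""
    while n > 1 and n % base == 0:
        n //= base
    return n == 1


def backup_superblock_blocks(block_count, blocks_per_group, first_block=0, limit=10):
    """List the first `limit` backup superblock locations under sparse_super.

    Assumption: backups live in group 1 and in groups whose number is a
    power of 3, 5 or 7, at the first block of the group.
    """
    # Ceiling division: a partial last group still counts as a group.
    group_count = -(-(block_count - first_block) // blocks_per_group)
    locations = []
    for group in range(1, group_count):
        if group == 1 or any(is_power_of(group, b) for b in (3, 5, 7)):
            locations.append(first_block + group * blocks_per_group)
            if len(locations) == limit:
                break
    return locations


# Values taken from the dumpe2fs output for /dev/lvm/home.
backups = backup_superblock_blocks(17179869184, 32768)
print(backups)
```

The first entry is 32768, the value already tried above. The 8193 that e2fsck
also suggests corresponds to a 1K-block filesystem (first block 1, 8192
blocks per group), which does not apply here.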

crowsnest ~ # dumpe2fs -f /dev/lvm/home
dumpe2fs 1.43.6 (29-Aug-2017)
Filesystem volume name:   <none>
Last mounted on:          /home
Filesystem UUID:          9f0e94bc-25b7-44b7-afdd-bfaa90dbf25c
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr dir_index filetype meta_bg extent 64bit flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean with errors
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              0
Block count:              17179869184
Reserved block count:     0
Free blocks:              997705247
Free inodes:              4176704883
First block:              0
Block size:               4096
Fragment size:            4096
Group descriptor size:    64
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
RAID stride:              128
RAID stripe width:        512
First meta block group:   1472
Flex block group size:    16
Filesystem created:       Tue Feb 10 12:51:13 2015
Last mount time:          Mon May 21 13:44:11 2018
Last write time:          Mon May 21 13:44:27 2018
Mount count:              34
Maximum mount count:      -1
Last checked:             Sat Aug 19 19:16:38 2017
Check interval:           0 (<none>)
Lifetime writes:          86 TB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      e4a234f2-63e8-4bdd-9591-ba54453485cc
Journal backup:           inode blocks
FS Error count:           30230
First error time:         Mon May 21 12:47:00 2018
First error function:     ext4_search_dir
First error line #:       1296
First error inode #:      304881794
First error block #:      1219511409
Last error time:          Mon May 21 13:44:19 2018
Last error function:      htree_dirblock_to_tree
Last error line #:        1006
Last error inode #:       2
Last error block #:       9697
dumpe2fs: Illegal inode number while reading journal inode
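
As a sanity check on the numbers above, the sketch below multiplies out the
geometry that dumpe2fs reports. It assumes the superblock stores the inode
count in a 32-bit field, and it is only a reading of the figures, not a
confirmed diagnosis: at exactly 64TB the group count times inodes per group
comes to 2^32, which wraps to 0 in 32 bits, and fsck's "should be 4294967295"
is 2^32 - 1.

```python
# Values copied from the dumpe2fs output.
block_count = 17179869184
block_size = 4096
blocks_per_group = 32768
inodes_per_group = 8192

# Ceiling division in case the last group is partial (it is not here).
group_count = -(-block_count // blocks_per_group)
total_inodes = group_count * inodes_per_group

# Assumption: the on-disk inode count is 32 bits wide, so only the low
# 32 bits of the total survive.
stored_inode_count = total_inodes & 0xFFFFFFFF

print("size is exactly 64 TiB:", block_count * block_size == 64 * 2**40)
print("block groups:", group_count)
print("inodes implied by geometry:", total_inodes)
print("value that fits in 32 bits:", stored_inode_count)
```

The same arithmetic at 60.5TB gives 495616 groups and 4060086272 inodes,
which still fits in 32 bits.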

The filesystem does mount but dmesg gives:

[ 3112.080745] EXT4-fs (dm-5): warning: mounting fs with errors, running e2fsck is recommended
[ 3112.646029] EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: delalloc,inode_readahead_blks=4096
[ 3120.230898] EXT4-fs error (device dm-5): htree_dirblock_to_tree:1006: inode #2: block 9697: comm ls: bad entry in directory: inode out of bounds - offset=0(0), inode=2, rec_len=12, name_len=1

The last line appears as soon as we run ls on the mount point (which ends
up giving an empty listing, without even . or ..).  ls reports no error,
and strace shows no errors coming back from any system call:

stat("/home/", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
open("/home/", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
getdents(3, /* 0 entries */, 32768)     = 0
close(3)                                = 0
close(1)                                = 0
close(2)                                = 0
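
One possible reading of the "inode out of bounds" message together with the
empty getdents result, sketched as a toy model. This is not the kernel code:
it assumes a directory entry is rejected when its inode number exceeds the
superblock's inode count, and that rejected entries are simply not returned.
The directory contents and helper names are hypothetical.

```python
def entry_in_bounds(entry_inode, inodes_count):
    """Toy bounds check: the entry's inode must not exceed the inode count."""
    return entry_inode <= inodes_count


def list_directory(entries, inodes_count):
    """Return the names whose inode numbers pass the bounds check."""
    return [name for name, ino in entries if entry_in_bounds(ino, inodes_count)]


# Hypothetical root directory; inode 2 is the root, as in the dmesg line.
root_entries = [(".", 2), ("..", 2), ("lost+found", 11)]

# With the inode count reported by dumpe2fs (0), nothing passes.
print(list_directory(root_entries, 0))

# With the count fsck says it should be, every entry passes.
print(list_directory(root_entries, 4294967295))
```

Under this model the very first entry (".", inode=2, name_len=1) already
fails, which matches offset=0 in the dmesg line.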

Interestingly stat does work:

crowsnest ~ # stat /home
  File: /home
  Size: 4096            Blocks: 8          IO Block: 4096   directory
Device: fd05h/64773d    Inode: 2           Links: 5
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-03-28 17:36:47.489839898 +0200
Modify: 2016-12-01 16:52:49.162294603 +0200
Change: 2016-12-01 16:52:49.162294603 +0200
 Birth: -

I'm unsure whether the filesystem resize completed properly before the
reboot.  Also keep in mind that this is LVM on top of software RAID (2 x
RAID6 arrays consisting of 10 and 12 x 4 TB disks respectively), on top
of an mpt3sas controller.  The kernel is 4.16 with the patch at
https://www.mail-archive.com/linux-scsi@vger.kernel.org/msg72812.html
applied (which prevents the system from locking up every few days).  As
far as I can determine, this patch series has not been mainlined anywhere
in 4.16.*.

Kind Regards,
Jaco