Message-ID: <55CA3BDE.7020401@harvyl.se>
Date: Tue, 11 Aug 2015 20:15:58 +0200
From: Johan Harvyl <johan@...vyl.se>
To: linux-ext4@...r.kernel.org
Subject: resize2fs: Should never happen: resize inode corrupt! - lost key inodes

Hi,

I recently attempted an operation I have done many many times before: adding a drive to a raid array, followed by an offline resize2fs to expand the ext4fs on it.

This time, however, it failed miserably, and key parts of the filesystem appear so corrupt that it can no longer be mounted.

Here is what triggered all this:

# umount /dev/md0
# fsck.ext4 -f /dev/md0
# resize2fs /dev/md0
Should never happen: resize inode corrupt!

It looks to me like there is some sanity check missing in resize2fs, and I would like to figure out what.

Scanning through the linux-ext4 archives a bit I found the "64bit + resize2fs... this is Not Good" thread:
http://www.spinics.net/lists/linux-ext4/msg35039.html
His problem looks somewhat similar to mine, although I do not see the same possible root cause.

Googling, I also find a few threads like:
http://www.spinics.net/lists/linux-ext4/msg27511.html
That suggests it would not be possible to resize a 64bit fs with resize_inode and flex_bg, but those threads are old and resize2fs 1.42.13 (my version) did not report that combination as a problem.

Any input on what resize2fs has actually done and suggestions on what to try to recover would be greatly appreciated.

The md array has been re-started read-only and will remain so for the time being. I want a clear understanding of what has actually happened before I try something possibly destructive (like disabling the journal and running e2fsck -f).

To be honest, part of me enjoys getting my hands dirty digging through the filesystem internals, and there are backups of the important stuff, but there is still some data I would like to recover.
What I would like is something along the lines of a read-only fsck that lets me work with the fixed-up fs without actually modifying the underlying block device, as I do not quite trust e2fsprogs to make further changes to that filesystem. The best I have found so far is UFS explorer, which looks promising. It does find a lot of the files and has options to copy entire directories onto another filesystem, but I have no way of knowing that the contents of the files are actually intact, so it may or may not be worth spending money on.

I will now try to go through a bit of what I have tried and found so far.

For reference here is the md reshape. At the end of this post there will be some further history on how the md and ext4fs were created and expanded:

# mdadm --add /dev/md0 /dev/sdr
mdadm: added /dev/sdr
# mdadm --grow /dev/md0 --raid-devices=8
[119591.811743] md0: detected capacity change from 20003262300160 to 24003914760192
[119592.891563] VFS: busy inodes on changed media or resized disk md0

Attempt at mounting /dev/md0:
[146160.561297] EXT4-fs (md0): no journal found

Attempt at mounting /dev/md0 with -o ro,noload:
[146592.329911] EXT4-fs (md0): get root inode failed
[146592.329914] EXT4-fs (md0): mount failed

debugfs: stat <2>
Inode: 2   Type: bad type    Mode:  0000   Flags: 0x0
Generation: 0    Version: 0x00000000
User:     0   Group:     0   Size: 0
File ACL: 0    Directory ACL: 0
Links: 0   Blockcount: 0
Fragment:  Address: 0    Number: 0    Size: 0
ctime: 0x00000000 -- Thu Jan  1 01:00:00 1970
atime: 0x00000000 -- Thu Jan  1 01:00:00 1970
mtime: 0x00000000 -- Thu Jan  1 01:00:00 1970
Size of extra inode fields: 0
BLOCKS:

debugfs: stat <7>
Inode: 7   Type: bad type    Mode:  0000   Flags: 0x0
Generation: 0    Version: 0x00000000
User:     0   Group:     0   Size: 0
File ACL: 0    Directory ACL: 0
Links: 0   Blockcount: 0
Fragment:  Address: 0    Number: 0    Size: 0
ctime: 0x00000000 -- Thu Jan  1 01:00:00 1970
atime: 0x00000000 -- Thu Jan  1 01:00:00 1970
mtime: 0x00000000 -- Thu Jan  1 01:00:00 1970
Size of extra inode fields: 0
BLOCKS:
Manual check of the root inode on the broken filesystem:

Group 0: block bitmap at 2881, inode bitmap at 2897, inode table at 2913
         4294963995 free blocks, 501 free inodes, 2 used directories, 501 unused inodes
         [Checksum 0x404c]

Clearly the 4294963995 free blocks in a 32768 block group does not make sense.

00001000  41 0B 00 00 51 0B 00 00 61 0B 00 00 1B F3 F5 01
00001010  02 00 04 00 00 00 00 00 00 00 00 00 F5 01 4C 40
00001020  00 00 00 00 00 00 00 00 00 00 00 00 *FF FF*00 00
00001030  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

In [72]: hex(2913 * 4096 + 1 * 256)
Out[72]: '0xb61100'

00B61100  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00B61110  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00B61120  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00B61130  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
...
00B61700  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00B61710  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00B61720  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00B61730  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Uh oh, where did the root inode, and the resize inode go?

Just to confirm the math, here is the same thing on a reference clean filesystem:

Group 0: block bitmap at 2641, inode bitmap at 2657, inode table at 2673
         19 free blocks, 501 free inodes, 2 used directories, 501 unused inodes
         [Checksum 0x5791]

In [42]: hex(2673*4096 + 1*256)
Out[42]: '0xa71100'

00A71100  ED 41 00 00 00 10 00 00 D9 D3 BD 55 B7 D3 BD 55
00A71110  B7 D3 BD 55 00 00 00 00 00 00 13 00 08 00 00 00
00A71120  00 00 08 00 23 00 00 00 0A F3 01 00 04 00 00 00
00A71130  00 00 00 00 00 00 00 00 01 00 00 00 EF 5F 00 00

The dirent for / is at 0x5FEF * 4096:

05FEF000  02 00 00 00 0C 00 01 02 2E 00 00 00 02 00 00 00
05FEF010  0C 00 02 02 2E 2E 00 00 0B 00 00 00 14 00 0A 02
05FEF020  6C 6F 73 74 2B 66 6F 75 6E 64 00 00 01 80 46 02

In other words ".", "..", "lost+found" and so on...
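The hand decoding above can be cross-checked with a short Python sketch that works only on the bytes quoted in this post. The helper names (`inode_offset`, `free_blocks`, `first_extent`, `parse_dirents`) are illustrative and not part of e2fsprogs, and the field offsets are an assumption based on the usual ext4 on-disk layout: 64-byte group descriptors with the free block count split into 16-bit halves at 0x0C and 0x2C, 256-byte inodes with `i_block` at 0x28, and classic linear directory entries.

```python
import struct

BLOCK_SIZE = 4096   # "Block size" from dumpe2fs
INODE_SIZE = 256    # "Inode size" from dumpe2fs

def inode_offset(inode_table_block, ino, inodes_per_group=512):
    """Byte offset of inode `ino`, given the inode table start of its own group."""
    index = (ino - 1) % inodes_per_group
    return inode_table_block * BLOCK_SIZE + index * INODE_SIZE

def free_blocks(desc):
    """Join the low (0x0C) and high (0x2C) 16-bit halves of the free block count."""
    (lo,) = struct.unpack_from("<H", desc, 0x0C)
    (hi,) = struct.unpack_from("<H", desc, 0x2C)
    return (hi << 16) | lo

def first_extent(inode):
    """Return (physical start block, length) of the first extent of a depth-0 tree."""
    magic, entries, _max, depth = struct.unpack_from("<HHHH", inode, 0x28)
    if magic != 0xF30A or depth != 0 or entries < 1:
        raise ValueError("not a depth-0 extent tree with at least one extent")
    _lblock, length, start_hi, start_lo = struct.unpack_from("<IHHI", inode, 0x34)
    return (start_hi << 32) | start_lo, length

def parse_dirents(buf):
    """Walk linear directory entries; stop at the first one that is cut off."""
    out, pos = [], 0
    while pos + 8 <= len(buf):
        ino, rec_len, name_len, ftype = struct.unpack_from("<IHBB", buf, pos)
        if rec_len < 8 or pos + 8 + name_len > len(buf):
            break
        out.append((ino, ftype, buf[pos + 8:pos + 8 + name_len].decode("ascii", "replace")))
        pos += rec_len
    return out

# Group 0 descriptor of the broken filesystem (dump at 0x1000).
BROKEN_DESC = bytes.fromhex(
    "41 0B 00 00 51 0B 00 00 61 0B 00 00 1B F3 F5 01"
    "02 00 04 00 00 00 00 00 00 00 00 00 F5 01 4C 40"
    "00 00 00 00 00 00 00 00 00 00 00 00 FF FF 00 00"
    "00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00")

# First 64 bytes of the root inode on the reference filesystem (dump at 0xA71100).
REF_INODE = bytes.fromhex(
    "ED 41 00 00 00 10 00 00 D9 D3 BD 55 B7 D3 BD 55"
    "B7 D3 BD 55 00 00 00 00 00 00 13 00 08 00 00 00"
    "00 00 08 00 23 00 00 00 0A F3 01 00 04 00 00 00"
    "00 00 00 00 00 00 00 00 01 00 00 00 EF 5F 00 00")

# First 48 bytes of the reference root directory block (dump at 0x5FEF000).
REF_DIRENTS = bytes.fromhex(
    "02 00 00 00 0C 00 01 02 2E 00 00 00 02 00 00 00"
    "0C 00 02 02 2E 2E 00 00 0B 00 00 00 14 00 0A 02"
    "6C 6F 73 74 2B 66 6F 75 6E 64 00 00 01 80 46 02")

if __name__ == "__main__":
    print(hex(inode_offset(2913, 2)), hex(inode_offset(2673, 2)))
    count = free_blocks(BROKEN_DESC)
    print(count, count - 2**32)          # unsigned value, and the same bits read as signed
    print(first_extent(REF_INODE))
    print(parse_dirents(REF_DIRENTS))
```

Under those layout assumptions the sketch reproduces the numbers above: offsets 0xb61100 and 0xa71100, a free block count of 4294963995 for the broken group 0, and a single extent starting at block 0x5FEF for the reference root directory. Read as a signed 32-bit value the free block count would be -3301, which would be consistent with a counter that went below zero, though the dump alone does not prove that.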
<END of reference clean file system data>

Going back to the broken filesystem again, the root dirent is at:

01DE8000  02 00 00 00 0C 00 01 02 2E 00 00 00 02 00 00 00
01DE8010  0C 00 02 02 2E 2E 00 00 0B 00 00 00 14 00 0A 02
01DE8020  6C 6F 73 74 2B 66 6F 75 6E 64 00 00 0C 40 8C 03

But again where is its inode? I have not been able to find an inode that references that block, at least not in the same way I see on other filesystems.

###
Current kernel (stock debian):
4.0.0-2-amd64 #1 SMP Debian 4.0.8-2 (2015-07-22) x86_64 GNU/Linux

Current (when failing resize2fs was executed) e2fsprogs version (stock debian):
1.42.13-1

MD and FS information
---
/dev/md0:
     Raid Level : raid6
     Array Size : 23441323008 (22355.39 GiB 24003.91 GB)
  Used Dev Size : 3906887168 (3725.90 GiB 4000.65 GB)
   Raid Devices : 8
  Total Devices : 8

# dumpe2fs -h /dev/md0
dumpe2fs 1.42.13 (17-May-2015)
Filesystem volume name:   <none>
Last mounted on:          /mnt/r0
Filesystem UUID:          13c2eb37-e951-4ad1-b194-21f0880556db
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype extent 64bit flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean with errors
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              91568128
Block count:              5860330752
Reserved block count:     0
Free blocks:              1013128185
Free inodes:              88364147
First block:              0
Block size:               4096
Fragment size:            4096
Group descriptor size:    64
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         512
Inode blocks per group:   32
RAID stride:              128
RAID stripe width:        512
Flex block group size:    16
Filesystem created:       Wed Jun 25 23:22:06 2014
Last mount time:          Fri Jul 31 15:35:09 2015
Last write time:          Sun Aug  2 08:03:47 2015
Mount count:              0
Maximum mount count:      -1
Last checked:             Sun Aug  2 07:44:35 2015
Check interval:           0 (<none>)
Lifetime writes:          19 TB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      6bb07dee-8871-4b62-aa92-20080e16cb8c
Journal backup:           inode blocks
Journal superblock magic number invalid!

Some possibly relevant pieces from /etc/mke2fs.conf:

[defaults]
        base_features = sparse_super,large_file,filetype,resize_inode,dir_index,ext_attr
        default_mntopts = acl,user_xattr
        enable_periodic_fsck = 0
        blocksize = 4096
        inode_size = 256
        inode_ratio = 16384

[fs_types]
        ext4 = {
                features = has_journal,extent,huge_file,flex_bg,uninit_bg,dir_nlink,extra_isize
                auto_64-bit_support = 1
                inode_size = 256
        }

Note that this is what that file looks like right now, I cannot think of a way of telling what it looked like when the filesystem was initially created. What I can come up with is a best guess since another ext4fs on that same machine created around the same time (and therefore likely with the same mke2fs.conf) does not have the resize_inode flag set, which my corrupt fs has. I have no idea how that got enabled on my corrupt fs.

###
How the md and ext4fs was created and expanded
---
# mdadm --create --verbose --chunk=512 /dev/md0 --level=5 --raid-devices=5 /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm
mdadm: layout defaults to left-symmetric
mdadm: layout defaults to left-symmetric
mdadm: /dev/sdm appears to be part of a raid array:
       level=raid6 devices=8 ctime=Wed Jan 25 23:49:02 2012
mdadm: size set to 3906887168K
mdadm: automatically enabling write-intent bitmap on large array
Continue creating array? y
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
---
# mkfs.ext4 /dev/md0 -i 262144 -m 0 -O 64bit
mke2fs 1.42.10 (18-May-2014)
Creating filesystem with 3906887168 4k blocks and 61045248 inodes
Filesystem UUID: 13c2eb37-e951-4ad1-b194-21f0880556db
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
        102400000, 214990848, 512000000, 550731776, 644972544, 1934917632,
        2560000000, 3855122432

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

---
# mdadm --add /dev/md0 /dev/sdo
mdadm: added /dev/sdo
# mdadm --grow /dev/md0 --level=6 --raid-devices=6 --backup-file=/mnt/md100/md0_backup
mdadm: level of /dev/md0 changed to raid6

---
# mdadm --add /dev/md0 /dev/sdq
mdadm: added /dev/sdq
# mdadm --grow /dev/md0 --raid-devices=7

---
# umount /dev/md0
# fsck.ext4 -f /dev/md0
# resize2fs /dev/md0
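As a sanity check on the sizes quoted in this post, the geometry can be recomputed from the md capacities and the dumpe2fs and mke2fs figures. This is only arithmetic on numbers that appear above; the constant and function names are made up for the sketch, and nothing here inspects the actual device.

```python
BLOCK_SIZE = 4096
BLOCKS_PER_GROUP = 32768
INODES_PER_GROUP = 512

def groups_for(blocks):
    """Number of block groups needed to cover `blocks` (the last group may be partial)."""
    return -(-blocks // BLOCKS_PER_GROUP)

# Figures quoted in the post.
MKFS_BLOCKS = 3906887168           # "Creating filesystem with 3906887168 4k blocks"
MKFS_INODES = 61045248
MD_BYTES_BEFORE = 20003262300160   # md0 capacity before the grow to 8 devices
MD_BYTES_AFTER = 24003914760192    # md0 capacity after the grow to 8 devices
SB_BLOCKS = 5860330752             # "Block count" from dumpe2fs
SB_INODES = 91568128               # "Inode count" from dumpe2fs

blocks_before = MD_BYTES_BEFORE // BLOCK_SIZE
blocks_after = MD_BYTES_AFTER // BLOCK_SIZE

if __name__ == "__main__":
    print("blocks before grow:", blocks_before, "over 2**32:", blocks_before > 2**32)
    print("blocks after grow: ", blocks_after, "matches superblock:", blocks_after == SB_BLOCKS)
    print("groups at mkfs:", groups_for(MKFS_BLOCKS),
          "inodes:", groups_for(MKFS_BLOCKS) * INODES_PER_GROUP)
    print("groups now:    ", groups_for(SB_BLOCKS),
          "inodes:", groups_for(SB_BLOCKS) * INODES_PER_GROUP)
```

The block count in the superblock equals the new device size in 4k blocks, and the inode count matches 512 inodes for each of the resulting block groups, which suggests (but does not prove) that the superblock had already been updated to the new geometry when resize2fs bailed out. The figures also show that the filesystem was already larger than 2**32 blocks before this resize, while it was below that limit when first created.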