Date: Sat, 16 Apr 2022 00:37:29 +1000
From: Peter Urbanec <linux-ext4.vger.kernel.org@...anec.net>
To: linux-ext4@...r.kernel.org
Subject: resize2fs on ext4 leads to corruption

I think I may have run into a resize2fs bug that leads to data loss. I see
this:

# mount -t ext4 /dev/md0 /mnt/RED8
mount: /mnt/RED8: wrong fs type, bad option, bad superblock on /dev/md0,
missing codepage or helper program, or other error.

Besides reporting the issue and gathering as much information as I can to
help debug it, I'd also like to ask for some assistance in trying to recover
the data. I'm prepared to put in some effort. I'm on Gentoo and can apply
git patches and rebuild the kernel or compile e2fsprogs.

The system is a *32-bit* Gentoo installation built well over a decade ago,
but it is kept reasonably up to date.

# uname -a
Linux gw 5.16.9-gentoo #2 SMP Sun Feb 13 21:19:40 AEDT 2022 i686 Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz GenuineIntel GNU/Linux

sys-fs/e2fsprogs
     Installed versions:  1.46.4(11:13:09 04/01/22)(nls split-usr threads -cron -fuse -lto -static-libs)

sys-libs/e2fsprogs-libs
     Installed versions:  1.46.4-r1(17:26:02 02/01/22)(split-usr -static-libs ABI_MIPS="-n32 -n64 -o32" ABI_S390="-32 -64" ABI_X86="32 -64 -x32")

Here is the sequence of steps that led to data loss.

Added one 8TB disk to an md raid5 array:

# mdadm --add /dev/md0 /dev/sdi1
# mdadm --grow --raid-devices=4 --backup-file=/root/grow_md0_20220410.bak /dev/md0

[183222.697484] md: md0: reshape done.
[183222.866677] md0: detected capacity change from 31255572480 to 46883358720

md0 : active raid5 sdi1[4] sda1[3] sdh1[1] sdg1[0]
      23441679360 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 0/59 pages [0KB], 65536KB chunk

# umount /mnt/RED8
# tune2fs -E stride=128,stripe_width=384 /dev/md0
# fsck.ext4 -f -v -C 0 -D /dev/md0
# mount -t ext4 /dev/md0 /mnt/RED8

At this stage I used the system for about a week without any issues. Then
earlier today:

# umount /mnt/RED8
# e2fsck -f /dev/md0
e2fsck 1.46.4 (18-Aug-2021)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
RED8: 11665593/488370176 files (0.6% non-contiguous), 3434311347/3906946560 blocks

# resize2fs /dev/md0
resize2fs 1.46.4 (18-Aug-2021)
Resizing the filesystem on /dev/md0 to 5860419840 (4k) blocks.
The filesystem on /dev/md0 is now 5860419840 (4k) blocks long.

So far so good. Everything appeared to be working just fine until now.

# mount -t ext4 /dev/md0 /mnt/RED8
mount: /mnt/RED8: wrong fs type, bad option, bad superblock on /dev/md0,
missing codepage or helper program, or other error.

# dumpe2fs -h /dev/md0
dumpe2fs 1.46.4 (18-Aug-2021)
dumpe2fs: Bad magic number in super-block while trying to open /dev/md0
Couldn't find valid filesystem superblock.
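The backup superblocks still appear to be intact. (For reference, a dry run
of mke2fs can list where the backups would live without writing anything to
the device; this is just a sketch, assuming the same 4k block size the
filesystem was created with, and the locations it prints assume the default
sparse_super layout:)

# mke2fs -n -b 4096 /dev/md0

With sparse_super2 enabled, if I understand the feature correctly, only
block group 1 and the last group carry backups, which matches the
"Backup block groups: 1 178845" line in the dumpe2fs output below. At
32768 blocks per group, group 1 starts at block 32768: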
# dumpe2fs -o superblock=32768 -h /dev/md0
dumpe2fs 1.46.4 (18-Aug-2021)
Filesystem volume name:   RED8
Last mounted on:          /exported/Music
Filesystem UUID:          1e999cb8-12b2-4ab7-b41b-c77fd267a102
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index sparse_super2 filetype extent 64bit flex_bg large_dir inline_data sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         not clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              732553216
Block count:              5860419840
Reserved block count:     0
Free blocks:              2410725583
Free inodes:              720887623
First block:              0
Block size:               4096
Fragment size:            4096
Group descriptor size:    64
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         4096
Inode blocks per group:   256
RAID stride:              128
RAID stripe width:        384
Flex block group size:    16
Filesystem created:       Wed Jan  2 01:42:39 2019
Last mount time:          Mon Apr 11 00:39:58 2022
Last write time:          Fri Apr 15 17:53:06 2022
Mount count:              0
Maximum mount count:      -1
Last checked:             Fri Apr 15 17:04:07 2022
Check interval:           0 (<none>)
Lifetime writes:          9 TB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     32
Desired extra isize:      32
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      7f7889a7-4ff4-4bbb-a7d0-5e9821e7e70b
Journal backup:           inode blocks
Backup block groups:      1 178845
Checksum type:            crc32c
Checksum:                 0x47b714d9
Journal features:         journal_incompat_revoke journal_64bit journal_checksum_v3
Total journal size:       1024M
Total journal blocks:     262144
Max transaction length:   262144
Fast commit length:       0
Journal sequence:         0x00064746
Journal start:            0
Journal checksum type:    crc32c
Journal checksum:         0x262ca522

# e2fsck -f -C 0 -b 32768 -z /root/20220415_2015_e2fsck-b_32768.e2undo /dev/md0
e2fsck 1.46.4 (18-Aug-2021)
Overwriting existing filesystem; this can be undone using the command:
    e2undo /root/20220415_2015_e2fsck-b_32768.e2undo /dev/md0

e2fsck: Undo file corrupt while trying to open /dev/md0

The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
 or
    e2fsck -b 32768 <device>

In light of recent mailing list traffic, I suspect that the issue may be
caused by sparse_super2.

Any suggestions as to what I could try to recover? Unfortunately, I do not
have an undo file for the resize2fs run, which is a bit unusual for me,
since I usually take advantage of such safety features.

Thanks,

	Peter Urbanec
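P.S. Two sketches of what I plan to try next, in case someone can confirm
or correct them before I touch the array again. First, a read-only pass
against the backup superblock, which should not write anything (-n answers
"no" to every question, and -B forces the 4k block size):

# e2fsck -n -b 32768 -B 4096 /dev/md0

Second, for any future resize, the undo-file variant I normally use and
skipped this time (the file name here is made up for illustration):

# resize2fs -z /root/20220416_resize_md0.e2undo /dev/md0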