Message-ID: <55F3FE07.9030807@harvyl.se>
Date: Sat, 12 Sep 2015 12:27:19 +0200
From: Johan Harvyl <johan@...vyl.se>
To: Theodore Ts'o <tytso@....edu>, linux-ext4@...r.kernel.org
Subject: Re: resize2fs: Should never happen: resize inode corrupt! - lost key inodes

Hi,

I have now evacuated the data from the filesystem, and I *did* manage to
recreate the "Should never happen: resize inode corrupt!" error using the
versions of e2fsprogs I believe I was using at the time. The vast majority
of the data that I was able to checksum was ok.

I guess the way forward for me is to recreate the fs with 1.42.13 and
stick to online resize from now on, correct? Are there any feature flags
that I should not use when expanding file systems, or any that I must use?

-johan

Here is a step-by-step account of what I did to reproduce the problem.

I built the following two versions of e2fsprogs (configure, make,
make install, nothing else):

421d693 (HEAD) libext2fs: fix potential buffer overflow in closefs()
6a3741a (tag: v1.42.12) Update release notes, etc. for final 1.42.12 release

9779e29 (HEAD, tag: v1.42.10) Update release notes, etc. for final 1.42.10 release

=== First, build the fs with 1.42.10 with the exact number of blocks I
originally had:

# MKE2FS_CONFIG=/root/e10/out/etc/mke2fs.conf /root/e10/out/sbin/mkfs.ext4 /dev/md0 -i 262144 -m 0 -O 64bit 15627548672k
mke2fs 1.42.10 (18-May-2014)
/dev/md0 contains a ext4 file system
        created on Sat Sep 12 11:23:02 2015
Proceed anyway? (y,n) y
Creating filesystem with 3906887168 4k blocks and 61045248 inodes
Filesystem UUID: d00e9e59-3756-4e59-9539-bc00fe2446b5
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632,
        2654208, 4096000, 7962624, 11239424, 20480000, 23887872, 71663616,
        78675968, 102400000, 214990848, 512000000, 550731776, 644972544,
        1934917632, 2560000000, 3855122432

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

From dumpe2fs I observe:
1) the fs features match what I had on my broken fs
2) the free block count is 512088558484167, which is clearly wrong

# e2fsck -fnv /dev/md0
e2fsck 1.42.13 (17-May-2015)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (512088558484167, counted=3902749383). Fix? no

So the initial fs created by 1.42.10 already appears to be bad.
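Notably, the bogus free-block count decomposes suspiciously cleanly: the
low 32 bits hold exactly the value e2fsck counted, and the high 32 bits
hold the filesystem's block-group count. A minimal check in Python, using
only the numbers quoted above (the decomposition itself is my own
observation, not something confirmed elsewhere in this thread):

# All constants below come straight from the mkfs/e2fsck output above.
bad_free = 512088558484167         # "Free blocks" per dumpe2fs
counted = 3902749383               # what e2fsck actually counted
blocks = 3906887168                # filesystem size in 4k blocks
blocks_per_group = 32768

groups = -(-blocks // blocks_per_group)    # ceil() -> 119229 groups

assert bad_free & 0xFFFFFFFF == counted    # low 32 bits are correct
assert bad_free >> 32 == groups            # high 32 bits = group count
print(f"bad value = ({groups} << 32) | {counted}")

If that pattern is meaningful, the low half of the 64-bit counter was
right all along and only the high half was clobbered, which would point
at a 64-bit accounting bug rather than random damage.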
Filesystem volume name:   <none>
Last mounted on:          <not available>
Filesystem UUID:          d00e9e59-3756-4e59-9539-bc00fe2446b5
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype extent 64bit flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              61045248
Block count:              3906887168
Reserved block count:     0
Free blocks:              512088558484167
Free inodes:              61045237
First block:              0
Block size:               4096
Fragment size:            4096
Group descriptor size:    64
Reserved GDT blocks:      185
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         512
Inode blocks per group:   32
Flex block group size:    16
Filesystem created:       Sat Sep 12 11:27:55 2015
Last mount time:          n/a
Last write time:          Sat Sep 12 11:27:55 2015
Mount count:              0
Maximum mount count:      -1
Last checked:             Sat Sep 12 11:27:55 2015
Check interval:           0 (<none>)
Lifetime writes:          158 MB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      f252a723-7016-43d1-97f8-579062a215e1
Journal backup:           inode blocks
Journal features:         (none)
Journal size:             128M
Journal length:           32768
Journal sequence:         0x00000001
Journal start:            0

The next step is resizing by +4 TB with 1.42.12:

# MKE2FS_CONFIG=/root/e12/out/etc/mke2fs.conf /root/e12/out/sbin/resize2fs -p /dev/md0 19534435840k
resize2fs 1.42.12 (29-Aug-2014)
<and nothing more>

It did *not* print the "Resizing the filesystem on /dev/md0 to 4883608960
(4k) blocks." line that it should have. I let it run for 90+ minutes,
sampling CPU and I/O usage with iotop from time to time; it was using more
or less 100% CPU, with no visible I/O.

So I let e2fsck fix the free block count and re-did the resize:

# e2fsck -f /dev/md0
e2fsck 1.42.13 (17-May-2015)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (512088558484167, counted=3902749383). Fix<y>? yes

/dev/md0: ***** FILE SYSTEM WAS MODIFIED *****
/dev/md0: 11/61045248 files (0.0% non-contiguous), 4137785/3906887168 blocks

# MKE2FS_CONFIG=/root/e12/out/etc/mke2fs.conf /root/e12/out/sbin/resize2fs -p /dev/md0 19534435840k
resize2fs 1.42.12 (29-Aug-2014)
Resizing the filesystem on /dev/md0 to 4883608960 (4k) blocks.
Begin pass 2 (max = 6)
Relocating blocks             XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Begin pass 3 (max = 119229)
Scanning inode table          XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Begin pass 5 (max = 8)
Moving inode table            XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
The filesystem on /dev/md0 is now 4883608960 (4k) blocks long.
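The geometry after this resize can be cross-checked with a little
arithmetic. A quick sketch, assuming only the standard ext4 relationships
between blocks, groups, and inodes; the expected values come from the
resize2fs output above and the dumpe2fs output below:

# Cross-check of the post-resize geometry.
target_kib = 19534435840        # size argument passed to resize2fs
block_size = 4096
blocks_per_group = 32768
inodes_per_group = 512

blocks = target_kib * 1024 // block_size   # KiB -> 4k blocks
groups = -(-blocks // blocks_per_group)    # ceil() -> block groups
inodes = groups * inodes_per_group

assert blocks == 4883608960   # "now 4883608960 (4k) blocks long"
assert groups == 149036       # the pass-3 max in the final resize below
assert inodes == 76306432     # "Inode count" in the dumpe2fs below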
dumpe2fs 1.42.13 (17-May-2015)
Filesystem volume name:   <none>
Last mounted on:          <not available>
Filesystem UUID:          159d3929-1842-4f8d-907f-7509c16f06df
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype extent 64bit flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              76306432
Block count:              4883608960
Reserved block count:     0
Free blocks:              4878450712
Free inodes:              76306421
First block:              0
Block size:               4096
Fragment size:            4096
Group descriptor size:    64
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         512
Inode blocks per group:   32
RAID stride:              32752
Flex block group size:    16
Filesystem created:       Sat Sep 12 11:41:10 2015
Last mount time:          n/a
Last write time:          Sat Sep 12 11:56:20 2015
Mount count:              0
Maximum mount count:      -1
Last checked:             Sat Sep 12 11:49:28 2015
Check interval:           0 (<none>)
Lifetime writes:          279 MB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      feeea566-bb38-44c6-a4d5-f97aa78001d4
Journal backup:           inode blocks
Journal features:         (none)
Journal size:             128M
Journal length:           32768
Journal sequence:         0x00000001
Journal start:            0

Looking good so far. Now for the final resize to 24 TB using 1.42.13:

# resize2fs -p /dev/md0
resize2fs 1.42.13 (17-May-2015)
Resizing the filesystem on /dev/md0 to 5860330752 (4k) blocks.
Begin pass 2 (max = 6)
Relocating blocks             XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Begin pass 3 (max = 149036)
Scanning inode table          XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Begin pass 5 (max = 14)
Moving inode table            XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Should never happen: resize inode corrupt!

# dumpe2fs -h /dev/md0
dumpe2fs 1.42.13 (17-May-2015)
Filesystem volume name:   <none>
Last mounted on:          <not available>
Filesystem UUID:          159d3929-1842-4f8d-907f-7509c16f06df
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype extent 64bit flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash
Default mount options:    user_xattr acl
Filesystem state:         clean with errors
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              91568128
Block count:              5860330752
Reserved block count:     0
Free blocks:              5853069550
Free inodes:              91568117
First block:              0
Block size:               4096
Fragment size:            4096
Group descriptor size:    64
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         512
Inode blocks per group:   32
RAID stride:              32752
Flex block group size:    16
Filesystem created:       Sat Sep 12 11:41:10 2015
Last mount time:          n/a
Last write time:          Sat Sep 12 12:03:55 2015
Mount count:              0
Maximum mount count:      -1
Last checked:             Sat Sep 12 11:49:28 2015
Check interval:           0 (<none>)
Lifetime writes:          279 MB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      feeea566-bb38-44c6-a4d5-f97aa78001d4
Journal backup:           inode blocks
Journal superblock magic number invalid!
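At this point it can help to inspect the raw superblock without going
through e2fsprogs at all. A minimal sketch, assuming the documented ext4
on-disk layout (primary superblock at byte offset 1024; field offsets per
the kernel's ext4 layout documentation):

# Peek at raw primary-superblock fields, bypassing e2fsprogs.
import struct

DEV = "/dev/md0"    # the affected device

with open(DEV, "rb") as f:
    f.seek(1024)                    # primary superblock lives here
    sb = f.read(1024)

magic, = struct.unpack_from("<H", sb, 0x38)       # s_magic: 0xEF53
blocks_lo, = struct.unpack_from("<I", sb, 0x04)   # s_blocks_count_lo
blocks_hi, = struct.unpack_from("<I", sb, 0x150)  # s_blocks_count_hi
free_lo, = struct.unpack_from("<I", sb, 0x0C)     # s_free_blocks_count_lo
free_hi, = struct.unpack_from("<I", sb, 0x158)    # s_free_blocks_count_hi

print(f"magic:       {magic:#06x}")
print(f"block count: {(blocks_hi << 32) | blocks_lo}")
print(f"free blocks: {(free_hi << 32) | free_lo}")

Comparing these raw values against what dumpe2fs prints is a cheap way to
tell whether a strange number actually lives on disk or is an artifact of
the tool reading it.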
On 2015-09-04 00:16, Johan Harvyl wrote:
> Hello again,
>
> I finally got around to digging some more into this and made what I
> consider some good progress, as I am now able to mount the filesystem
> read-only, so I thought I would update this thread a bit.
>
> Short one-sentence recap, since it's been a while since the original
> post: I am trying to recover a filesystem that was quite badly damaged
> by an offline resize2fs of a fairly modern ext4 fs from 20 TB to 24 TB.
>
> I spent a lot of time trying to get something meaningful out of
> e2fsck/debugfs and learned quite a bit in the process, and I would like
> to briefly share some observations.
>
> 1) The first hurdle running e2fsck -fnv is that "Superblock has an
> invalid journal (inode 8)" is considered fatal and cannot be fixed, at
> least not in r/o mode, so e2fsck just stops; this check needed to go
> away.
>
> 2) e2fsck gets utterly confused by the "bad block inode", which
> incorrectly gets identified as having something worth looking at, and
> spends days iterating through blocks (before I cancelled it). Removing
> the handling of ino == EXT2_BAD_INO in pass1 and pass1b made things a
> bit better.
>
> 3) e2fsck using a backup superblock:
> ext2fs_check_desc: Corrupt group descriptor: bad block for inode table
> e2fsck: Group descriptors look bad... trying backup blocks...
> This is bad, as it means using a superblock that has not been updated
> with the +4 TB. Consequently it gets the location of the first block
> group wrong, or at the very least the first inode table, which houses
> the root inode. Forcing it to use the master superblock again makes
> things a bit better.
>
> I have some logs from various e2fsck runs with various amounts of
> hacks applied, if they are of any interest to developers. I will also
> likely have the filesystem in this state for a week or two more, if
> any other information I can extract is of interest to figure out what
> made resize2fs screw things up.
>
> In the end, the only actual change I have made to the filesystem to
> make it mountable is that I borrowed a root inode from a different
> filesystem and updated the i_block pointer to point to the extent tree
> corresponding to the root inode of my broken filesystem, which was
> quite easy to find by just looking for the string "lost+found".
>
> # mount -o ro,noload /dev/md0 /mnt/loop
> [2815465.034803] EXT4-fs (md0): mounted filesystem without journal.
> Opts: noload
>
> # df -h /dev/md0
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/md0         22T -382T  404T    - /mnt/loop
>
> Uh oh, does not look too good... But hey, doing some checks on the
> data contents, and so far results are very promising. An "ls /" looks
> good, and so does a lot of the data that I can verify checksums on;
> checks are still running...
>
> I really do not know how to move on with trying to repair the
> filesystem with e2fsck. I do not feel brave enough to let it run r/w
> given how many hacks, which I consider very dirty, were required to
> even get it this far. At this point, letting it make changes to the
> filesystem may actually make it worse, so I see no other way forward
> than extracting all the contents and recreating the filesystem from
> scratch.
>
> The question is, though: what is the recommended way to create the
> filesystem? 64bit is clearly necessary, but what about the other
> feature flags like flex_bg/meta_bg/resize_inode...? I do not care much
> about slight gains in performance; robustness is more important, and
> that it can be resized in the future.
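The "just looking for the string 'lost+found'" trick described above is
easy to script. A rough sketch, assuming a 4k block size and that the
entry sits near the start of the root directory's data block, right after
the "." and ".." entries; the blocks it prints are only candidates, to be
confirmed with a hex dump:

# Scan a device for blocks that look like the root directory.
DEV = "/dev/md0"
BLOCK_SIZE = 4096
NEEDLE = b"lost+found"

with open(DEV, "rb") as f:
    blk = 0
    while True:
        data = f.read(BLOCK_SIZE)
        if not data:
            break
        # In the root directory block, "lost+found" appears early,
        # just after the 12-byte "." and ".." entries.
        if NEEDLE in data[:256]:
            print(f"candidate directory block: {blk}")
        blk += 1

On a multi-terabyte device this naive block-at-a-time loop is slow;
reading larger chunks, or stopping after the first few hits, makes it
bearable.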
>
> Only online resize from now on, never offline, I learned that lesson...
>
> Will it be possible to expand from 24 TB to 28 TB online?
>
> thanks,
> -johan
>
> On 2015-08-13 20:12, Johan Harvyl wrote:
>> On 2015-08-13 15:27, Theodore Ts'o wrote:
>>> On Thu, Aug 13, 2015 at 12:00:50AM +0200, Johan Harvyl wrote:
>>>
>>>>> I'm not aware of any offline resize with 1.42.13, but it sounds
>>>>> like you were originally using mke2fs and resize2fs 1.42.10, which
>>>>> did have some bugs, and so the question is what sort of state it
>>>>> might have left things in.
>>>> What kind of bugs are we talking about: mke2fs? resize2fs? e2fsck?
>>>> Any specific commits of interest?
>>> I suspect it was caused by a bug in resize2fs 1.42.10. The problem is
>>> that off-line resize2fs is much more powerful; it can handle moving
>>> file system metadata blocks around, so it can grow file systems in
>>> cases which aren't supported by online resize --- and it can shrink
>>> file systems, while online resize doesn't support any kind of file
>>> system shrink. As such, the code is a lot more complicated, whereas
>>> the online resize code is much simpler, and ultimately, much more
>>> robust.
>> Understood, so would it have been possible to move from my 20 TB ->
>> 24 TB fs with online resize? I am confused by the threads I see on
>> the net with regards to this.
>>>> Can you think of why it would zero out the first thousands of
>>>> inodes, like the root inode, lost+found and so on? I am thinking
>>>> that would help me assess the potential damage to the files. Could
>>>> I perhaps expect the same kind of zeroed-out blocks at regular
>>>> intervals all over the device?
>>> I didn't realize that the first thousands of inodes had been zeroed;
>>> either you didn't mention this earlier or I had missed it in your
>>> e-mail. I suspect the resize inode before the resize was pretty
>>> terribly corrupted, but in a way that e2fsck didn't complain about.
>>
>> Hi,
>>
>> I may not have been clear on the fact that it was not just the first
>> handful of inodes.
>>
>> When I manually sampled some inodes with debugfs and a disk editor,
>> the first group I found valid inodes in was:
>> Group 48: block bitmap at 1572864, inode bitmap at 1572880, inode
>> table at 1572896
>>
>> With 512 inodes per group, that would mean at least some 24k inodes
>> are blanked out, but I did not check them all; I just sampled groups
>> manually, so there could be some valid inodes in some of the groups
>> below group 48, or a lot more invalid ones afterwards.
>>
>>> I'll have to try to reproduce the problem based on how you
>>> originally created and grew the file system and see if I can somehow
>>> reproduce the problem. Obviously e2fsck and resize2fs should be
>>> changed to make this operation much more robust. If you can tell me
>>> the exact original size (just under 16TB is probably good enough,
>>> but if you know the exact starting size, that might be helpful), and
>>> then the steps by which the file system was grown, and which version
>>> of e2fsprogs was installed at the time, that would be quite helpful.
>>>
>>> Thanks,
>>>
>>> - Ted
>>
>> Cool, I will try to go through its history in some detail below.
>>
>> If you have ideas on what I could look for, like theories about a
>> particular periodicity to the corruption, I can write some python to
>> explore them.
>>
>> The filesystem was originally created with e2fsprogs 1.42.10-1 and
>> most likely linux-image 3.14 from Debian.
>>
>> # mkfs.ext4 /dev/md0 -i 262144 -m 0 -O 64bit
>> mke2fs 1.42.10 (18-May-2014)
>> Creating filesystem with 3906887168 4k blocks and 61045248 inodes
>> Filesystem UUID: 13c2eb37-e951-4ad1-b194-21f0880556db
>> Superblock backups stored on blocks:
>>         32768, 98304, 163840, 229376, 294912, 819200, 884736,
>>         1605632, 2654208, 4096000, 7962624, 11239424, 20480000,
>>         23887872, 71663616, 78675968, 102400000, 214990848,
>>         512000000, 550731776, 644972544, 1934917632, 2560000000,
>>         3855122432
>>
>> Allocating group tables: done
>> Writing inode tables: done
>> Creating journal (32768 blocks): done
>> Writing superblocks and filesystem accounting information: done
>> #
>>
>> It was expanded by 4 TB (another 976721792 4k blocks). As best I can
>> tell from my logs, this was done with either e2fsprogs:amd64 1.42.12-1
>> or 1.42.12-1.1 (Debian packages) and Linux 3.16. Everything was
>> running fine after this.
>> NOTE #1: It does *not* look like this filesystem was ever touched by
>> resize2fs 1.42.10.
>> NOTE #2: The diff between Debian packages 1.42.12-1 and 1.42.12-1.1
>> appears to be this:
>> 49d0fe2 libext2fs: fix potential buffer overflow in closefs()
>>
>> Then came the final 4 TB, for a total of 5860330752 4k blocks, which
>> was done with e2fsprogs:amd64 1.42.13-1 and Linux 4.0. This is where
>> the "Should never happen: resize inode corrupt!" was seen.
>>
>> In both cases the same offline resize was done, with no exotic
>> options:
>> # umount /dev/md0
>> # fsck.ext4 -f /dev/md0
>> # resize2fs /dev/md0
>>
>> thanks,
>> -johan
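For the record, the geometry implied by those mkfs options, and the scale
of damage implied by the group-48 observation earlier in the thread, both
follow from the same per-group arithmetic. A small sketch, using only
figures quoted in this thread:

# Geometry implied by "mkfs.ext4 -i 262144 ..." on 3906887168 4k blocks.
blocks = 3906887168
block_size = 4096
bytes_per_inode = 262144          # the -i argument
blocks_per_group = 32768

groups = -(-blocks // blocks_per_group)                  # 119229 groups
# -i 262144 with 4k blocks means one inode per 64 blocks:
inodes_per_group = blocks_per_group // (bytes_per_inode // block_size)
assert inodes_per_group == 512
assert groups * inodes_per_group == 61045248             # mkfs's count

# First valid inodes were found in group 48, so everything before that
# group's inode table was zeroed:
first_valid = 48 * inodes_per_group + 1                  # inode 24577
print(f"inodes 1..{first_valid - 1} presumed zeroed")    # ~24k inodes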