Message-Id: <B9C846A3-35D0-480C-9888-3F46E8A9C6A5@dilger.ca>
Date: Wed, 16 Sep 2015 19:21:59 -0600
From: Andreas Dilger <adilger@...ger.ca>
To: Johan Harvyl <johan@...vyl.se>
Cc: Theodore Ts'o <tytso@....edu>, "linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>
Subject: Re: resize2fs: Should never happen: resize inode corrupt! - lost key inodes

If you add "-b 1024" to the mke2fs command line to use 1KB instead of 4KB blocks, and reduce the sizes by a factor of 4, does the problem still happen? That would make it easier for someone else to test, since it would only need a 4-5TB disk instead of a 19TB array.

Cheers, Andreas

> On Sep 15, 2015, at 11:55, Johan Harvyl <johan@...vyl.se> wrote:
>
> I have now been able to reproduce the issue that resize2fs corrupts at least the root, resize and journal
> inodes with versions 1.42.13 and the more recent commit 956b0f1 of e2fsprogs.
>
> Note that older versions of e2fsprogs need *not* be involved, 1.42.13 and newer also have issues.
>
> Please advise on things I can try to narrow down the root cause of what has to be an e2fsprogs bug. In
> particular it would be very useful to reproduce it faster, running through the mkfs and two resize steps
> takes around ten minutes so iterative testing is slow and I do not really have much of a clue what steps
> would be more likely to overwrite the inodes.
>
> At some point I would like to return this array to service but I am not really comfortable creating a
> new ext4 filesystem on it without first understanding how it can become corrupted without even
> mounting the file system.
>
> For 1.42.13:
> # mkfs.ext4 /dev/md0 -i 262144 -m 0 -O 64bit 15627548672k
> # resize2fs -p /dev/md0 19534435840k
> # resize2fs -p /dev/md0
> # e2fsck -fn /dev/md0
> e2fsck 1.42.13 (17-May-2015)
> ext2fs_check_desc: Corrupt group descriptor: bad block for inode table
> e2fsck: Group descriptors look bad... trying backup blocks...
> Superblock has an invalid journal (inode 8).
> Clear? no
>
> e2fsck: Illegal inode number while checking ext3 journal for /dev/md0
>
> /dev/md0: ********** WARNING: Filesystem still has errors **********
>
>
> or for 956b0f1:
> # MKE2FS_CONFIG=/root/elatest/out/etc/mke2fs.conf /root/elatest/out/sbin/mkfs.ext4 /dev/md0 -i 262144 -m 0 -O 64bit 15627548672k
> # MKE2FS_CONFIG=/root/elatest/out/etc/mke2fs.conf /root/elatest/out/sbin/resize2fs -p /dev/md0 19534435840k
> # MKE2FS_CONFIG=/root/elatest/out/etc/mke2fs.conf /root/elatest/out/sbin/resize2fs -p /dev/md0
> # MKE2FS_CONFIG=/root/elatest/out/etc/mke2fs.conf /root/elatest/out/sbin/e2fsck -fn /dev/md0
> e2fsck 1.43-WIP (18-May-2015)
> ext2fs_open2: Superblock checksum does not match superblock
> /root/elatest/out/sbin/e2fsck: Superblock invalid, trying backup blocks...
> Superblock has an invalid journal (inode 8).
> Clear? no
>
> /root/elatest/out/sbin/e2fsck: Illegal inode number while checking ext3 journal for /dev/md0
>
> /dev/md0: ********** WARNING: Filesystem still has errors **********
>
> # MKE2FS_CONFIG=/root/elatest/out/etc/mke2fs.conf /root/elatest/out/sbin/debugfs -c /dev/md0
> debugfs 1.43-WIP (18-May-2015)
> /dev/md0: Superblock checksum does not match superblock while opening filesystem
> debugfs: stat <2>
> stat: Filesystem not open
>
> # debugfs -c /dev/md0
> debugfs 1.42.13 (17-May-2015)
> /dev/md0: catastrophic mode - not reading inode or group bitmaps
> debugfs: stat <2>
> Inode: 2 Type: bad type Mode: 0004 Flags: 0x1
> Generation: 1 Version: 0x00000001
> User: 9440 Group: 0 Size: 618659860
> File ACL: 1 Directory ACL: 0
> Links: 0 Blockcount: 724107776
> Fragment: Address: 0 Number: 0 Size: 0
> ctime: 0x02008000 -- Sun Jan 24 18:46:40 1971
> atime: 0x24e000a0 -- Wed Aug 9 12:00:00 1989
> mtime: 0x00030000 -- Sat Jan 3 07:36:48 1970
> Size of extra inode fields: 6
> BLOCKS:
> (0):1, (6):618659845 .... and it goes on...
>
>
>> On 2015-09-14 23:35, Johan Harvyl wrote:
>> In an attempt to further isolate what versions of e2fsprogs, at a commit level, that are
>> needed to reproduce the bad behavior I tried my own step-by-step, initially with a much
>> higher -i 16777216 to mkfs.ext4 in the hope that fewer inodes would make all the
>> operations run faster.
>>
>> When I was unable to reproduce with -i 16777216 instead, I switched back to exactly
>> what I reproduced with the first time, and I *still* did not get the "Should never happen:
>> resize inode corrupt!".
>>
>> The only reasonable explanation I can come up with to this is that something is not being
>> initialized properly that resize2fs expects to be initialized. I have no indications of any
>> issues with any hardware or the underlying md block.
>>
>> What I did however notice is that I can have the same kind of filesystem corruption
>> *without* seeing the "Should never happen: resize inode corrupt!" message using the
>> following sequence, and this *is* reproducible one time after another:
>>
>> # MKE2FS_CONFIG=/root/e10/out/etc/mke2fs.conf /root/e10/out/sbin/mkfs.ext4 /dev/md0 -i 262144 -m 0 -O 64bit 15627548672k
>> # e2fsck -fy /dev/md0 (using 1.42.13)
>> # resize2fs -p /dev/md0 19534435840k (using 1.42.13)
>> # resize2fs -p /dev/md0 (using 1.42.13)
>> # e2fsck -fn /dev/md0
>> e2fsck 1.42.13 (17-May-2015)
>> ext2fs_check_desc: Corrupt group descriptor: bad block for inode table
>> e2fsck: Group descriptors look bad... trying backup blocks...
>> Superblock has an invalid journal (inode 8).
>> Clear? no
>>
>> e2fsck: Illegal inode number while checking ext3 journal for /dev/md0
>>
>> At this point the root inode is also bad and this fails:
>> # mount /dev/md0 /mnt/loop -o ro,noload
>> mount: mount /dev/md0 on /mnt/loop failed: Stale file handle
>> [3766493.732188] EXT4-fs (md0): get root inode failed
>> [3766493.732190] EXT4-fs (md0): mount failed
>>
>> Note that only versions 1.42.10 and 1.42.13 are involved now, 1.42.12 is not needed.
>>
>> Kernel is the Debian one:
>> ii linux-image-4.0.0-2-amd64 4.0.8-2 amd64 Linux 4.0 for 64-bit PCs
>>
>> For the record I also tried a more recent e2fsprogs for the resize (instead of 1.42.13),
>> locally built from:
>> 956b0f1 Merge branch 'maint' into next
>> and I could still reproduce it on the first attempt.
>>
>> More verbose logs follow.
>>
>> Does anyone else have some kind of testbed to test the same sequence of commands?
>>
>> ===
>>
>> # MKE2FS_CONFIG=/root/e10/out/etc/mke2fs.conf /root/e10/out/sbin/mkfs.ext4 /dev/md0 -i 262144 -m 0 -O 64bit 15627548672k
>> mke2fs 1.42.10 (18-May-2014)
>> /dev/md0 contains a ext4 file system
>> last mounted on Sun Sep 13 22:19:28 2015
>> Proceed anyway? (y,n) y
>> Creating filesystem with 3906887168 4k blocks and 61045248 inodes
>> Filesystem UUID: e263356e-4fe4-4e9b-bd0c-8edc2c411735
>> Superblock backups stored on blocks:
>>     32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
>>     4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
>>     102400000, 214990848, 512000000, 550731776, 644972544, 1934917632,
>>     2560000000, 3855122432
>>
>> Allocating group tables: done
>> Writing inode tables: done
>> Creating journal (32768 blocks): done
>> Writing superblocks and filesystem accounting information: done
>>
>> # e2fsck -fy /dev/md0
>> e2fsck 1.42.13 (17-May-2015)
>> Pass 1: Checking inodes, blocks, and sizes
>> Pass 2: Checking directory structure
>> Pass 3: Checking directory connectivity
>> Pass 4: Checking reference counts
>> Pass 5: Checking group summary information
>> Free blocks count wrong (512088558484167, counted=3902749383).
>> Fix? yes
>>
>>
>> /dev/md0: ***** FILE SYSTEM WAS MODIFIED *****
>> /dev/md0: 11/61045248 files (0.0% non-contiguous), 4137785/3906887168 blocks
>>
>> # resize2fs -p /dev/md0 19534435840k
>> resize2fs 1.42.13 (17-May-2015)
>> Resizing the filesystem on /dev/md0 to 4883608960 (4k) blocks.
>> Begin pass 2 (max = 6)
>> Relocating blocks XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>> Begin pass 3 (max = 119229)
>> Scanning inode table XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>> Begin pass 5 (max = 8)
>> Moving inode table XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>> The filesystem on /dev/md0 is now 4883608960 (4k) blocks long.
>>
>> # resize2fs -p /dev/md0
>> resize2fs 1.42.13 (17-May-2015)
>> Resizing the filesystem on /dev/md0 to 5860330752 (4k) blocks.
>> Begin pass 2 (max = 6)
>> Relocating blocks XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>> Begin pass 3 (max = 149036)
>> Scanning inode table XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>> Begin pass 5 (max = 14)
>> Moving inode table XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>> The filesystem on /dev/md0 is now 5860330752 (4k) blocks long.
>>
>> # e2fsck -fn /dev/md0
>> e2fsck 1.42.13 (17-May-2015)
>> ext2fs_check_desc: Corrupt group descriptor: bad block for inode table
>> e2fsck: Group descriptors look bad... trying backup blocks...
>> Superblock has an invalid journal (inode 8).
>> Clear? no
>>
>> e2fsck: Illegal inode number while checking ext3 journal for /dev/md0
>>
>>> On 2015-09-12 12:27, Johan Harvyl wrote:
>>> Hi,
>>>
>>> I have now evacuated the data on the filesystem and I *did* manage to recreate the
>>> "Should never happen: resize inode corrupt!" using the versions of e2fsprogs I believe I was using at the time.
>>>
>>> The vast majority of the data that I was able to checksum was ok.
>>>
>>> For me I guess the way forward should be to recreate the fs with 1.42.13 and stick to online resize
>>> from now on, correct?
>>>
>>> Are there any feature flags that I should not use when expanding file systems or any that I must use?
>>>
>>> -johan
>>>
>>>
>>> Here is a step by step of what I did to reproduce
>>>
>>> I have built the following two versions of e2fsprogs (configure, make, make install, nothing else):
>>> 421d693 (HEAD) libext2fs: fix potential buffer overflow in closefs()
>>> 6a3741a (tag: v1.42.12) Update release notes, etc. for final 1.42.12 release
>>>
>>> 9779e29 (HEAD, tag: v1.42.10) Update release notes, etc. for final 1.42.10 release
>>>
>>> ===
>>>
>>> First build the fs with 1.42.10 with the exact number of blocks I originally had.
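Andreas's suggestion at the top of this message (use `-b 1024` and divide every size by 4) amounts to simple arithmetic on the sizes used in this reproduction. The sketch below converts the three `k`-suffixed sizes into block counts, once with 4 KiB blocks as run in the thread and once with 1 KiB blocks on a device a quarter of the size. It assumes the `k` suffix means KiB, which is consistent with mke2fs reporting "3906887168 4k blocks" for `15627548672k`; the size of the final (whole-device) resize is derived from the reported 5860330752 4k blocks. It only checks block counts: with 1 KiB blocks mke2fs defaults to 8192 rather than 32768 blocks per group, so the rest of the layout is not identical.

```python
KIB = 1024

# Sizes given to mkfs.ext4 / resize2fs in this thread, in KiB ("k" suffix).
# The last resize used the whole device; its size is derived from the
# reported 5860330752 4k blocks.
SIZES_KIB = {
    "mkfs.ext4": 15627548672,
    "first resize2fs": 19534435840,
    "second resize2fs": 5860330752 * 4,
}


def blocks_for(size_kib, block_size):
    """Whole blocks of block_size bytes that fit in a size given in KiB."""
    return size_kib * KIB // block_size


rows = []
for step, size_kib in SIZES_KIB.items():
    blocks_4k = blocks_for(size_kib, 4096)        # as run in the thread
    blocks_1k = blocks_for(size_kib // 4, 1024)   # -b 1024, size divided by 4
    over_32bit = blocks_4k > 2**32 - 1            # needs more than 32-bit block numbers
    rows.append((step, size_kib // 4, blocks_4k, blocks_1k, over_32bit))

for step, scaled_kib, b4, b1, over in rows:
    print(f"{step:17s} scaled size {scaled_kib}k: {b4} blocks (4k) vs "
          f"{b1} blocks (1k); over 2^32-1 blocks: {over}")
```

The block counts come out identical in both setups, and both resize targets still exceed 2^32 - 1 blocks after scaling, so the scaled-down test keeps that property of the original sequence.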
>>>
>>> # MKE2FS_CONFIG=/root/e10/out/etc/mke2fs.conf /root/e10/out/sbin/mkfs.ext4 /dev/md0 -i 262144 -m 0 -O 64bit 15627548672k
>>> mke2fs 1.42.10 (18-May-2014)
>>> /dev/md0 contains a ext4 file system
>>> created on Sat Sep 12 11:23:02 2015
>>> Proceed anyway? (y,n) y
>>> Creating filesystem with 3906887168 4k blocks and 61045248 inodes
>>> Filesystem UUID: d00e9e59-3756-4e59-9539-bc00fe2446b5
>>> Superblock backups stored on blocks:
>>>     32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
>>>     4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
>>>     102400000, 214990848, 512000000, 550731776, 644972544, 1934917632,
>>>     2560000000, 3855122432
>>>
>>> Allocating group tables: done
>>> Writing inode tables: done
>>> Creating journal (32768 blocks): done
>>> Writing superblocks and filesystem accounting information: done
>>>
>>> From dumpe2fs I observe:
>>> 1) the fs features match what I had on my broken fs
>>> 2) the number of free blocks is 512088558484167 which is clearly wrong.
>>>
>>> # e2fsck -fnv /dev/md0
>>> e2fsck 1.42.13 (17-May-2015)
>>> Pass 1: Checking inodes, blocks, and sizes
>>> Pass 2: Checking directory structure
>>> Pass 3: Checking directory connectivity
>>> Pass 4: Checking reference counts
>>> Pass 5: Checking group summary information
>>> Free blocks count wrong (512088558484167, counted=3902749383).
>>> Fix? no
>>>
>>> So the initial fs created by 1.42.10 appears to be bad.
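The bogus free-blocks count above decomposes neatly when read as a 64-bit value made of two 32-bit halves, which is how ext4 stores it with the 64bit feature (superblock fields `s_free_blocks_count_lo` and `s_free_blocks_count_hi`). A quick check: the low half is exactly the count e2fsck computed, and the high half is 119229, which is also the number of block groups in this filesystem (compare `Begin pass 3 (max = 119229)` in the resize2fs output in this thread). This only shows the arithmetic; it does not by itself identify which code wrote the high half.

```python
# Values copied from the e2fsck output above.
REPORTED = 512088558484167   # "Free blocks count wrong (512088558484167, ..."
COUNTED = 3902749383         # "... counted=3902749383)"


def split_u64(value):
    """Return the (high, low) 32-bit halves of a 64-bit unsigned value."""
    return value >> 32, value & 0xFFFFFFFF


hi, lo = split_u64(REPORTED)
groups = -(-3906887168 // 32768)   # block groups: ceil(blocks / blocks per group)

print(f"hi = {hi}, lo = {lo}")     # prints: hi = 119229, lo = 3902749383
print(f"lo == counted: {lo == COUNTED}, hi == group count: {hi == groups}")
```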
>>>
>>> Filesystem volume name: <none>
>>> Last mounted on: <not available>
>>> Filesystem UUID: d00e9e59-3756-4e59-9539-bc00fe2446b5
>>> Filesystem magic number: 0xEF53
>>> Filesystem revision #: 1 (dynamic)
>>> Filesystem features: has_journal ext_attr resize_inode dir_index filetype extent 64bit flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
>>> Filesystem flags: signed_directory_hash
>>> Default mount options: user_xattr acl
>>> Filesystem state: clean
>>> Errors behavior: Continue
>>> Filesystem OS type: Linux
>>> Inode count: 61045248
>>> Block count: 3906887168
>>> Reserved block count: 0
>>> Free blocks: 512088558484167
>>> Free inodes: 61045237
>>> First block: 0
>>> Block size: 4096
>>> Fragment size: 4096
>>> Group descriptor size: 64
>>> Reserved GDT blocks: 185
>>> Blocks per group: 32768
>>> Fragments per group: 32768
>>> Inodes per group: 512
>>> Inode blocks per group: 32
>>> Flex block group size: 16
>>> Filesystem created: Sat Sep 12 11:27:55 2015
>>> Last mount time: n/a
>>> Last write time: Sat Sep 12 11:27:55 2015
>>> Mount count: 0
>>> Maximum mount count: -1
>>> Last checked: Sat Sep 12 11:27:55 2015
>>> Check interval: 0 (<none>)
>>> Lifetime writes: 158 MB
>>> Reserved blocks uid: 0 (user root)
>>> Reserved blocks gid: 0 (group root)
>>> First inode: 11
>>> Inode size: 256
>>> Required extra isize: 28
>>> Desired extra isize: 28
>>> Journal inode: 8
>>> Default directory hash: half_md4
>>> Directory Hash Seed: f252a723-7016-43d1-97f8-579062a215e1
>>> Journal backup: inode blocks
>>> Journal features: (none)
>>> Journal size: 128M
>>> Journal length: 32768
>>> Journal sequence: 0x00000001
>>> Journal start: 0
>>>
>>>
>>>
>>> The next step is resizing + 4 TB with 1.42.12.
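The `Reserved GDT blocks: 185` line in the dumpe2fs output above can be reproduced with a short calculation. The sketch assumes the rule mke2fs applies for `resize_inode` is: reserve enough group-descriptor blocks for the filesystem to grow to 1024 times its initial size, capped at 2^32 - 1 blocks. That rule is an assumption about libext2fs, not something stated in the thread, but it yields 185 exactly. The geometry comes from the dumpe2fs output: 32768 blocks per group and 64-byte descriptors, i.e. 4096 / 64 = 64 descriptors per GDT block. Under that assumption the reservation covers growth to exactly 2^32 blocks (16 TiB at 4 KiB), while both resize targets need more descriptor blocks than that.

```python
def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


BLOCK_SIZE = 4096
BLOCKS_PER_GROUP = 32768
DESC_SIZE = 64                              # "Group descriptor size: 64"
DESC_PER_BLOCK = BLOCK_SIZE // DESC_SIZE    # 64 descriptors per GDT block


def gdt_blocks(total_blocks: int) -> int:
    """Group descriptor blocks needed for a filesystem of total_blocks blocks."""
    return ceil_div(ceil_div(total_blocks, BLOCKS_PER_GROUP), DESC_PER_BLOCK)


def reserved_gdt_blocks(initial_blocks: int) -> int:
    """Assumed mke2fs rule: room to grow 1024x, capped at 2**32 - 1 blocks."""
    max_blocks = min(initial_blocks * 1024, 2**32 - 1)
    return gdt_blocks(max_blocks) - gdt_blocks(initial_blocks)


initial = 3906887168
reserved = reserved_gdt_blocks(initial)
limit_blocks = (gdt_blocks(initial) + reserved) * DESC_PER_BLOCK * BLOCKS_PER_GROUP
print(f"reserved GDT blocks: {reserved}, growth covered up to {limit_blocks} blocks")

for target in (4883608960, 5860330752):
    needed = gdt_blocks(target)
    print(f"{target} blocks need {needed} GDT blocks; "
          f"within reservation: {needed <= gdt_blocks(initial) + reserved}")
```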
>>> # MKE2FS_CONFIG=/root/e12/out/etc/mke2fs.conf /root/e12/out/sbin/resize2fs -p /dev/md0 19534435840k
>>> resize2fs 1.42.12 (29-Aug-2014)
>>> <and nothing more>
>>> It did *not* print the "Resizing the filesystem on /dev/md0 to 4883608960 (4k) blocks." that it should have.
>>>
>>> I let it run for 90+ minutes sampling CPU and IO usage with iotop from time to time. It was using more or less 100% CPU and no visible io.
>>>
>>> So, I let e2fsck fix the free block count and re-did the resize:
>>> # e2fsck -f /dev/md0
>>> e2fsck 1.42.13 (17-May-2015)
>>> Pass 1: Checking inodes, blocks, and sizes
>>> Pass 2: Checking directory structure
>>> Pass 3: Checking directory connectivity
>>> Pass 4: Checking reference counts
>>> Pass 5: Checking group summary information
>>> Free blocks count wrong (512088558484167, counted=3902749383).
>>> Fix<y>? yes
>>>
>>> /dev/md0: ***** FILE SYSTEM WAS MODIFIED *****
>>> /dev/md0: 11/61045248 files (0.0% non-contiguous), 4137785/3906887168 blocks
>>>
>>> # MKE2FS_CONFIG=/root/e12/out/etc/mke2fs.conf /root/e12/out/sbin/resize2fs -p /dev/md0 19534435840k
>>> resize2fs 1.42.12 (29-Aug-2014)
>>> Resizing the filesystem on /dev/md0 to 4883608960 (4k) blocks.
>>> Begin pass 2 (max = 6)
>>> Relocating blocks XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>>> Begin pass 3 (max = 119229)
>>> Scanning inode table XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>>> Begin pass 5 (max = 8)
>>> Moving inode table XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>>> The filesystem on /dev/md0 is now 4883608960 (4k) blocks long.
>>>
>>> dumpe2fs 1.42.13 (17-May-2015)
>>> Filesystem volume name: <none>
>>> Last mounted on: <not available>
>>> Filesystem UUID: 159d3929-1842-4f8d-907f-7509c16f06df
>>> Filesystem magic number: 0xEF53
>>> Filesystem revision #: 1 (dynamic)
>>> Filesystem features: has_journal ext_attr resize_inode dir_index filetype extent 64bit flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
>>> Filesystem flags: signed_directory_hash
>>> Default mount options: user_xattr acl
>>> Filesystem state: clean
>>> Errors behavior: Continue
>>> Filesystem OS type: Linux
>>> Inode count: 76306432
>>> Block count: 4883608960
>>> Reserved block count: 0
>>> Free blocks: 4878450712
>>> Free inodes: 76306421
>>> First block: 0
>>> Block size: 4096
>>> Fragment size: 4096
>>> Group descriptor size: 64
>>> Blocks per group: 32768
>>> Fragments per group: 32768
>>> Inodes per group: 512
>>> Inode blocks per group: 32
>>> RAID stride: 32752
>>> Flex block group size: 16
>>> Filesystem created: Sat Sep 12 11:41:10 2015
>>> Last mount time: n/a
>>> Last write time: Sat Sep 12 11:56:20 2015
>>> Mount count: 0
>>> Maximum mount count: -1
>>> Last checked: Sat Sep 12 11:49:28 2015
>>> Check interval: 0 (<none>)
>>> Lifetime writes: 279 MB
>>> Reserved blocks uid: 0 (user root)
>>> Reserved blocks gid: 0 (group root)
>>> First inode: 11
>>> Inode size: 256
>>> Required extra isize: 28
>>> Desired extra isize: 28
>>> Journal inode: 8
>>> Default directory hash: half_md4
>>> Directory Hash Seed: feeea566-bb38-44c6-a4d5-f97aa78001d4
>>> Journal backup: inode blocks
>>> Journal features: (none)
>>> Journal size: 128M
>>> Journal length: 32768
>>> Journal sequence: 0x00000001
>>> Journal start: 0
>>>
>>> Looking good so far, and now for the final resize to 24 TB using 1.42.13:
>>> # resize2fs -p /dev/md0
>>> resize2fs 1.42.13 (17-May-2015)
>>> Resizing the filesystem on /dev/md0 to 5860330752 (4k) blocks.
>>> Begin pass 2 (max = 6)
>>> Relocating blocks XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>>> Begin pass 3 (max = 149036)
>>> Scanning inode table XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>>> Begin pass 5 (max = 14)
>>> Moving inode table XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>>> Should never happen: resize inode corrupt!
>>>
>>> # dumpe2fs -h /dev/md0
>>> dumpe2fs 1.42.13 (17-May-2015)
>>> Filesystem volume name: <none>
>>> Last mounted on: <not available>
>>> Filesystem UUID: 159d3929-1842-4f8d-907f-7509c16f06df
>>> Filesystem magic number: 0xEF53
>>> Filesystem revision #: 1 (dynamic)
>>> Filesystem features: has_journal ext_attr resize_inode dir_index filetype extent 64bit flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
>>> Filesystem flags: signed_directory_hash
>>> Default mount options: user_xattr acl
>>> Filesystem state: clean with errors
>>> Errors behavior: Continue
>>> Filesystem OS type: Linux
>>> Inode count: 91568128
>>> Block count: 5860330752
>>> Reserved block count: 0
>>> Free blocks: 5853069550
>>> Free inodes: 91568117
>>> First block: 0
>>> Block size: 4096
>>> Fragment size: 4096
>>> Group descriptor size: 64
>>> Blocks per group: 32768
>>> Fragments per group: 32768
>>> Inodes per group: 512
>>> Inode blocks per group: 32
>>> RAID stride: 32752
>>> Flex block group size: 16
>>> Filesystem created: Sat Sep 12 11:41:10 2015
>>> Last mount time: n/a
>>> Last write time: Sat Sep 12 12:03:55 2015
>>> Mount count: 0
>>> Maximum mount count: -1
>>> Last checked: Sat Sep 12 11:49:28 2015
>>> Check interval: 0 (<none>)
>>> Lifetime writes: 279 MB
>>> Reserved blocks uid: 0 (user root)
>>> Reserved blocks gid: 0 (group root)
>>> First inode: 11
>>> Inode size: 256
>>> Required extra isize: 28
>>> Desired extra isize: 28
>>> Journal inode: 8
>>> Default directory hash: half_md4
>>> Directory Hash Seed: feeea566-bb38-44c6-a4d5-f97aa78001d4
>>> Journal backup: inode blocks
>>> Journal superblock magic number invalid!
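The inode counts in the three outputs (61045248, 76306432 and 91568128) follow directly from the mkfs.ext4 options and the geometry dumpe2fs reports: 4 KiB blocks, 32768 blocks per group, 256-byte inodes and `-i 262144` bytes per inode. The sketch below is simplified (it ignores any rounding mke2fs applies to make the inode table fill whole blocks, which does not matter for these values). It derives inodes and inode-table blocks per group, then the group and inode counts for each size; the group counts before each resize (119229 and 149036) also match the `Begin pass 3 (max = ...)` values printed by resize2fs.

```python
def ceil_div(a, b):
    return -(-a // b)


BLOCK_SIZE = 4096          # "Block size: 4096"
BLOCKS_PER_GROUP = 32768   # "Blocks per group: 32768"
INODE_SIZE = 256           # "Inode size: 256"
BYTES_PER_INODE = 262144   # mkfs.ext4 -i 262144

# Simplified: ignores mke2fs's rounding of the inode table to whole blocks.
inodes_per_group = BLOCKS_PER_GROUP * BLOCK_SIZE // BYTES_PER_INODE   # 512
inode_table_blocks = inodes_per_group * INODE_SIZE // BLOCK_SIZE      # 32


def geometry(block_count):
    """(block groups, total inodes) for a filesystem of block_count blocks."""
    groups = ceil_div(block_count, BLOCKS_PER_GROUP)   # last group may be partial
    return groups, groups * inodes_per_group


for blocks in (3906887168, 4883608960, 5860330752):
    groups, inodes = geometry(blocks)
    print(f"{blocks:>11d} blocks -> {groups} groups, {inodes} inodes")
```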
>>>
>>>
>>>> On 2015-09-04 00:16, Johan Harvyl wrote:
>>>> Hello again,
>>>>
>>>> I finally got around to digging some more into this and made what I consider some good progress as I am now able to mount the filesystem read-only so I thought I would update this thread a bit.
>>>>
>>>> Short one sentence recap since it's been a while since the original post: I am trying to recover a filesystem that was quite badly damaged by an offline resize2fs of a fairly modern ext4fs from 20 TB to 24 TB.
>>>>
>>>> I spent a lot of time trying to get something meaningful out of e2fsck/debugfs and learned quite a bit in the process and I would like to briefly share some observations.
>>>>
>>>> 1) The first hurdle running e2fsck -fnv is that the "Superblock has an invalid journal (inode 8)" is considered fatal and cannot be fixed, at least not in r/o mode so e2fsck just stops, this check needed to go away.
>>>>
>>>> 2) e2fsck gets utterly confused by the "bad block inode" that incorrectly gets identified as having something worth looking at and spends days iterating through blocks (before I cancelled it). Removing the handling of ino == EXT2_BAD_INO in pass1 and pass1b made things a bit better.
>>>>
>>>> 3) e2fsck using a backup superblock
>>>> ext2fs_check_desc: Corrupt group descriptor: bad block for inode table
>>>> e2fsck: Group descriptors look bad... trying backup blocks...
>>>> This is bad, as it means using a superblock that has not been updated with the +4TB. Consequently it gets the location of the first block group wrong, or at the very least the first inode table that houses the root inode.
>>>> Forcing it to use the master superblock again makes things a bit better.
>>>>
>>>> I have some logs from various e2fsck runs with various amounts of hacks applied if they are of any interest to developers? I will also likely have the filesystem in this state for a week or two more if any other information I can extract is of interest to figure out what made resize2fs screw things up.
>>>>
>>>>
>>>>
>>>> In the end, the only actual change I have made to the filesystem to make it mountable is that I borrowed a root inode from a different filesystem and updated the i_block pointer to point to the extent tree corresponding to the root inode of my broken filesystem which was quite easy to find by just looking for the string "lost+found".
>>>>
>>>> # mount -o ro,noload /dev/md0 /mnt/loop
>>>> [2815465.034803] EXT4-fs (md0): mounted filesystem without journal. Opts: noload
>>>>
>>>> # df -h /dev/md0
>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>> /dev/md0         22T -382T  404T    - /mnt/loop
>>>>
>>>> Uh oh, does not look too good... But hey, doing some checks on the data contents and so far results are very promising. An "ls /" looks good and so does a lot of the data that I can verify checksums on, checks are still running...
>>>>
>>>> I really do not know how to move on with trying to repair the filesystem with e2fsck. I do not feel brave enough to let it run r/w given how many hacks that I consider very dirty were required to even get it this far. At this point letting it make changes to the filesystem may actually make it worse so I see no other way forward than extracting all the contents and recreating the filesystem from scratch.
>>>>
>>>> Question is though, what is the recommended way to create the filesystem? 64bit is clearly necessary, but what about the other feature flags like flex_bg/meta_bg/resize_inode...? I do not care much about slight gains in performance, robustness is more important, and that it can be resized in the future.
>>>>
>>>> Only online resize from now on, never offline, I learned that lesson...
>>>>
>>>> Will it be possible to expand from 24 TB to 28 TB online?
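Johan found the root directory's data by searching the device for the string "lost+found". A slightly stricter version of that search is sketched below: it reads the device block by block and reports blocks that begin with the "." and ".." entries of a directory whose own inode and parent are both inode 2 (the root inode), and whether "lost+found" appears in the same block. It assumes the ext4 directory entry layout `inode (le32), rec_len (le16), name_len (u8), file_type (u8), name` with the `filetype` feature (file type 2 = directory), 4 KiB blocks, and that the device or an image of it can be opened read-only as a file; the function names are hypothetical.

```python
import struct

BLOCK_SIZE = 4096      # "Block size: 4096"
ROOT_INO = 2           # ext4 root directory inode
EXT2_FT_DIR = 2        # directory file type (filetype feature)

# First entry of the root directory block: inode 2, rec_len 12, name_len 1,
# file_type 2, name "." padded to a 4-byte boundary.
DOT_ENTRY = struct.pack("<IHBB", ROOT_INO, 12, 1, EXT2_FT_DIR) + b".\0\0\0"


def is_root_dir_block(block: bytes) -> bool:
    """True if `block` starts with '.' and '..' entries that both point at inode 2."""
    if not block.startswith(DOT_ENTRY):
        return False
    ino, rec_len, name_len, ftype = struct.unpack_from("<IHBB", block, 12)
    return (
        ino == ROOT_INO
        and name_len == 2
        and ftype == EXT2_FT_DIR
        and block[20:22] == b".."
        and 12 <= rec_len <= len(block) - 12   # '..' must fit inside the block
    )


def find_root_dir_blocks(path: str, block_size: int = BLOCK_SIZE):
    """Yield (block_number, mentions_lost_found) for each candidate block in `path`."""
    with open(path, "rb") as dev:
        block_no = 0
        while True:
            block = dev.read(block_size)
            if len(block) < block_size:
                break
            if is_root_dir_block(block):
                yield block_no, b"lost+found" in block
            block_no += 1
```

For example, `for blk, lf in find_root_dir_blocks("/dev/md0"): print(blk, lf)` run as root lists candidate block numbers without writing anything. A hit whose second value is False can still be the root directory: with `dir_index`, the first block holds only "." and ".." plus index data, and lost+found then sits in another block.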
>>>>
>>>> thanks,
>>>> -johan
>>>>
>>>>
>>>>> On 2015-08-13 20:12, Johan Harvyl wrote:
>>>>>> On 2015-08-13 15:27, Theodore Ts'o wrote:
>>>>>> On Thu, Aug 13, 2015 at 12:00:50AM +0200, Johan Harvyl wrote:
>>>>>>
>>>>>>>> I'm not aware of any offline resize with 1.42.13, but it sounds like
>>>>>>>> you were originally using mke2fs and resize2fs 1.42.10, which did have
>>>>>>>> some bugs, and so the question is what sort of might it might have
>>>>>>>> left things.
>>>>>>> What kind of bugs are we talking about, mke2fs? resize2fs? e2fsck? Any
>>>>>>> specific commits of interest?
>>>>>> I suspect it was caused by a bug in resize2fs 1.42.10. The problem is
>>>>>> that off-line resize2fs is much more powerful; it can handle moving
>>>>>> file system metadata blocks around, so it can grow file systems in
>>>>>> cases which aren't supported by online resize --- and it can shrink
>>>>>> file systems when online resize doesn't support any kind of file
>>>>>> system shrink. As such, the code is a lot more complicated, whereas
>>>>>> the online resize code is much simpler, and ultimately, much more
>>>>>> robust.
>>>>> Understood, so would it have been possible to move from my 20 TB -> 24 TB fs with
>>>>> online resize? I am confused by the threads I see on the net with regards to this.
>>>>>>> Can you think of why it would zero out the first thousands of
>>>>>>> inodes, like the root inode, lost+found and so on? I am thinking
>>>>>>> that would help me assess the potential damage to the files. Could I
>>>>>>> perhaps expect the same kind of zeroed out blocks at regular
>>>>>>> intervals all over the device?
>>>>>> I didn't realize that the first thousands of inodes had been zeroed;
>>>>>> either you didn't mention this earlier or I had missed that from your
>>>>>> e-mail. I suspect the resize inode before the resize was pretty
>>>>>> terribly corrupted, but in a way that e2fsck didn't complain.
>>>>>
>>>>> Hi,
>>>>>
>>>>> I may not have been clear that it was not just the first handful of inodes.
>>>>>
>>>>> When I manually sampled some inodes with debugfs and a disk editor, the first group
>>>>> I found valid inodes in was:
>>>>> Group 48: block bitmap at 1572864, inode bitmap at 1572880, inode table at 1572896
>>>>>
>>>>> With 512 inodes per group that would mean at least some 24k inodes are blanked out,
>>>>> but I did not check them all, I just sampled groups manually so there could be some
>>>>> valid in some of the groups below group 48 or a lot more invalid afterwards.
>>>>>
>>>>>> I'll have to try to reproduce the problem based on how you originally
>>>>>> created and grew the file system and see if I can somehow reproduce
>>>>>> the problem. Obviously e2fsck and resize2fs should be changed to make
>>>>>> this operation much more robust. If you can tell me the exact
>>>>>> original size (just under 16TB is probably good enough, but if you
>>>>>> know the exact starting size, that might be helpful), and then steps
>>>>>> by which the file system was grown, and which version of e2fsprogs was
>>>>>> installed at the time, that would be quite helpful.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> - Ted
>>>>>
>>>>> Cool, I will try to go through its history in some detail below.
>>>>>
>>>>> If you have ideas on what I could look for, like ideas on if there is a particular periodicity
>>>>> to the corruption I can write some python to explore such theories.
>>>>>
>>>>>
>>>>> The filesystem was originally created with e2fsprogs 1.42.10-1 and most likely linux-image
>>>>> 3.14 from Debian.
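The group 48 locations Johan quotes above fit a simple flex_bg layout: with `Flex block group size: 16`, the block bitmaps, then the inode bitmaps, then the inode tables of 16 consecutive groups appear to be packed at the start of the first group of each flex group. The sketch below encodes that assumed layout and reproduces the three numbers for group 48. It is only valid when the flex group's first group holds no superblock or GDT backup (true for group 48, not for group 0, so it does not locate the root inode). It also maps an inode number to its group and byte offset using the ext4 rule `group = (ino - 1) // inodes_per_group`, the kind of helper the sampling Johan mentions could build on; the function names are hypothetical.

```python
BLOCK_SIZE = 4096           # "Block size: 4096"
BLOCKS_PER_GROUP = 32768    # "Blocks per group: 32768"
FLEX_SIZE = 16              # "Flex block group size: 16"
INODES_PER_GROUP = 512      # "Inodes per group: 512"
INODE_TABLE_BLOCKS = 32     # "Inode blocks per group: 32"
INODE_SIZE = 256            # "Inode size: 256"


def flex_layout(group: int) -> tuple:
    """(block bitmap, inode bitmap, inode table) block numbers for `group`.

    Assumes the bitmaps and inode tables of each flex group are packed at the
    start of the flex group's first group, and that this first group holds no
    superblock/GDT backup (true for group 48, not for group 0).
    """
    first = group - group % FLEX_SIZE       # first group of this flex group
    base = first * BLOCKS_PER_GROUP         # first block of that group
    i = group - first                       # position inside the flex group
    return (
        base + i,                                       # block bitmaps
        base + FLEX_SIZE + i,                           # inode bitmaps
        base + 2 * FLEX_SIZE + i * INODE_TABLE_BLOCKS,  # inode tables
    )


def locate_inode(ino: int) -> tuple:
    """(group, index in group, byte offset on disk) of 1-based inode number `ino`."""
    group, index = divmod(ino - 1, INODES_PER_GROUP)
    inode_table = flex_layout(group)[2]
    return group, index, inode_table * BLOCK_SIZE + index * INODE_SIZE


print(flex_layout(48))                          # (1572864, 1572880, 1572896)
print(48 * INODES_PER_GROUP)                    # inodes in groups 0..47: 24576
print(locate_inode(48 * INODES_PER_GROUP + 1))  # (48, 0, 6442582016)
```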
>>>>>
>>>>> # mkfs.ext4 /dev/md0 -i 262144 -m 0 -O 64bit
>>>>> mke2fs 1.42.10 (18-May-2014)
>>>>> Creating filesystem with 3906887168 4k blocks and 61045248 inodes
>>>>> Filesystem UUID: 13c2eb37-e951-4ad1-b194-21f0880556db
>>>>> Superblock backups stored on blocks:
>>>>>     32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
>>>>>     4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
>>>>>     102400000, 214990848, 512000000, 550731776, 644972544, 1934917632,
>>>>>     2560000000, 3855122432
>>>>>
>>>>> Allocating group tables: done
>>>>> Writing inode tables: done
>>>>> Creating journal (32768 blocks): done
>>>>> Writing superblocks and filesystem accounting information: done
>>>>> #
>>>>>
>>>>> It was expanded by 4 TB (another 976721792 4k blocks). Best I can tell from my logs this
>>>>> was done with either e2fsprogs:amd64 1.42.12-1 or 1.42.12-1.1 (debian packages) and
>>>>> Linux 3.16. Everything was running fine after this.
>>>>> NOTE #1: It does *not* look like this filesystem was ever touched by resize2fs 1.42.10.
>>>>> NOTE #2: The diff between debian packages 1.42.12-1 and 1.42.12-1.1 appears to be this:
>>>>> 49d0fe2 libext2fs: fix potential buffer overflow in closefs()
>>>>>
>>>>> Then for the final 4 TB for a total of 5860330752 4k blocks which was done with
>>>>> e2fsprogs:amd64 1.42.13-1 and Linux 4.0. This is where the:
>>>>> "Should never happen: resize inode corrupt"
>>>>> was seen.
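The "Superblock backups stored on blocks" list above is what the `sparse_super` feature predicts: besides the primary superblock in group 0, backups live in group 1 and in every group whose number is a power of 3, 5 or 7. The sketch below regenerates the list for this filesystem, assuming 32768 blocks per group and a first data block of 0 (as with 4 KiB blocks); the function names are hypothetical.

```python
BLOCKS_PER_GROUP = 32768   # "Blocks per group: 32768"; first data block is 0


def sparse_super_groups(group_count: int) -> list:
    """Groups (other than 0) that hold a superblock backup under sparse_super:
    group 1 plus every power of 3, 5 and 7 below group_count."""
    groups = {1}
    for base in (3, 5, 7):
        power = base
        while power < group_count:
            groups.add(power)
            power *= base
    return sorted(g for g in groups if g < group_count)


def backup_blocks(total_blocks: int) -> list:
    """Block numbers of the backup superblocks, in the order mke2fs lists them."""
    group_count = -(-total_blocks // BLOCKS_PER_GROUP)   # ceiling division
    return [g * BLOCKS_PER_GROUP for g in sparse_super_groups(group_count)]


print(backup_blocks(3906887168))
```

Applied to the final size of 5860330752 blocks (178844 groups), the same rule adds one more backup location, block 5804752896 (group 177147 = 3^11), which only exists after the second resize.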
>>>>>
>>>>> In both cases the same offline resize was done, with no exotic options:
>>>>> # umount /dev/md0
>>>>> # fsck.ext4 -f /dev/md0
>>>>> # resize2fs /dev/md0
>>>>>
>>>>> thanks,
>>>>> -johan
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html