lists.openwall.net mailing list archives
Message-Id: <B9C846A3-35D0-480C-9888-3F46E8A9C6A5@dilger.ca>
Date:	Wed, 16 Sep 2015 19:21:59 -0600
From:	Andreas Dilger <adilger@...ger.ca>
To:	Johan Harvyl <johan@...vyl.se>
Cc:	Theodore Ts'o <tytso@....edu>,
	"linux-ext4@...r.kernel.org" <linux-ext4@...r.kernel.org>
Subject: Re: resize2fs: Should never happen: resize inode corrupt! - lost key inodes

If you add "-b 1024" to the mke2fs command line to use 1KB instead of 4KB blocks, and reduce the sizes by a factor of 4, does the problem still happen? That would make it easier for someone else to test, since it would only need a 4-5TB disk instead of a 19 TB array.
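The scaled-down geometry can be sanity-checked with a few lines of Python (using only the sizes from the commands in this thread): shrinking both the block size and the filesystem size by a factor of 4 leaves the block counts, and hence the block-group layout, unchanged.

```python
# Sanity check of the suggested scale-down: same block counts with
# -b 1024 and sizes divided by 4 as with -b 4096 and the full sizes.
KIB = 1024

orig_size_k = 15627548672      # mkfs size argument, in KiB (4 KiB blocks)
resize_size_k = 19534435840    # first resize2fs target, in KiB

orig_blocks = orig_size_k * KIB // 4096            # with -b 4096
scaled_blocks = (orig_size_k // 4) * KIB // 1024   # with -b 1024, size / 4

print(orig_blocks, scaled_blocks)
assert orig_blocks == scaled_blocks == 3906887168
assert (resize_size_k // 4) * KIB // 1024 == resize_size_k * KIB // 4096 == 4883608960
```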

Cheers, Andreas

> On Sep 15, 2015, at 11:55, Johan Harvyl <johan@...vyl.se> wrote:
> 
> I have now been able to reproduce the issue that resize2fs corrupts at least the root, resize and journal
> inodes with versions 1.42.13 and the more recent commit 956b0f1 of e2fsprogs.
> 
> Note that older versions of e2fsprogs need *not* be involved: 1.42.13 and newer also have issues.
> 
> Please advise on things I can try to narrow down the root cause of what has to be an e2fsprogs bug. In
> particular it would be very useful to reproduce it faster: running through the mkfs and two resize steps
> takes around ten minutes, so iterative testing is slow and I do not really have much of a clue what steps
> would be more likely to overwrite the inodes.
> 
> At some point I would like to return this array to service but I am not really comfortable creating a
> new ext4 filesystem on it without first understanding how it can become corrupted without even
> mounting the file system.
> 
> For 1.42.13:
> # mkfs.ext4 /dev/md0 -i 262144 -m 0 -O 64bit 15627548672k
> # resize2fs -p /dev/md0 19534435840k
> # resize2fs -p /dev/md0
> # e2fsck -fn /dev/md0
> e2fsck 1.42.13 (17-May-2015)
> ext2fs_check_desc: Corrupt group descriptor: bad block for inode table
> e2fsck: Group descriptors look bad... trying backup blocks...
> Superblock has an invalid journal (inode 8).
> Clear? no
> 
> e2fsck: Illegal inode number while checking ext3 journal for /dev/md0
> 
> /dev/md0: ********** WARNING: Filesystem still has errors **********
> 
> 
> or for 956b0f1:
> # MKE2FS_CONFIG=/root/elatest/out/etc/mke2fs.conf /root/elatest/out/sbin/mkfs.ext4 /dev/md0 -i 262144 -m 0 -O 64bit 15627548672k
> # MKE2FS_CONFIG=/root/elatest/out/etc/mke2fs.conf /root/elatest/out/sbin/resize2fs -p /dev/md0 19534435840k
> # MKE2FS_CONFIG=/root/elatest/out/etc/mke2fs.conf /root/elatest/out/sbin/resize2fs -p /dev/md0
> # MKE2FS_CONFIG=/root/elatest/out/etc/mke2fs.conf /root/elatest/out/sbin/e2fsck -fn /dev/md0
> e2fsck 1.43-WIP (18-May-2015)
> ext2fs_open2: Superblock checksum does not match superblock
> /root/elatest/out/sbin/e2fsck: Superblock invalid, trying backup blocks...
> Superblock has an invalid journal (inode 8).
> Clear? no
> 
> /root/elatest/out/sbin/e2fsck: Illegal inode number while checking ext3 journal for /dev/md0
> 
> /dev/md0: ********** WARNING: Filesystem still has errors **********
> 
> # MKE2FS_CONFIG=/root/elatest/out/etc/mke2fs.conf /root/elatest/out/sbin/debugfs -c /dev/md0
> debugfs 1.43-WIP (18-May-2015)
> /dev/md0: Superblock checksum does not match superblock while opening filesystem
> debugfs:  stat <2>
> stat: Filesystem not open
> 
> # debugfs -c /dev/md0
> debugfs 1.42.13 (17-May-2015)
> /dev/md0: catastrophic mode - not reading inode or group bitmaps
> debugfs:  stat <2>
> Inode: 2   Type: bad type    Mode:  0004   Flags: 0x1
> Generation: 1    Version: 0x00000001
> User:  9440   Group:     0   Size: 618659860
> File ACL: 1    Directory ACL: 0
> Links: 0   Blockcount: 724107776
> Fragment:  Address: 0    Number: 0    Size: 0
> ctime: 0x02008000 -- Sun Jan 24 18:46:40 1971
> atime: 0x24e000a0 -- Wed Aug  9 12:00:00 1989
> mtime: 0x00030000 -- Sat Jan  3 07:36:48 1970
> Size of extra inode fields: 6
> BLOCKS:
> (0):1, (6):618659845 .... and it goes on...
> 
>> On 2015-09-14 23:35, Johan Harvyl wrote:
>> In an attempt to further isolate which versions of e2fsprogs, at a commit level, are
>> needed to reproduce the bad behavior, I tried my own step-by-step, initially with a much
>> higher -i 16777216 to mkfs.ext4 in the hope that fewer inodes would make all the
>> operations run faster.
>> 
>> When I was unable to reproduce with -i 16777216, I switched back to exactly
>> what I reproduced with the first time, and I *still* did not get the "Should never happen:
>> resize inode corrupt!".
>> 
>> The only reasonable explanation I can come up with for this is that something is not being
>> initialized properly that resize2fs expects to be initialized. I have no indications of any
>> issues with any hardware or the underlying md block device.
>> 
>> What I did however notice is that I can have the same kind of filesystem corruption
>> *without* seeing the "Should never happen: resize inode corrupt!" message using the
>> following sequence, and this *is* reproducible one time after another:
>> 
>> # MKE2FS_CONFIG=/root/e10/out/etc/mke2fs.conf /root/e10/out/sbin/mkfs.ext4 /dev/md0 -i 262144 -m 0 -O 64bit 15627548672k
>> # e2fsck -fy /dev/md0 (using 1.42.13)
>> # resize2fs -p /dev/md0 19534435840k (using 1.42.13)
>> # resize2fs -p /dev/md0 (using 1.42.13)
>> # e2fsck -fn /dev/md0
>> e2fsck 1.42.13 (17-May-2015)
>> ext2fs_check_desc: Corrupt group descriptor: bad block for inode table
>> e2fsck: Group descriptors look bad... trying backup blocks...
>> Superblock has an invalid journal (inode 8).
>> Clear? no
>> 
>> e2fsck: Illegal inode number while checking ext3 journal for /dev/md0
>> 
>> At this point the root inode is also bad and this fails:
>> # mount /dev/md0 /mnt/loop -o ro,noload
>> mount: mount /dev/md0 on /mnt/loop failed: Stale file handle
>> [3766493.732188] EXT4-fs (md0): get root inode failed
>> [3766493.732190] EXT4-fs (md0): mount failed
>> 
>> Note that only versions 1.42.10 and 1.42.13 are involved now; 1.42.12 is not needed.
>> 
>> Kernel is the debian:
>> ii  linux-image-4.0.0-2-amd64      4.0.8-2 amd64 Linux 4.0 for 64-bit PCs
>> 
>> For the record I also tried a more recent e2fsprogs for the resize (instead of 1.42.13),
>> locally built from:
>> 956b0f1 Merge branch 'maint' into next
>> and I could still reproduce it on the first attempt.
>> 
>> More verbose logs follow.
>> 
>> Does anyone else have some kind of testbed to test the same sequence of commands?
>> 
>> ===
>> 
>> # MKE2FS_CONFIG=/root/e10/out/etc/mke2fs.conf /root/e10/out/sbin/mkfs.ext4 /dev/md0 -i 262144 -m 0 -O 64bit 15627548672k
>> mke2fs 1.42.10 (18-May-2014)
>> /dev/md0 contains a ext4 file system
>>        last mounted on Sun Sep 13 22:19:28 2015
>> Proceed anyway? (y,n) y
>> Creating filesystem with 3906887168 4k blocks and 61045248 inodes
>> Filesystem UUID: e263356e-4fe4-4e9b-bd0c-8edc2c411735
>> Superblock backups stored on blocks:
>>        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
>>        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
>>        102400000, 214990848, 512000000, 550731776, 644972544, 1934917632,
>>        2560000000, 3855122432
>> 
>> Allocating group tables: done
>> Writing inode tables: done
>> Creating journal (32768 blocks): done
>> Writing superblocks and filesystem accounting information: done
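The backup-superblock list printed above can be cross-checked against the sparse_super layout rule (backups in group 1 and in groups that are powers of 3, 5, and 7), a sketch assuming the geometry quoted in the log (3906887168 blocks, 32768 blocks per group):

```python
# Recompute the sparse_super backup-superblock locations from the
# geometry mke2fs printed, and compare with its own list.
blocks_per_group = 32768
total_blocks = 3906887168
num_groups = -(-total_blocks // blocks_per_group)  # ceiling division

groups = {1}
for base in (3, 5, 7):
    power = base
    while power < num_groups:
        groups.add(power)
        power *= base

backups = sorted(g * blocks_per_group for g in groups)
print(backups)
# First entries: 32768, 98304, 163840, ...; last entry: 3855122432,
# matching the 24 blocks listed by mke2fs above.
```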
>> 
>> # e2fsck -fy /dev/md0
>> e2fsck 1.42.13 (17-May-2015)
>> Pass 1: Checking inodes, blocks, and sizes
>> Pass 2: Checking directory structure
>> Pass 3: Checking directory connectivity
>> Pass 4: Checking reference counts
>> Pass 5: Checking group summary information
>> Free blocks count wrong (512088558484167, counted=3902749383).
>> Fix? yes
>> 
>> 
>> /dev/md0: ***** FILE SYSTEM WAS MODIFIED *****
>> /dev/md0: 11/61045248 files (0.0% non-contiguous), 4137785/3906887168 blocks
>> 
>> # resize2fs -p /dev/md0 19534435840k
>> resize2fs 1.42.13 (17-May-2015)
>> Resizing the filesystem on /dev/md0 to 4883608960 (4k) blocks.
>> Begin pass 2 (max = 6)
>> Relocating blocks XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>> Begin pass 3 (max = 119229)
>> Scanning inode table XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>> Begin pass 5 (max = 8)
>> Moving inode table XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>> The filesystem on /dev/md0 is now 4883608960 (4k) blocks long.
>> 
>> # resize2fs -p /dev/md0
>> resize2fs 1.42.13 (17-May-2015)
>> Resizing the filesystem on /dev/md0 to 5860330752 (4k) blocks.
>> Begin pass 2 (max = 6)
>> Relocating blocks XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>> Begin pass 3 (max = 149036)
>> Scanning inode table XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>> Begin pass 5 (max = 14)
>> Moving inode table XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>> The filesystem on /dev/md0 is now 5860330752 (4k) blocks long.
>> 
>> # e2fsck -fn /dev/md0
>> e2fsck 1.42.13 (17-May-2015)
>> ext2fs_check_desc: Corrupt group descriptor: bad block for inode table
>> e2fsck: Group descriptors look bad... trying backup blocks...
>> Superblock has an invalid journal (inode 8).
>> Clear? no
>> 
>> e2fsck: Illegal inode number while checking ext3 journal for /dev/md0
>> 
>>> On 2015-09-12 12:27, Johan Harvyl wrote:
>>> Hi,
>>> 
>>> I have now evacuated the data on the filesystem and I *did* manage to recreate the
>>> "Should never happen: resize inode corrupt!" using the versions of e2fsprogs I believe I was using at the time.
>>> 
>>> The vast majority of the data that I was able to checksum was ok.
>>> 
>>> For me I guess the way forward should be to recreate the fs with 1.42.13 and stick to online resize
>>> from now on, correct?
>>> 
>>> Are there any feature flags that I should not use when expanding file systems or any that I must use?
>>> 
>>> -johan
>>> 
>>> 
>>> Here is a step by step of what I did to reproduce
>>> 
>>> I have built the following two versions of e2fsprogs (configure, make, make install, nothing else):
>>> 421d693 (HEAD) libext2fs: fix potential buffer overflow in closefs()
>>> 6a3741a (tag: v1.42.12) Update release notes, etc. for final 1.42.12 release
>>> 
>>> 9779e29 (HEAD, tag: v1.42.10) Update release notes, etc. for final 1.42.10 release
>>> 
>>> ===
>>> 
>>> First build the fs with 1.42.10 with the exact number of blocks I originally had.
>>> 
>>> # MKE2FS_CONFIG=/root/e10/out/etc/mke2fs.conf /root/e10/out/sbin/mkfs.ext4 /dev/md0 -i 262144 -m 0 -O 64bit 15627548672k
>>> mke2fs 1.42.10 (18-May-2014)
>>> /dev/md0 contains a ext4 file system
>>>        created on Sat Sep 12 11:23:02 2015
>>> Proceed anyway? (y,n) y
>>> Creating filesystem with 3906887168 4k blocks and 61045248 inodes
>>> Filesystem UUID: d00e9e59-3756-4e59-9539-bc00fe2446b5
>>> Superblock backups stored on blocks:
>>>        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
>>>        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
>>>        102400000, 214990848, 512000000, 550731776, 644972544, 1934917632,
>>>        2560000000, 3855122432
>>> 
>>> Allocating group tables: done
>>> Writing inode tables: done
>>> Creating journal (32768 blocks): done
>>> Writing superblocks and filesystem accounting information: done
>>> 
>>> From dumpe2fs I observe:
>>> 1) the fs features match what I had on my broken fs
>>> 2) the number of free blocks is 512088558484167 which is clearly wrong.
>>> 
>>> # e2fsck -fnv /dev/md0
>>> e2fsck 1.42.13 (17-May-2015)
>>> Pass 1: Checking inodes, blocks, and sizes
>>> Pass 2: Checking directory structure
>>> Pass 3: Checking directory connectivity
>>> Pass 4: Checking reference counts
>>> Pass 5: Checking group summary information
>>> Free blocks count wrong (512088558484167, counted=3902749383).
>>> Fix? no
>>> 
>>> So the initial fs created by 1.42.10 appears to be bad.
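The impossible free-blocks count decomposes cleanly as a 64-bit value whose low 32 bits equal e2fsck's recount and whose high 32 bits are garbage (this is only an observation from the numbers in this thread, not a confirmed diagnosis, but it suggests the high word of the 64-bit counter was never initialized):

```python
# Split the reported 64-bit free-blocks count into high and low words.
reported = 512088558484167   # value from dumpe2fs / e2fsck above
counted = 3902749383         # e2fsck's recount

low = reported & 0xFFFFFFFF
high = reported >> 32
print(high, low)
assert low == counted   # low word is the real free-block count
assert high == 119229   # stale high word (curiously close to the group count)
```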
>>> 
>>> Filesystem volume name:   <none>
>>> Last mounted on:          <not available>
>>> Filesystem UUID: d00e9e59-3756-4e59-9539-bc00fe2446b5
>>> Filesystem magic number:  0xEF53
>>> Filesystem revision #:    1 (dynamic)
>>> Filesystem features:      has_journal ext_attr resize_inode dir_index filetype extent 64bit flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
>>> Filesystem flags:         signed_directory_hash
>>> Default mount options:    user_xattr acl
>>> Filesystem state:         clean
>>> Errors behavior:          Continue
>>> Filesystem OS type:       Linux
>>> Inode count:              61045248
>>> Block count:              3906887168
>>> Reserved block count:     0
>>> Free blocks:              512088558484167
>>> Free inodes:              61045237
>>> First block:              0
>>> Block size:               4096
>>> Fragment size:            4096
>>> Group descriptor size:    64
>>> Reserved GDT blocks:      185
>>> Blocks per group:         32768
>>> Fragments per group:      32768
>>> Inodes per group:         512
>>> Inode blocks per group:   32
>>> Flex block group size:    16
>>> Filesystem created:       Sat Sep 12 11:27:55 2015
>>> Last mount time:          n/a
>>> Last write time:          Sat Sep 12 11:27:55 2015
>>> Mount count:              0
>>> Maximum mount count:      -1
>>> Last checked:             Sat Sep 12 11:27:55 2015
>>> Check interval:           0 (<none>)
>>> Lifetime writes:          158 MB
>>> Reserved blocks uid:      0 (user root)
>>> Reserved blocks gid:      0 (group root)
>>> First inode:              11
>>> Inode size:               256
>>> Required extra isize:     28
>>> Desired extra isize:      28
>>> Journal inode:            8
>>> Default directory hash:   half_md4
>>> Directory Hash Seed: f252a723-7016-43d1-97f8-579062a215e1
>>> Journal backup:           inode blocks
>>> Journal features:         (none)
>>> Journal size:             128M
>>> Journal length:           32768
>>> Journal sequence:         0x00000001
>>> Journal start:            0
>>> 
>>> 
>>> 
>>> The next step is resizing by +4 TB with 1.42.12.
>>> # MKE2FS_CONFIG=/root/e12/out/etc/mke2fs.conf /root/e12/out/sbin/resize2fs -p /dev/md0 19534435840k
>>> resize2fs 1.42.12 (29-Aug-2014)
>>> <and nothing more>
>>> It did *not* print the "Resizing the filesystem on /dev/md0 to 4883608960 (4k) blocks." that it should have.
>>> 
>>> I let it run for 90+ minutes, sampling CPU and IO usage with iotop from time to time. It was using more or less 100% CPU with no visible I/O.
>>> 
>>> So, I let e2fsck fix the free block count and re-did the resize:
>>> # e2fsck -f /dev/md0
>>> e2fsck 1.42.13 (17-May-2015)
>>> Pass 1: Checking inodes, blocks, and sizes
>>> Pass 2: Checking directory structure
>>> Pass 3: Checking directory connectivity
>>> Pass 4: Checking reference counts
>>> Pass 5: Checking group summary information
>>> Free blocks count wrong (512088558484167, counted=3902749383).
>>> Fix<y>? yes
>>> 
>>> /dev/md0: ***** FILE SYSTEM WAS MODIFIED *****
>>> /dev/md0: 11/61045248 files (0.0% non-contiguous), 4137785/3906887168 blocks
>>> 
>>> # MKE2FS_CONFIG=/root/e12/out/etc/mke2fs.conf /root/e12/out/sbin/resize2fs -p /dev/md0 19534435840k
>>> resize2fs 1.42.12 (29-Aug-2014)
>>> Resizing the filesystem on /dev/md0 to 4883608960 (4k) blocks.
>>> Begin pass 2 (max = 6)
>>> Relocating blocks XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>>> Begin pass 3 (max = 119229)
>>> Scanning inode table XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>>> Begin pass 5 (max = 8)
>>> Moving inode table XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>>> The filesystem on /dev/md0 is now 4883608960 (4k) blocks long.
>>> 
>>> dumpe2fs 1.42.13 (17-May-2015)
>>> Filesystem volume name:   <none>
>>> Last mounted on:          <not available>
>>> Filesystem UUID: 159d3929-1842-4f8d-907f-7509c16f06df
>>> Filesystem magic number:  0xEF53
>>> Filesystem revision #:    1 (dynamic)
>>> Filesystem features:      has_journal ext_attr resize_inode dir_index filetype extent 64bit flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
>>> Filesystem flags:         signed_directory_hash
>>> Default mount options:    user_xattr acl
>>> Filesystem state:         clean
>>> Errors behavior:          Continue
>>> Filesystem OS type:       Linux
>>> Inode count:              76306432
>>> Block count:              4883608960
>>> Reserved block count:     0
>>> Free blocks:              4878450712
>>> Free inodes:              76306421
>>> First block:              0
>>> Block size:               4096
>>> Fragment size:            4096
>>> Group descriptor size:    64
>>> Blocks per group:         32768
>>> Fragments per group:      32768
>>> Inodes per group:         512
>>> Inode blocks per group:   32
>>> RAID stride:              32752
>>> Flex block group size:    16
>>> Filesystem created:       Sat Sep 12 11:41:10 2015
>>> Last mount time:          n/a
>>> Last write time:          Sat Sep 12 11:56:20 2015
>>> Mount count:              0
>>> Maximum mount count:      -1
>>> Last checked:             Sat Sep 12 11:49:28 2015
>>> Check interval:           0 (<none>)
>>> Lifetime writes:          279 MB
>>> Reserved blocks uid:      0 (user root)
>>> Reserved blocks gid:      0 (group root)
>>> First inode:              11
>>> Inode size:               256
>>> Required extra isize:     28
>>> Desired extra isize:      28
>>> Journal inode:            8
>>> Default directory hash:   half_md4
>>> Directory Hash Seed: feeea566-bb38-44c6-a4d5-f97aa78001d4
>>> Journal backup:           inode blocks
>>> Journal features:         (none)
>>> Journal size:             128M
>>> Journal length:           32768
>>> Journal sequence:         0x00000001
>>> Journal start:            0
>>> 
>>> Looking good so far, and now for the final resize to 24 TB using 1.42.13:
>>> # resize2fs -p /dev/md0
>>> resize2fs 1.42.13 (17-May-2015)
>>> Resizing the filesystem on /dev/md0 to 5860330752 (4k) blocks.
>>> Begin pass 2 (max = 6)
>>> Relocating blocks XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>>> Begin pass 3 (max = 149036)
>>> Scanning inode table XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>>> Begin pass 5 (max = 14)
>>> Moving inode table XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
>>> Should never happen: resize inode corrupt!
>>> 
>>> # dumpe2fs -h /dev/md0
>>> dumpe2fs 1.42.13 (17-May-2015)
>>> Filesystem volume name:   <none>
>>> Last mounted on:          <not available>
>>> Filesystem UUID: 159d3929-1842-4f8d-907f-7509c16f06df
>>> Filesystem magic number:  0xEF53
>>> Filesystem revision #:    1 (dynamic)
>>> Filesystem features:      has_journal ext_attr resize_inode dir_index filetype extent 64bit flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
>>> Filesystem flags:         signed_directory_hash
>>> Default mount options:    user_xattr acl
>>> Filesystem state:         clean with errors
>>> Errors behavior:          Continue
>>> Filesystem OS type:       Linux
>>> Inode count:              91568128
>>> Block count:              5860330752
>>> Reserved block count:     0
>>> Free blocks:              5853069550
>>> Free inodes:              91568117
>>> First block:              0
>>> Block size:               4096
>>> Fragment size:            4096
>>> Group descriptor size:    64
>>> Blocks per group:         32768
>>> Fragments per group:      32768
>>> Inodes per group:         512
>>> Inode blocks per group:   32
>>> RAID stride:              32752
>>> Flex block group size:    16
>>> Filesystem created:       Sat Sep 12 11:41:10 2015
>>> Last mount time:          n/a
>>> Last write time:          Sat Sep 12 12:03:55 2015
>>> Mount count:              0
>>> Maximum mount count:      -1
>>> Last checked:             Sat Sep 12 11:49:28 2015
>>> Check interval:           0 (<none>)
>>> Lifetime writes:          279 MB
>>> Reserved blocks uid:      0 (user root)
>>> Reserved blocks gid:      0 (group root)
>>> First inode:              11
>>> Inode size:               256
>>> Required extra isize:     28
>>> Desired extra isize:      28
>>> Journal inode:            8
>>> Default directory hash:   half_md4
>>> Directory Hash Seed: feeea566-bb38-44c6-a4d5-f97aa78001d4
>>> Journal backup:           inode blocks
>>> Journal superblock magic number invalid!
>>> 
>>> 
>>>> On 2015-09-04 00:16, Johan Harvyl wrote:
>>>> Hello again,
>>>> 
>>>> I finally got around to digging some more into this and made what I consider good progress, as I am now able to mount the filesystem read-only, so I thought I would update this thread a bit.
>>>> 
>>>> Short one-sentence recap since it's been a while since the original post: I am trying to recover a filesystem that was quite badly damaged by an offline resize2fs of a fairly modern ext4 filesystem from 20 TB to 24 TB.
>>>> 
>>>> I spent a lot of time trying to get something meaningful out of e2fsck/debugfs and learned quite a bit in the process and I would like to briefly share some observations.
>>>> 
>>>> 1) The first hurdle running e2fsck -fnv is that the "Superblock has an invalid journal (inode 8)" error is considered fatal and cannot be fixed, at least not in r/o mode, so e2fsck just stops; this check needed to go away.
>>>> 
>>>> 2) e2fsck gets utterly confused by the "bad block inode" that incorrectly gets identified as having something worth looking at, and spends days iterating through blocks (before I cancelled it). Removing the handling of ino == EXT2_BAD_INO in pass1 and pass1b made things a bit better.
>>>> 
>>>> 3) e2fsck using a backup superblock
>>>> ext2fs_check_desc: Corrupt group descriptor: bad block for inode table
>>>> e2fsck: Group descriptors look bad... trying backup blocks...
>>>> This is bad, as it means using a superblock that has not been updated with the +4TB. Consequently it gets the location of the first block group wrong, or at the very least the first inode table that houses the root inode.
>>>> Forcing it to use the master superblock again makes things a bit better.
>>>> 
>>>> I have some logs from various e2fsck runs with various amounts of hacks applied, if they are of any interest to developers. I will also likely have the filesystem in this state for a week or two more if any other information I can extract is of interest to figure out what made resize2fs screw things up.
>>>> 
>>>> 
>>>> 
>>>> In the end, the only actual change I have made to the filesystem to make it mountable is that I borrowed a root inode from a different filesystem and updated the i_block pointer to point to the extent tree corresponding to the root inode of my broken filesystem which was quite easy to find by just looking for the string "lost+found".
>>>> 
>>>> # mount -o ro,noload /dev/md0 /mnt/loop
>>>> [2815465.034803] EXT4-fs (md0): mounted filesystem without journal. Opts: noload
>>>> 
>>>> # df -h /dev/md0
>>>> Filesystem      Size  Used Avail Use% Mounted on
>>>> /dev/md0         22T -382T  404T    - /mnt/loop
>>>> 
>>>> Uh oh, does not look too good... But hey, doing some checks on the data contents and so far results are very promising. An "ls /" looks good and so does a lot of the data that I can verify checksums on; checks are still running...
>>>> 
>>>> I really do not know how to move on with trying to repair the filesystem with e2fsck. I do not feel brave enough to let it run r/w given how many hacks, which I consider very dirty, were required to even get it this far. At this point letting it make changes to the filesystem may actually make it worse, so I see no other way forward than extracting all the contents and recreating the filesystem from scratch.
>>>> 
>>>> Question is though, what is the recommended way to create the filesystem? 64bit is clearly necessary, but what about the other feature flags like flex_bg/meta_bg/resize_inode...? I do not care much about slight gains in performance, robustness is more important, and that it can be resized in the future.
>>>> 
>>>> Only online resize from now on, never offline, I learned that lesson...
>>>> 
>>>> Will it be possible to expand from 24 TB to 28 TB online?
>>>> 
>>>> thanks,
>>>> -johan
>>>> 
>>>> 
>>>>> On 2015-08-13 20:12, Johan Harvyl wrote:
>>>>>> On 2015-08-13 15:27, Theodore Ts'o wrote:
>>>>>> On Thu, Aug 13, 2015 at 12:00:50AM +0200, Johan Harvyl wrote:
>>>>>> 
>>>>>>>> I'm not aware of any offline resize with 1.42.13, but it sounds like
>>>>>>>> you were originally using mke2fs and resize2fs 1.42.10, which did have
>>>>>>>> some bugs, and so the question is what sort of might it might have
>>>>>>>> left things.
>>>>>>> What kind of bugs are we talking about, mke2fs? resize2fs? e2fsck? Any
>>>>>>> specific commits of interest?
>>>>>> I suspect it was caused by a bug in resize2fs 1.42.10. The problem is
>>>>>> that off-line resize2fs is much more powerful; it can handle moving
>>>>>> file system metadata blocks around, so it can grow file systems in
>>>>>> cases which aren't supported by online resize --- and it can shrink
>>>>>> file systems when online resize doesn't support any kind of file
>>>>>> system shrink.  As such, the code is a lot more complicated, whereas
>>>>>> the online resize code is much simpler, and ultimately, much more
>>>>>> robust.
>>>>> Understood, so would it have been possible to move from my 20 TB -> 24 TB fs with
>>>>> online resize? I am confused by the threads I see on the net with regards to this.
>>>>>>> Can you think of why it would zero out the first thousands of
>>>>>>> inodes, like the root inode, lost+found and so on? I am thinking
>>>>>>> that would help me assess the potential damage to the files. Could I
>>>>>>> perhaps expect the same kind of zeroed out blocks at regular
>>>>>>> intervals all over the device?
>>>>>> I didn't realize that the first thousands of inodes had been zeroed;
>>>>>> either you didn't mention this earlier or I had missed that from your
>>>>>> e-mail.  I suspect the resize inode before the resize was pretty
>>>>>> terribly corrupted, but in a way that e2fsck didn't complain about.
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> I may not have been clear that it was not just the first handful of inodes.
>>>>> 
>>>>> When I manually sampled some inodes with debugfs and a disk editor, the first group
>>>>> I found valid inodes in was:
>>>>> Group 48: block bitmap at 1572864, inode bitmap at 1572880, inode table at 1572896
>>>>> 
>>>>> With 512 inodes per group that would mean at least some 24k inodes are blanked out,
>>>>> but I did not check them all, I just sampled groups manually so there could be some
>>>>> valid in some of the groups below group 48 or a lot more invalid afterwards.
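The "some 24k inodes" estimate follows from the geometry quoted above: if groups 0-47 have blank inode tables and group 48 is the first with valid inodes, then at 512 inodes per group roughly 24576 inodes are affected. The block numbers quoted for group 48 are also consistent with 32768 blocks per group.

```python
# Rough arithmetic behind the "at least some 24k inodes" estimate.
inodes_per_group = 512
blocks_per_group = 32768

blanked = 48 * inodes_per_group
print(blanked)
assert blanked == 24576                   # "some 24k inodes"
assert 48 * blocks_per_group == 1572864   # group 48 block bitmap location
```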
>>>>> 
>>>>>> I'll have to try to reproduce the problem based how you originally
>>>>>> created and grew the file system and see if I can somehow reproduce
>>>>>> the problem.  Obviously e2fsck and resize2fs should be changed to make
>>>>>> this operation much more robust.  If you can tell me the exact
>>>>>> original size (just under 16TB is probably good enough, but if you
>>>>>> know the exact starting size, that might be helpful), and then steps
>>>>>> by which the file system was grown, and which version of e2fsprogs was
>>>>>> installed at the time, that would be quite helpful.
>>>>>> 
>>>>>> Thanks,
>>>>>> 
>>>>>>                        - Ted
>>>>> 
>>>>> Cool, I will try to go through its history in some detail below.
>>>>> 
>>>>> If you have ideas on what I could look for, such as whether there is a particular periodicity
>>>>> to the corruption, I can write some python to explore such theories.
>>>>> 
>>>>> 
>>>>> The filesystem was originally created with e2fsprogs 1.42.10-1 and most likely linux-image
>>>>> 3.14 from Debian.
>>>>> 
>>>>> # mkfs.ext4 /dev/md0 -i 262144 -m 0 -O 64bit
>>>>> mke2fs 1.42.10 (18-May-2014)
>>>>> Creating filesystem with 3906887168 4k blocks and 61045248 inodes
>>>>> Filesystem UUID: 13c2eb37-e951-4ad1-b194-21f0880556db
>>>>> Superblock backups stored on blocks:
>>>>>        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
>>>>>        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
>>>>>        102400000, 214990848, 512000000, 550731776, 644972544, 1934917632,
>>>>>        2560000000, 3855122432
>>>>> 
>>>>> Allocating group tables: done
>>>>> Writing inode tables: done
>>>>> Creating journal (32768 blocks): done
>>>>> Writing superblocks and filesystem accounting information: done
>>>>> #
>>>>> 
>>>>> It was expanded with 4 TB (another 976721792 4k blocks). Best I can tell from my logs this
>>>>> was done with either e2fsprogs:amd64 1.42.12-1 or 1.42.12-1.1 (debian packages) and
>>>>> Linux 3.16. Everything was running fine after this.
>>>>> NOTE #1: It does *not* look like this filesystem was ever touched by resize2fs 1.42.10.
>>>>> NOTE #2: The diff between debian packages 1.42.12-1 and 1.42.12-1.1 appear to be this:
>>>>> 49d0fe2 libext2fs: fix potential buffer overflow in closefs()
>>>>> 
>>>>> Then for the final 4 TB for a total of 5860330752 4k blocks which was done with
>>>>> e2fsprogs:amd64 1.42.13-1 and Linux 4.0. This is where the:
>>>>> "Should never happen: resize inode corrupt"
>>>>> was seen.
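The growth history lines up with the stated sizes when the block counts quoted above are converted to decimal terabytes, a quick cross-check:

```python
# Convert the quoted 4 KiB block counts to decimal TB: 16 -> 20 -> 24 TB.
def tb(blocks):
    return blocks * 4096 / 1e12

print(tb(3906887168), tb(4883608960), tb(5860330752))
assert round(tb(3906887168)) == 16   # original filesystem
assert round(tb(4883608960)) == 20   # after the first +4 TB resize
assert round(tb(5860330752)) == 24   # after the final resize
assert 4883608960 - 3906887168 == 976721792   # the "+4 TB" step in blocks
```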
>>>>> 
>>>>> In both cases the same offline resize was done, with no exotic options:
>>>>> # umount /dev/md0
>>>>> # fsck.ext4 -f /dev/md0
>>>>> # resize2fs /dev/md0
>>>>> 
>>>>> thanks,
>>>>> -johan
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@...r.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
