linux-ext4 - Re: kernel bug during ext4_resize_fs going over 64TB on 4KB blocks - [was] corrupt filesystem, superblock/journal

Message-ID: <38d1b479-efb2-33d1-f9c3-5c2c4a195d5d@uls.co.za>
Date:   Wed, 23 May 2018 16:12:29 +0200
From:   Jaco Kroon <jaco@....co.za>
To:     Jan Kara <jack@...e.cz>
Cc:     linux-ext4 <linux-ext4@...r.kernel.org>,
        Pieter Kruger <pieter@....co.za>, Theodore Ts'o <tytso@....edu>
Subject: Re: kernel bug during ext4_resize_fs going over 64TB on 4KB blocks -
 [was] corrupt filesystem, superblock/journal - fsck

Hi,

DISCLAIMER: My knowledge with respect to ext4 internals is limited, and
what I state here is based on what I've deduced from the code, and some
intuition, meaning some (or even all of it) may be completely incorrect.

Checking against kernel version 4.16, it looks like the
ext4_update_super() function in the kernel is responsible for updating
the superblock.  This code is potentially problematic:

    le32_add_cpu(&es->s_inodes_count, EXT4_INODES_PER_GROUP(sb) *
             flex_gd->count);

From what I can tell there is absolutely no overflow / max protection
here.  The same calculation is used for s_free_inodes_count.  The value of
inodes per group is 512 (2^9).  The only two paths to ext4_flex_group_add
(the only user of ext4_update_super) are ext4_group_add, which always has
flex_gd->count == 1, and ext4_resize_fs, which calculates flex_gd->count
based on s_log_groups_per_flex (0, so flex_gd->count would still be 1).
So when we reach 8388608 (2^23) flex_gd's, each adding one group, we end
up with 2^32 inodes, which wraps to 0 in the 32-bit s_inodes_count field,
and we've got a corrupt filesystem.

This also implies that the s_inodes_count values should be a multiple of
inodes per group if I'm not mistaken.

I finally had the sense to check the kernel logs, and they may shed
some light on the issue:

May 21 12:43:59 crowsnest kernel: EXT4-fs (dm-5): resizing filesystem from 16508780544 to 17179869184 blocks
May 21 12:44:09 crowsnest kernel: EXT4-fs (dm-5): resized to 16541286400 blocks
May 21 12:44:19 crowsnest kernel: EXT4-fs (dm-5): resized to 16576937984 blocks
May 21 12:44:29 crowsnest kernel: EXT4-fs (dm-5): resized to 16611540992 blocks
May 21 12:44:39 crowsnest kernel: EXT4-fs (dm-5): resized to 16649289728 blocks
May 21 12:44:49 crowsnest kernel: EXT4-fs (dm-5): resized to 16687038464 blocks
May 21 12:44:59 crowsnest kernel: EXT4-fs (dm-5): resized to 16725311488 blocks
May 21 12:45:09 crowsnest kernel: EXT4-fs (dm-5): resized to 16763584512 blocks
May 21 12:45:19 crowsnest kernel: EXT4-fs (dm-5): resized to 16797138944 blocks
May 21 12:45:29 crowsnest kernel: EXT4-fs (dm-5): resized to 16839081984 blocks
May 21 12:45:39 crowsnest kernel: EXT4-fs (dm-5): resized to 16876830720 blocks
May 21 12:45:50 crowsnest kernel: EXT4-fs (dm-5): resized to 16917200896 blocks
May 21 12:46:00 crowsnest kernel: EXT4-fs (dm-5): resized to 16954425344 blocks
May 21 12:46:10 crowsnest kernel: EXT4-fs (dm-5): resized to 16989552640 blocks
May 21 12:46:20 crowsnest kernel: EXT4-fs (dm-5): resized to 17027825664 blocks
May 21 12:46:30 crowsnest kernel: EXT4-fs (dm-5): resized to 17065574400 blocks
May 21 12:46:40 crowsnest kernel: EXT4-fs (dm-5): resized to 17103847424 blocks
May 21 12:46:50 crowsnest kernel: EXT4-fs (dm-5): resized to 17143169024 blocks
May 21 12:47:00 crowsnest kernel: EXT4-fs error (device dm-5): ext4_search_dir:1296: inode #304881794: block 1219511409: comm rsync: bad entry in directory: inode out of bounds - offset=860(860), inode=1455559466, rec_len=44, name_len=36
May 21 12:47:00 crowsnest kernel: EXT4-fs error (device dm-5): htree_dirblock_to_tree:1006: inode #514662607: block 2058395547: comm du: bad entry in directory: inode out of bounds - offset=0(0), inode=514662607, rec_len=12, name_len=1
May 21 12:47:00 crowsnest kernel: EXT4-fs error (device dm-5) in ext4_reserve_inode_write:5759: Corrupt filesystem
May 21 12:47:00 crowsnest kernel: EXT4-fs error (device dm-5) in ext4_reserve_inode_write:5759: Corrupt filesystem
May 21 12:47:00 crowsnest kernel: EXT4-fs error (device dm-5) in ext4_reserve_inode_write:5759: Corrupt filesystem
May 21 12:47:00 crowsnest kernel: EXT4-fs error (device dm-5) in ext4_reserve_inode_write:5759: Corrupt filesystem
May 21 12:47:00 crowsnest kernel: EXT4-fs error (device dm-5) in ext4_reserve_inode_write:5759: Corrupt filesystem
May 21 12:47:00 crowsnest kernel: EXT4-fs error (device dm-5): htree_dirblock_to_tree:1006: inode #814645264: block 3258462095: comm du: bad entry in directory: inode out of bounds - offset=0(0), inode=814645264, rec_len=12, name_len=1
May 21 12:47:00 crowsnest kernel: EXT4-fs error (device dm-5) in ext4_reserve_inode_write:5759: Corrupt filesystem
May 21 12:47:00 crowsnest kernel: EXT4-fs error (device dm-5) in ext4_reserve_inode_write:5759: Corrupt filesystem
May 21 12:47:00 crowsnest kernel: EXT4-fs (dm-5): resized filesystem to 17179869184 blocks

The final size the kernel reports after the resize, 17179869184 blocks,
is exactly 2^34.
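
As a sanity check on those numbers, the following small Python sketch
redoes the arithmetic.  It assumes "64TB" means 64 TiB and "4KB" means
4096-byte blocks; the two block counts are copied from the log above and
the variable names are only illustrative.

```python
# Check that 64 TiB expressed in 4096-byte blocks is exactly 2**34 blocks.
TIB = 1 << 40        # bytes in one TiB (assumed meaning of "TB" here)
BLOCK_SIZE = 4096    # bytes per filesystem block (4KB blocks)

target_blocks = (64 * TIB) // BLOCK_SIZE

# Values copied from the kernel log.
start_blocks_in_log = 16508780544   # "resizing filesystem from ..."
final_blocks_in_log = 17179869184   # "resized filesystem to ..."

print("target is 2**34:", target_blocks == 1 << 34)
print("target matches log:", target_blocks == final_blocks_in_log)
print("starting size in TiB:", start_blocks_in_log * BLOCK_SIZE / TIB)
```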

From there on the log just keeps filling up with "inode out of bounds"
errors.  That is to be expected: since the number of inodes is now zero,
any check of the form "inode_num <= le32_to_cpu(sb->s_inodes_count)"
will fail for every inode.
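
A rough Python sketch of such a check, to show why every lookup trips
once the count has wrapped.  The function name inode_in_bounds is
hypothetical, the lower bound of 1 is my assumption, and only the
upper-bound comparison is taken from the form quoted above; the inode
numbers are the ones from the kernel log.

```python
# Sketch of an "inode out of bounds" style check. Not the kernel's code:
# the function name and the lower bound are assumptions for illustration.

def inode_in_bounds(inode_num, inodes_count):
    """True if inode_num lies in 1..inodes_count inclusive."""
    return 1 <= inode_num <= inodes_count


WRAPPED_COUNT = 0            # s_inodes_count after the overflow
MAX_U32_COUNT = 0xFFFFFFFF   # largest value the 32-bit field can hold

# Inode numbers taken from the kernel log above.
for ino in (1455559466, 514662607, 814645264):
    print(ino,
          "wrapped:", inode_in_bounds(ino, WRAPPED_COUNT),
          "max u32:", inode_in_bounds(ino, MAX_U32_COUNT))
```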

So it really looks like this has something to do with the fact that we
resized to exactly 64TB.

Kind Regards,
Jaco

On 23/05/2018 15:16, Jaco Kroon wrote:
> Hi Jan,
>
> On 23/05/2018 13:37, Jan Kara wrote:
>> Hi,
>> OK, so the Inode count is obviously wrong and the remaining errors are due
>> to that. Apparently the resize process has overflown the inode count to 0
>> (which is not that surprising since the number of inodes in your filesystem
>> would be 1<<32) - that needs fixing but let's first get your fs up and
>> running. I'm actually surprised that e2fsck did anything with the
>> filesystem because for me both 1.44.2 and 1.42.11 versions just exit after
>> printing the error about the corrupted superblock. Anyway what *could* fix
>> your problem is:
>>
>> debugfs -w -R 'ssv inodes_count 4294967295' /dev/lvm/home
>>
>> and then check with dumpe2fs that inode count indeed got fixed. Hope it
>> helps.
> I started to investigate the superblocks as well.  Using hexdump and dd
> ... scary.  Came to the same conclusion, tried to fix it by replacing it
> in the superblock using dd but that caused other issues so reverted it
> back to all zero.
>
> I also tried debugfs but could not figure out how to use it, so the
> above helped a lot, thank you so much!  Unfortunately it doesn't fix
> the problem:
>
> crowsnest ~ # dumpe2fs /dev/lvm/home
> dumpe2fs 1.44.2 (14-May-2018)
> dumpe2fs: The ext2 superblock is corrupt while trying to open /dev/lvm/home
> Couldn't find valid filesystem superblock.
>
> fsck and debugfs now fail as well.  I managed to revert the change using:
>
> crowsnest ~ # echo -ne "\x00\x00\x00\x00" | dd of=/dev/lvm/home bs=4 count=1 seek=256 conv=notrunc
> 1+0 records in
> 1+0 records out
> 4 bytes copied, 0.0213468 s, 0.2 kB/s
>
> And now we're back to where we started.  So I'm contemplating if 2^32-1
> is not perhaps an explicitly invalid value, but I've tried 2^32-2
> (4294967294) as well, same result.
>
> I'm busy going through the e2fsck source files.  There are quite a few
> things that can go wrong during ext2fs_open2() and it's unclear what
> exactly is going wrong here.  Looks like I may have to modify the code
> to get the error value ...
>
> Since it happened during (directly after?) resize2fs we actually
> suspect a potential kernel bug.  The original FS size was 61TB and it
> was upsized to exactly 64TB.  In terms of 4KB blocks that's EXACTLY
> 2^34 blocks, so I also aim to look at the kernel sources there, but as
> you say - first we need to get the filesystem up.
>
> Kind Regards,
> Jaco
>
>
>
>
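
For reference, the dd invocation quoted above targets the first four
bytes of the primary superblock.  A short Python sketch of the offset
arithmetic follows; it relies on the primary ext2/3/4 superblock starting
at byte 1024 with s_inodes_count as its first little-endian 32-bit field,
and the constant names are only illustrative.

```python
import struct

# dd parameters from the quoted command line.
BS = 4        # bs=4: write in 4-byte units
SEEK = 256    # seek=256: skip 256 such units in the output

# On-disk layout facts relied on here.
SUPERBLOCK_OFFSET = 1024     # primary superblock starts at byte 1024
S_INODES_COUNT_OFFSET = 0    # s_inodes_count is the first field (__le32)

write_offset = BS * SEEK

# The bytes written by the revert, and the bytes for the value suggested
# with debugfs (4294967295), both as little-endian 32-bit integers.
zero_payload = struct.pack("<I", 0)
suggested_payload = struct.pack("<I", 4294967295)

print("write offset:", write_offset)
print("zero payload:", zero_payload.hex())
print("suggested payload:", suggested_payload.hex())
```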