Message-ID: <7e4b1bc0-e5a2-e180-592d-8d61e90d9cf8@uls.co.za>
Date: Tue, 24 Jul 2018 17:00:04 +0200
From: Jaco Kroon <jaco@....co.za>
To: Jan Kara <jack@...e.cz>, linux-ext4 <linux-ext4@...r.kernel.org>
Cc: Theodore Ts'o <tytso@....edu>
Subject: allowing ext4 file systems that wrapped inode count to continue working
Hi,
Related to https://www.spinics.net/lists/linux-ext4/msg61075.html (and
possibly the cause of the work from Jan in that patch series).
I have a 64TB (exactly) filesystem.
Filesystem OS type: Linux
Inode count: 4294967295
Block count: 17179869184
Reserved block count: 689862348
Free blocks: 16910075355
Free inodes: 4294966285
First block: 0
Block size: 4096
Fragment size: 4096
Group descriptor size: 64
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 8192
Inode blocks per group: 512
RAID stride: 128
RAID stripe width: 128
First meta block group: 1152
Flex block group size: 16
Note that in the above, Inode count == 2^32-1 instead of the expected
2^32: the correct inode count is exactly 2^32, which overflows a 32-bit
counter to 0. A kernel bug (since fixed by Jan) allowed this overflow to
happen in the first place.
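For illustration, a standalone snippet (not e2fsprogs code) showing the
arithmetic: 524288 groups times 8192 inodes per group is exactly 2^32,
which wraps to 0 in a 32-bit counter such as s_inodes_count:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* values from the dumpe2fs output above */
	uint64_t blocks = 17179869184ULL;
	uint32_t blocks_per_group = 32768;
	uint32_t inodes_per_group = 8192;

	uint64_t groups = blocks / blocks_per_group;       /* 524288 */
	uint64_t true_inodes = groups * inodes_per_group;  /* 2^32 */
	uint32_t wrapped = (uint32_t)true_inodes;          /* wraps to 0 */

	printf("groups=%llu true_inodes=%llu wrapped=%u\n",
	       (unsigned long long)groups,
	       (unsigned long long)true_inodes, wrapped);
	return 0;
}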
I'm busy writing a patch for e2fsck (on top of the referenced series by
Jan) that would at least allow fsck to clear the filesystem of other
errors; currently, if I hack the inode count to ~0U, fsck, tune2fs and
friends fail.
With the attached patch (sorry, Thunderbird breaks my inlining of
patches) tune2fs operates (-l at least) as expected, and fsck gets to
pass5 where it segfaults with the following stack trace (compiled with -O0):
/dev/exp/exp contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Program received signal SIGSEGV, Segmentation fault.
0x00005555555ac8d1 in ext2fs_bg_flags_test (fs=0x555555811e90,
group=552320, bg_flag=1)
at blknum.c:445
445 return gdp->bg_flags & bg_flag;
(gdb) bt
#0 0x00005555555ac8d1 in ext2fs_bg_flags_test (fs=0x555555811e90,
group=552320, bg_flag=1)
at blknum.c:445
#1 0x000055555558c343 in check_inode_bitmaps (ctx=0x5555558112b0) at
pass5.c:759
#2 0x000055555558a251 in e2fsck_pass5 (ctx=0x5555558112b0) at pass5.c:57
#3 0x000055555556fb48 in e2fsck_run (ctx=0x5555558112b0) at e2fsck.c:249
#4 0x000055555556e849 in main (argc=5, argv=0x7fffffffdfe8) at unix.c:1859
(gdb) print *gdp
$1 = {bg_block_bitmap = 528400, bg_inode_bitmap = 0,
  bg_inode_table = 528456, bg_free_blocks_count = 0,
  bg_free_inodes_count = 0, bg_used_dirs_count = 4000, bg_flags = 8,
  bg_exclude_bitmap_lo = 0, bg_block_bitmap_csum_lo = 0,
  bg_inode_bitmap_csum_lo = 8, bg_itable_unused = 0, bg_checksum = 0,
  bg_block_bitmap_hi = 528344, bg_inode_bitmap_hi = 0,
  bg_inode_table_hi = 528512, bg_free_blocks_count_hi = 0,
  bg_free_inodes_count_hi = 0, bg_used_dirs_count_hi = 4280,
  bg_itable_unused_hi = 8, bg_exclude_bitmap_hi = 0,
  bg_block_bitmap_csum_hi = 0, bg_inode_bitmap_csum_hi = 0,
  bg_reserved = 0}
... so I'm not sure why it even segfaults. gdb can retrieve a value of 8
for bg_flags, yet when the code performs the same dereference it
segfaults. I'm not sure what the discrepancy is there - probably a
misunderstanding on my part of what's going wrong - but the only thing I
can see that could segfault is the gdp dereference, and that appears to
be a valid pointer.
I am not sure whether this is a separate issue or a consequence of me
tampering with the inode counter in the way that I am (I have to assume
the latter).
For testing I created a 1TB thin volume in a separate environment,
created a 16TB filesystem on it, and then expanded that to 64TB, which
reproduces exactly the same symptoms we saw in the production
environment. I also created a thousand empty files in the root folder.
The filesystem currently consumes about 100GB on-disk in the thin volume.
Note that group=552320 > 524288 (= 17179869184 / 32768), i.e. the group
number exceeds the number of block groups in the filesystem.
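For reference, a minimal sketch of that range check (hypothetical helper,
not the actual ext2fs_bg_flags_test() code): with only 524288 groups, a
lookup of group 552320 indexes past the end of the group descriptor table.

#include <stdint.h>
#include <stdio.h>

/* hypothetical bounds check, for illustration only */
static int group_in_range(uint64_t block_count, uint32_t blocks_per_group,
			  uint64_t group)
{
	uint64_t group_count =
		(block_count + blocks_per_group - 1) / blocks_per_group;

	return group < group_count;
}

int main(void)
{
	printf("group 552320 in range: %d\n",
	       group_in_range(17179869184ULL, 32768, 552320));  /* 0 */
	printf("group 524287 in range: %d\n",
	       group_in_range(17179869184ULL, 32768, 524287));  /* 1 */
	return 0;
}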
Regarding further expansion, I would appreciate some advice. There are two
(three) possible options that I could come up with:
1. Find a way to reduce the number of inodes per group (say to 4096,
which would require re-allocating all inodes >= 2^31 to inodes < 2^31).
2. Allow adding additional blocks to the filesystem without adding
additional inodes.
(3. Find some free space, create a new filesystem, and iteratively move
data from the one to the other, shrinking and growing the filesystems as
progress is made - I will never be able to move more data than what is
currently available on the system, around 4TB in my case, so this will
take a VERY long time.)
I'm currently aiming for option 2 since that looks to be the simplest:
simply allow the overflow to happen, but don't allocate additional inodes
if the number of inodes is already ~0U.
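A rough sketch of that idea (hypothetical helper, not actual resize2fs
code): when groups are added, saturate the 32-bit inode count at ~0U
instead of letting it wrap.

#include <stdint.h>

/* illustration only: saturate instead of wrapping past 2^32 */
static uint32_t new_inode_count(uint32_t cur_inodes, uint64_t groups_added,
				uint32_t inodes_per_group)
{
	uint64_t want = (uint64_t)cur_inodes +
			groups_added * (uint64_t)inodes_per_group;

	if (cur_inodes == UINT32_MAX || want > UINT32_MAX)
		return UINT32_MAX;  /* already maxed out: add no inodes */
	return (uint32_t)want;
}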
Kind Regards,
Jaco
Attachment: "0001-Allow-opening-a-filesystem-with-maxed-out-inode-coun.patch" (text/x-patch, 2118 bytes)