[<prev] [next>] [day] [month] [year] [list]
Message-ID: <AD880EB4-C72D-43BA-B06A-FB82764700BB@amazon.com>
Date: Mon, 28 Mar 2022 19:06:17 +0000
From: "Kiselev, Oleg" <okiselev@...zon.com>
To: linux-ext4 <linux-ext4@...r.kernel.org>
Subject: Resize of `-O bigalloc` filesystem uses a computation with O(n**2)
complexity
We are validating the use of `-O bigalloc` functionality and have run into problems with it.
This issue requires a large device to reproduce. On very large filesystems created with `-O bigalloc -C 16k` we see resize take progressively more time as the fs size grows. Here’s the time it takes to resize a filesystem created with the default 4K blocks:
===========
Size of logical volume dbdata01/lvdbdata01 changed from 37.79 TiB (9906476 extents) to 38.80 TiB (10171493 extents)
real 0m13.604s
user 0m0.013s
sys 0m9.549s
============
vs. the time it takes with 16K clusters:
============
Size of logical volume dbdata01/lvdbdata01 changed from 36.98 TiB (9694245 extents) to <38.71 TiB (10147477 extents).
real 178m22.590s
user 0m0.005s
sys 178m4.027s
============
We have tracked this down to the code in fs/ext4/super.c which does O(number_of_block_groups ** 2) iterations through the allocation groups while computing metadata overhead. The extraction of the code that shows this
```
3926 int ext4_calculate_overhead(struct super_block *sb)
3927 {
[...]
3953 for (i = 0; i < ngroups; i++) {
[...]
3956 blks = count_overhead(sb, i, buf); <<<<<
[...]
3961 }
[...]
3984 }
[...]
3865 static int count_overhead(struct super_block *sb, ext4_group_t grp,
3856 char *buf)
3857 {
[...]
3874 if (!ext4_has_feature_bigalloc(sb)) <<<<< non-bigalloc exits here!
3875 return (ext4_bg_has_super(sb, grp) + ext4_bg_num_gdb(sb, grp) +
3876 sbi->s_itb_per_group + 2);
[...]
3881 for (i = 0; i < ngroups; i++) { <<<<<
[...]
3916 }
[...]
3921 }
```
The comment on `count_overhead()` clearly states that the committers who wrote this code knew that this will be an n**2 complexity computation.
We are looking for ways to reduce the computation time, perhaps by skipping the already existing groups, whose overhead is already computed, and which are unlikely to have any of their metadata allocations in the groups we are adding.
Powered by blists - more mailing lists