Message-ID: <77077598-45d6-43dd-90a0-f3668a27ca15@huawei.com>
Date: Mon, 30 Jun 2025 14:50:30 +0800
From: Baokun Li <libaokun1@...wei.com>
To: Jan Kara <jack@...e.cz>
CC: <linux-ext4@...r.kernel.org>, <tytso@....edu>, <adilger.kernel@...ger.ca>,
<ojaswin@...ux.ibm.com>, <linux-kernel@...r.kernel.org>,
<yi.zhang@...wei.com>, <yangerkun@...wei.com>, Baokun Li
<libaokun1@...wei.com>
Subject: Re: [PATCH v2 04/16] ext4: utilize multiple global goals to reduce
contention
On 2025/6/28 2:31, Jan Kara wrote:
> On Mon 23-06-25 15:32:52, Baokun Li wrote:
>> When allocating data blocks, if the first try (goal allocation) fails and
>> stream allocation is on, it tries a global goal starting from the last
>> group we used (s_mb_last_group). This helps cluster large files together
>> to reduce free space fragmentation, and the data block contiguity also
>> accelerates write-back to disk.
>>
>> However, when multiple processes allocate blocks, having just one global
>> goal means they all fight over the same group. This drastically lowers
>> the chances of extents merging and leads to much worse file fragmentation.
>>
>> To mitigate this multi-process contention, we now employ multiple global
>> goals, with the number of goals being the CPU count rounded up to the
>> nearest power of 2. To ensure a consistent goal for each inode, we select
>> the corresponding goal by taking the inode number modulo the total number
>> of goals.
>>
>> Performance test data follows:
>>
>> Test: Running will-it-scale/fallocate2 on CPU-bound containers.
>> Observation: Average fallocate operations per container per second.
>>
>>                    | Kunpeng 920 / 512GB -P80| AMD 9654 / 1536GB -P96  |
>> Disk: 960GB SSD    |-------------------------|-------------------------|
>>                    | base  | patched         | base  | patched         |
>> -------------------|-------|-----------------|-------|-----------------|
>> mb_optimize_scan=0 | 7612  | 19699 (+158%)   | 21647 | 53093 (+145%)   |
>> mb_optimize_scan=1 | 7568  | 9862 (+30.3%)   | 9117  | 14401 (+57.9%)  |
>>
>> Signed-off-by: Baokun Li <libaokun1@...wei.com>
> ...
>
>> +/*
>> + * Number of mb last groups
>> + */
>> +#ifdef CONFIG_SMP
>> +#define MB_LAST_GROUPS roundup_pow_of_two(nr_cpu_ids)
>> +#else
>> +#define MB_LAST_GROUPS 1
>> +#endif
>> +
> I think this is too aggressive. nr_cpu_ids is easily 4096 or similar for
> distribution kernels (it is just a theoretical maximum for the number of
> CPUs the kernel can support)
nr_cpu_ids is generally equal to num_possible_cpus(). Only when
CONFIG_FORCE_NR_CPUS is enabled is nr_cpu_ids fixed to NR_CPUS, the
compile-time maximum number of CPUs the kernel build can support.
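
Roughly, it is set up like this (a sketch of what kernel/smp.c does at
boot, not verbatim):

	/* Without CONFIG_FORCE_NR_CPUS, nr_cpu_ids is derived from the
	 * possible mask, so it matches num_possible_cpus() whenever the
	 * mask has no holes: */
	nr_cpu_ids = find_last_bit(cpumask_bits(cpu_possible_mask),
				   NR_CPUS) + 1;
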
> which seems like far too much for small
> filesystems with say 100 block groups.
That does make sense.
> I'd rather pick the array size like:
>
> min(num_possible_cpus(), sbi->s_groups_count/4)
>
> to
>
> a) don't have too many slots so we still concentrate big allocations in
> somewhat limited area of the filesystem (a quarter of block groups here).
>
> b) have at most one slot per CPU the machine hardware can in principle
> support.
>
> Honza
You're right that we should take the number of block groups into
account when sizing the set of global goals.

However, a server's rootfs is often quite small, perhaps only tens of
GBs, while the machine has many CPUs. In such cases,
sbi->s_groups_count / 4 might still limit the filesystem's scalability.
Furthermore, once LBS is supported, the number of block groups will
drop sharply.

How about we use sbi->s_groups_count directly instead, making the array
size effectively min(num_possible_cpus(), sbi->s_groups_count)? That
would also keep the count from ever being zero, which
s_groups_count / 4 allows when a filesystem has fewer than four block
groups.
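
For concreteness, the sizing and per-inode goal selection would then
look roughly like the sketch below (the array and variable names are
illustrative, not the actual patch):

	/* One slot per possible CPU, capped at the number of block
	 * groups; since s_groups_count >= 1, this can never be zero. */
	unsigned int nr_goals = min(num_possible_cpus(),
				    sbi->s_groups_count);

	/* The inode number picks a stable goal per inode, so an
	 * inode's allocations keep targeting the same region. */
	ext4_group_t group = sbi->s_mb_last_groups[inode->i_ino % nr_goals];
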
Cheers,
Baokun