Message-ID: <xmhuzjcgujdvmgmnc3mfd45txehmq73fiyg32vr6h7ldznctlq@rosxe25scojb>
Date: Fri, 27 Jun 2025 20:31:35 +0200
From: Jan Kara <jack@...e.cz>
To: Baokun Li <libaokun1@...wei.com>
Cc: linux-ext4@...r.kernel.org, tytso@....edu, jack@...e.cz, 
	adilger.kernel@...ger.ca, ojaswin@...ux.ibm.com, linux-kernel@...r.kernel.org, 
	yi.zhang@...wei.com, yangerkun@...wei.com
Subject: Re: [PATCH v2 04/16] ext4: utilize multiple global goals to reduce
 contention

On Mon 23-06-25 15:32:52, Baokun Li wrote:
> When allocating data blocks, if the first try (goal allocation) fails and
> stream allocation is on, it tries a global goal starting from the last
> group we used (s_mb_last_group). This helps cluster large files together
> to reduce free space fragmentation, and the data block contiguity also
> accelerates write-back to disk.
> 
> However, when multiple processes allocate blocks, having just one global
> goal means they all fight over the same group. This drastically lowers
> the chances of extents merging and leads to much worse file fragmentation.
> 
> To mitigate this multi-process contention, we now employ multiple global
> goals, with the number of goals being the CPU count rounded up to the
> nearest power of 2. To ensure a consistent goal for each inode, we select
> the corresponding goal by taking the inode number modulo the total number
> of goals.
> 
> Performance test data follows:
> 
> Test: Running will-it-scale/fallocate2 on CPU-bound containers.
> Observation: Average fallocate operations per container per second.
> 
>                    | Kunpeng 920 / 512GB -P80|  AMD 9654 / 1536GB -P96 |
>  Disk: 960GB SSD   |-------------------------|-------------------------|
>                    | base  |    patched      | base  |    patched      |
> -------------------|-------|-----------------|-------|-----------------|
> mb_optimize_scan=0 | 7612  | 19699 (+158%)   | 21647 | 53093 (+145%)   |
> mb_optimize_scan=1 | 7568  | 9862  (+30.3%)  | 9117  | 14401 (+57.9%)  |
> 
> Signed-off-by: Baokun Li <libaokun1@...wei.com>

...

> +/*
> + * Number of mb last groups
> + */
> +#ifdef CONFIG_SMP
> +#define MB_LAST_GROUPS roundup_pow_of_two(nr_cpu_ids)
> +#else
> +#define MB_LAST_GROUPS 1
> +#endif
> +
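
For reference, here is a tiny standalone sketch (userspace C with made-up
names, not the patch's actual code) of the slot selection the commit
message describes: the slot count is a power of two, and each inode maps
to a fixed goal slot via its inode number:

#include <stdio.h>

/* Stand-ins for MB_LAST_GROUPS and the per-slot last-group array. */
#define NR_GOALS 8			/* power of two, as roundup_pow_of_two() gives */
static unsigned int last_group[NR_GOALS];

/* ino modulo NR_GOALS; '&' works because NR_GOALS is a power of two. */
static unsigned int goal_slot(unsigned long ino)
{
	return ino & (NR_GOALS - 1);
}

int main(void)
{
	unsigned long ino;

	/* Different inodes spread over the slots, while a given inode always
	 * lands in the same slot, so its allocations keep a stable goal. */
	for (ino = 12; ino < 20; ino++) {
		unsigned int slot = goal_slot(ino);
		last_group[slot] = 100 + slot;	/* pretend "last used group" */
		printf("inode %lu -> slot %u (goal group %u)\n",
		       ino, slot, last_group[slot]);
	}
	return 0;
}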

I think sizing this by nr_cpu_ids is too aggressive. nr_cpu_ids is easily
4096 or similar for distribution kernels (it is just the theoretical maximum
number of CPUs the kernel can support), which is far too much for small
filesystems with, say, 100 block groups. I'd rather pick the array size like:

min(num_possible_cpus(), sbi->s_groups_count/4)

so that we:

a) don't have too many slots, so big allocations still get concentrated in a
somewhat limited area of the filesystem (a quarter of the block groups
here), and

b) have at most one slot per CPU the machine hardware can in principle
support.
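
To make the effect of that formula concrete, here is a toy model
(userspace C; the real code would use num_possible_cpus() and
sbi->s_groups_count at mount time, and the clamp to at least one slot is
an assumption, not something from the patch or the review):

#include <stdio.h>

/* min(possible CPUs, a quarter of the block groups), but at least 1 slot. */
static unsigned int nr_goal_slots(unsigned int possible_cpus,
				  unsigned int groups_count)
{
	unsigned int n = groups_count / 4;

	if (n > possible_cpus)
		n = possible_cpus;
	return n ? n : 1;
}

int main(void)
{
	/* 100-group filesystem, distro kernel with 4096 possible CPUs -> 25 */
	printf("%u\n", nr_goal_slots(4096, 100));
	/* 8192-group filesystem on a 16-CPU machine -> 16 */
	printf("%u\n", nr_goal_slots(16, 8192));
	return 0;
}

Since the result is generally not a power of two, the per-inode slot would
then be picked with a plain '%' rather than a power-of-two mask.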

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR
