lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20260112185001.GA450687@zen.localdomain>
Date: Mon, 12 Jan 2026 10:50:27 -0800
From: Boris Burkov <boris@....io>
To: Jiasheng Jiang <jiashengjiangcool@...il.com>
Cc: clm@...com, dsterba@...e.com, fdmanana@...nel.org,
	linux-btrfs@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3] btrfs: reset block group size class when reservations
 are freed

On Mon, Jan 12, 2026 at 03:02:48PM +0000, Jiasheng Jiang wrote:
> Differential analysis of block-group.c shows an inconsistency between
> btrfs_add_reserved_bytes() and btrfs_free_reserved_bytes().
> 
> When space is reserved, btrfs_use_block_group_size_class() is called to
> set a block group's size class, specializing it for a specific allocation
> size to reduce fragmentation. However, when these reservations are
> subsequently freed (e.g., due to an error or transaction abort),
> btrfs_free_reserved_bytes() fails to perform the corresponding cleanup.
> 
> This leads to a state where a block group remains stuck with a specific
> size class even if it contains no used or reserved bytes. While the
> impact depends on the workload, this stale state can cause
> find_free_extent to unnecessarily skip these block groups for mismatched
> size requests.
> In some cases, it may even trigger the allocation of new block groups if
> no matching or unassigned block groups are available.

Sorry I missed v1 and v2 of this, so I may be late with my questions.

This seems reasonable enough, but I am wondering why you are only doing
it for the error handling path.

We can also legitimately free the last extent in a block group which will
go through btrfs_update_block_group(). That will either mark it unused or
for async discard, which may eventually result in the bg getting cleaned up.
But in the meantime it is eligible to be used again so by the same
reasoning as this patch, it shouldn't have a size class.

That aside, I don't think this should be a huge issue for two reasons:
1. size classes are advisory and allocations won't fail because of them,
so I don't think your assertion about triggering new allocations is
accurate. (see LOOP_WRONG_SIZE_CLASS)
2. the bg is likely to end up getting used anyway, so it would take
pretty specific conditions to meaningfully "leak" one.

If this issue is something you have observed in practice rather than a
(valid) theoretical problem, I'd be curious to hear more about your
workload and the negative effects you observed.


One final point: this also makes me wonder if it would be beneficial to
merge some of the logic on hitting 0 used+reserved between
btrfs_update_block_group() and btrfs_free_reserve_bytes(). I imagine we
might be able to also miss a block group that needs to be marked unused
on this path.

Thanks,
Boris

> 
> Fix this by resetting the size class to BTRFS_BG_SZ_NONE in
> btrfs_free_reserved_bytes() when the block group becomes completely
> empty.
> 
> Fixes: 52bb7a2166af ("btrfs: introduce size class to block group allocator")
> Signed-off-by: Jiasheng Jiang <jiashengjiangcool@...il.com>
> ---
> Changelog:
> 
> v2 -> v3:
> 1. Corrected the "Fixes" tag to 52bb7a2166af.
> 2. Updated the commit message to reflect that the performance impact is workload-dependent.
> 3. Added mention that the issue can lead to unnecessary allocation of new block groups.
> 
> v1 -> v2:
> 1. Inlined btrfs_maybe_reset_size_class() function.
> 2. Moved check below the reserved bytes decrement in btrfs_free_reserved_bytes().
> ---
>  fs/btrfs/block-group.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index 08b14449fabe..8339ad001d3f 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -3867,6 +3867,12 @@ void btrfs_free_reserved_bytes(struct btrfs_block_group *cache, u64 num_bytes,
>  	spin_lock(&cache->lock);
>  	bg_ro = cache->ro;
>  	cache->reserved -= num_bytes;
> +
> +	if (btrfs_block_group_should_use_size_class(cache)) {
> +		if (cache->used == 0 && cache->reserved == 0)
> +			cache->size_class = BTRFS_BG_SZ_NONE;
> +	}
> +
>  	if (is_delalloc)
>  		cache->delalloc_bytes -= num_bytes;
>  	spin_unlock(&cache->lock);
> -- 
> 2.25.1
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ