linux-ext4 - Re: [PATCH] ext4: fix dirtyclusters double decrement on fs shutdown

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ef906e19-04b9-4793-998e-81c34ebf9126@huawei.com>
Date: Sat, 13 Dec 2025 09:46:23 +0800
From: Baokun Li <libaokun1@...wei.com>
To: Brian Foster <bfoster@...hat.com>
CC: <linux-ext4@...r.kernel.org>
Subject: Re: [PATCH] ext4: fix dirtyclusters double decrement on fs shutdown

Hi Brian,

Thanks for the patch.

On 2025-12-12 23:47, Brian Foster wrote:
> fstests test generic/388 occasionally reproduces a warning in
> ext4_put_super() associated with the dirty clusters count:
>
>   WARNING: CPU: 7 PID: 76064 at fs/ext4/super.c:1324 ext4_put_super+0x48c/0x590 [ext4]
>
> Tracing the failure shows that the warning fires due to an
> s_dirtyclusters_counter value of -1. IOW, this appears to be a
> spurious decrement as opposed to some sort of leak. Further tracing
> of the dirty cluster count deltas and an LLM scan of the resulting
> output identified the cause as a double decrement in the error path
> between ext4_mb_mark_diskspace_used() and the caller
> ext4_mb_new_blocks().
>
> First, note that generic/388 is a shutdown vs. fsstress test and so
> produces a random set of operations and shutdown injections. In the
> problematic case, the shutdown triggers an error return from the
> ext4_handle_dirty_metadata() call(s) made from
> ext4_mb_mark_context(). The changed value is non-zero at this point,
> so ext4_mb_mark_diskspace_used() does not exit after the error
> bubbles up from ext4_mb_mark_context(). Instead, the former
> decrements both cluster counters and returns the error up to
> ext4_mb_new_blocks(). The latter falls into the !ar->len out path
> which decrements the dirty clusters counter a second time, creating
> the inconsistency.
>
> AFAICT the solution here is to exit immediately from
> ext4_mb_mark_diskspace_used() on error, regardless of the changed
> value. This leaves the caller responsible for clearing the block
> reservation at the same level it is acquired. This also skips the
> free clusters update, but the caller also calls into
> ext4_discard_allocated_blocks() to free the blocks back into the
> group. This survives an overnight loop test of generic/388 on an
> otherwise reproducing system and survives a local regression run.
>
> Signed-off-by: Brian Foster <bfoster@...hat.com>
> ---
>
> Hi all,
>
> I've thrown some testing at this and poked around the area enough that I
> _think_ it is reasonably sane, but the error paths are hairy and I could
> certainly be missing some details. I'm happy to try a different approach
> if there are any thoughts around that.. thanks.
>
> Brian
>
>  fs/ext4/mballoc.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 56d50fd3310b..224abfd6a42b 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -4234,7 +4234,7 @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
>  				   ac->ac_b_ex.fe_start, ac->ac_b_ex.fe_len,
>  				   flags, &changed);
>  
> -	if (err && changed == 0)
> +	if (err)
>  		return err;
>  
>  #ifdef AGGRESSIVE_CHECK

I think we might need to swap that && for an ||.

Basically, we should only proceed with the following logic if there's
no error and the bitmap was actually changed. If we hit an error or
if the section we intended to modify was already fully handled,
we should just bail out early. Otherwise, the err could get quietly
ignored unless we hit a duplicate allocation that happens to result in
'changed' being zero.

By the way, I spotted two other spots with similar error logic:
ext4_mb_clear_bb() and ext4_group_add_blocks().

Since this issue popped up in the last couple of years, we should
probably add a Fixes: tag to make backporting easier.


Cheers,
Baokun