[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHc6FU5E2UNyhLuwJVdv3xnHEcM_Si2oX06S+5mz_3uz3-QqnA@mail.gmail.com>
Date: Sat, 8 Nov 2025 21:00:36 +0100
From: Andreas Gruenbacher <agruenba@...hat.com>
To: Edward Adam Davis <eadavis@...com>
Cc: syzbot+63ba84f14f62e61a5fd0@...kaller.appspotmail.com,
gfs2@...ts.linux.dev, linux-kernel@...r.kernel.org,
syzkaller-bugs@...glegroups.com
Subject: Re: [PATCH] gfs2: Fix memory leak in gfs2_trans_begin
Hello,
On Sat, Nov 8, 2025 at 10:13 AM Edward Adam Davis <eadavis@...com> wrote:
> According to log [1], a "bad magic number" was found when checking the
> metatype, which caused gfs2 withdraw.
>
> The root cause of the problem is: log flush treats non-delayed withdraw
> as withdraw, resulting in no one reclaiming the memory of transaction.
> See the call stack below for details.
>
> CPU1 CPU2
> ==== ====
> gfs2_meta_buffer()
> gfs2_metatype_check()
> gfs2_metatype_check_i()
> gfs2_metatype_check_ii() gfs2_log_flush()
> gfs2_withdraw() tr = sdp->sd_log_tr
> signal_our_withdraw() sdp->sd_log_tr = NULL
> gfs2_ail_drain() goto out_withdraw
> spin_unlock(&sdp->sd_ail_lock) trans_drain()
> spin_lock(&sdp->sd_ail_lock)
> list_add(&tr->tr_list, &sdp->sd_ail1_list)
> tr = NULL
> goto out_end
>
this bug report is against upstream commit c2c2ccfd4ba7, which
precedes the withdraw rework on gfs2's for-next branch. With those
patches, the race you are describing is no longer possible because
do_withdraw() now uses sdp->sd_log_flush_lock and the SDF_JOURNAL_LIVE
flag to synchronize with gfs2_log_flush().
I don't know why Bob chose to push the transaction onto the ail1 list
instead of freeing it in gfs2_log_flush(); that's something to clean
up. I've pushed an untested patch doing that to for-later.
Related commits:
58e08e8d83ab ("gfs2: fix trans slab error when withdraw occurs inside
log_flush")
f5456b5d67cf ("gfs2: Clean up revokes on normal withdraws")
Thanks,
Andreas
> The original text suggests adding a delayed withdraw check to handle
> transaction cases to avoid similar memory leaks.
>
> syzbot reported:
> [1]
> gfs2: fsid=syz:syz.0: fatal: invalid metadata block - bh = 9381 (bad magic number), function = gfs2_meta_buffer, file = fs/gfs2/meta_io.c, line = 499
>
> [2]
> BUG: memory leak
> unreferenced object 0xffff888126cf1000 (size 144):
> backtrace (crc f56b339f):
> gfs2_trans_begin+0x29/0xa0 fs/gfs2/trans.c:115
> alloc_dinode fs/gfs2/inode.c:418 [inline]
> gfs2_create_inode+0xca0/0x1890 fs/gfs2/inode.c:807
>
>
> Fixes: f5456b5d67cf ("gfs2: Clean up revokes on normal withdraws")
> Reported-by: syzbot+63ba84f14f62e61a5fd0@...kaller.appspotmail.com
> Closes: https://syzkaller.appspot.com/bug?extid=63ba84f14f62e61a5fd0
> Tested-by: syzbot+63ba84f14f62e61a5fd0@...kaller.appspotmail.com
> Signed-off-by: Edward Adam Davis <eadavis@...com>
> ---
> fs/gfs2/log.c | 12 +++++++-----
> 1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
> index 115c4ac457e9..7bba7951dbdb 100644
> --- a/fs/gfs2/log.c
> +++ b/fs/gfs2/log.c
> @@ -1169,11 +1169,13 @@ void gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock *gl, u32 flags)
> * never queued onto any of the ail lists. Here we add it to
> * ail1 just so that ail_drain() will find and free it.
> */
> - spin_lock(&sdp->sd_ail_lock);
> - if (tr && list_empty(&tr->tr_list))
> - list_add(&tr->tr_list, &sdp->sd_ail1_list);
> - spin_unlock(&sdp->sd_ail_lock);
> - tr = NULL;
> + if (gfs2_withdrawing(sdp)) {
> + spin_lock(&sdp->sd_ail_lock);
> + if (tr && list_empty(&tr->tr_list))
> + list_add(&tr->tr_list, &sdp->sd_ail1_list);
> + spin_unlock(&sdp->sd_ail_lock);
> + tr = NULL;
> + }
> goto out_end;
> }
>
> --
> 2.43.0
>
Powered by blists - more mailing lists