Message-ID: <CAHpGcMK0vez+c0NT-U40=JYy+_X-rtfR50KEZXiFXuA6E-tUmQ@mail.gmail.com>
Date: Tue, 18 Feb 2025 07:57:25 +0100
From: Andreas Grünbacher <andreas.gruenbacher@...il.com>
To: Chunjie Zhu <chunjie.zhu@...ud.com>
Cc: Andreas Gruenbacher <agruenba@...hat.com>, gfs2@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] fix gfs2 umount timeout bug
Hello,
On Mon, 17 Feb 2025 at 05:10, Chunjie Zhu <chunjie.zhu@...ud.com> wrote:
> If there is heavy lock contention between nodes in a cluster, then at
> fs umount time,
>
>        node 1                            node 2
>           |                                 |
>           |                                 |
>   iopen glock lock    -->        iopen glock go_callback
>           |                                 |
>           |                                 |
>        EAGAIN                        try evict failure
>           |                                 |
>           |                                 |
>      DLM_ECANCEL                            |
>           |                                 |
>           |                                 |
>    glock complete                           |
>           |                                 |
>           |                                 |
>  umount(clear_glock)                        |
>           |                                 |
>           |                                 |
> cannot free iopen glock                     |
>           |                                 |
>           |                                 |
>  umount timeout (*)                         |
>           |                                 |
>           |                                 |
>           |                           umount complete
>           |
>           |
>    umount succeed
Thank you for your bug report. I'm having a hard time following what
you are trying to say, and the patch itself doesn't look right to me.
If there was a reference counting problem like the patch suggests, we
would probably see regular left-over glocks at unmount time, but I'm
not aware of any such problems. So could you please explain in a bit
more detail what you think the problem is? Do you get any messages in
the syslog? The index hash in the patch corresponds to commit
bb25b97562e5 ("gfs2: remove dead code in add_to_queue") from 2023.
What exact kernel version are you running?
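(For reference, the hash on the patch's index line can be mapped back
to the commit it came from with something like
    git log --oneline --find-object=4a280be229a6 -- fs/gfs2/glock.c
in a mainline tree.)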
Thanks,
Andreas
> Signed-off-by: Chunjie Zhu <chunjie.zhu@...ud.com>
> ---
> fs/gfs2/glock.c | 20 +++++++++++++++++++-
> 1 file changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
> index 4a280be229a6..bf2445f0afa9 100644
> --- a/fs/gfs2/glock.c
> +++ b/fs/gfs2/glock.c
> @@ -2120,6 +2120,23 @@ static void thaw_glock(struct gfs2_glock *gl)
> gfs2_glock_queue_work(gl, 0);
> }
>
> +/**
> + * IOPEN glock might be a zombie glock instance due to lock contention
> + * between nodes in the cluster during fs umount, then it causes umount
> + * timeout
> + */
> +
> +static int is_zombie_glock(struct gfs2_glock *gl)
> +{
> + if (test_bit(GLF_LOCK, &gl->gl_flags) &&
> + test_bit(GLF_DEMOTE, &gl->gl_flags) &&
> + test_bit(GLF_BLOCKING, &gl->gl_flags) &&
> + (gl->gl_name.ln_type == LM_TYPE_IOPEN) &&
> + list_empty(&gl->gl_holders))
> + return 1;
> + return 0;
> +}
> +
> /**
> * clear_glock - look at a glock and see if we can free it from glock cache
> * @gl: the glock to look at
> @@ -2132,7 +2149,8 @@ static void clear_glock(struct gfs2_glock *gl)
>
> spin_lock(&gl->gl_lockref.lock);
> if (!__lockref_is_dead(&gl->gl_lockref)) {
> - gl->gl_lockref.count++;
> + if (!is_zombie_glock(gl))
> + gl->gl_lockref.count++;
> if (gl->gl_state != LM_ST_UNLOCKED)
> handle_callback(gl, LM_ST_UNLOCKED, 0, false);
> __gfs2_glock_queue_work(gl, 0);
> --
> 2.34.1
>
>