Message-ID: <CAHpGcMK0vez+c0NT-U40=JYy+_X-rtfR50KEZXiFXuA6E-tUmQ@mail.gmail.com>
Date: Tue, 18 Feb 2025 07:57:25 +0100
From: Andreas Grünbacher <andreas.gruenbacher@...il.com>
To: Chunjie Zhu <chunjie.zhu@...ud.com>
Cc: Andreas Gruenbacher <agruenba@...hat.com>, gfs2@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] fix gfs2 umount timeout bug
Hello,
On Mon, 17 Feb 2025 at 05:10, Chunjie Zhu <chunjie.zhu@...ud.com> wrote:
> If there is heavy lock contention between nodes in a cluster, then at
> fs umount time,
>
>        node 1                            node 2
>           |                                 |
>           |                                 |
>   iopen glock lock    -->        iopen glock go_callback
>           |                                 |
>           |                                 |
>        EAGAIN                        try evict failure
>           |                                 |
>           |                                 |
>      DLM_ECANCEL                            |
>           |                                 |
>           |                                 |
>    glock complete                           |
>           |                                 |
>           |                                 |
>  umount(clear_glock)                        |
>           |                                 |
>           |                                 |
> cannot free iopen glock                     |
>           |                                 |
>           |                                 |
>  umount timeout (*)                         |
>           |                                 |
>           |                                 |
>           |                           umount complete
>           |
>           |
>    umount succeed
Thank you for your bug report. I'm having a hard time following what
you are trying to say, and the patch itself doesn't look right to me.
If there was a reference counting problem like the patch suggests, we
would probably see regular left-over glocks at unmount time, but I'm
not aware of any such problems. So could you please explain in a bit
more detail what you think the problem is? Do you get any messages in
the syslog? The index hash in the patch corresponds to commit
bb25b97562e5 ("gfs2: remove dead code in add_to_queue") from 2023.
What exact kernel version are you running?
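(For reference, the hash on the patch's index line can be mapped back
to the commit it came from with something like
    git log --oneline --find-object=4a280be229a6 -- fs/gfs2/glock.c
in a mainline tree.)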
Thanks,
Andreas
> Signed-off-by: Chunjie Zhu <chunjie.zhu@...ud.com>
> ---
> fs/gfs2/glock.c | 20 +++++++++++++++++++-
> 1 file changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
> index 4a280be229a6..bf2445f0afa9 100644
> --- a/fs/gfs2/glock.c
> +++ b/fs/gfs2/glock.c
> @@ -2120,6 +2120,23 @@ static void thaw_glock(struct gfs2_glock *gl)
> gfs2_glock_queue_work(gl, 0);
> }
>
> +/**
> + * IOPEN glock might be a zombie glock instance due to lock contention
> + * between nodes in the cluster during fs umount, then it causes umount
> + * timeout
> + */
> +
> +static int is_zombie_glock(struct gfs2_glock *gl)
> +{
> + if (test_bit(GLF_LOCK, &gl->gl_flags) &&
> + test_bit(GLF_DEMOTE, &gl->gl_flags) &&
> + test_bit(GLF_BLOCKING, &gl->gl_flags) &&
> + (gl->gl_name.ln_type == LM_TYPE_IOPEN) &&
> + list_empty(&gl->gl_holders))
> + return 1;
> + return 0;
> +}
> +
> /**
> * clear_glock - look at a glock and see if we can free it from glock cache
> * @gl: the glock to look at
> @@ -2132,7 +2149,8 @@ static void clear_glock(struct gfs2_glock *gl)
>
> spin_lock(&gl->gl_lockref.lock);
> if (!__lockref_is_dead(&gl->gl_lockref)) {
> - gl->gl_lockref.count++;
> + if (!is_zombie_glock(gl))
> + gl->gl_lockref.count++;
> if (gl->gl_state != LM_ST_UNLOCKED)
> handle_callback(gl, LM_ST_UNLOCKED, 0, false);
> __gfs2_glock_queue_work(gl, 0);
> --
> 2.34.1
>
>