Message-Id: <20250221072114.69344-1-chunjie.zhu@cloud.com>
Date: Fri, 21 Feb 2025 07:21:13 +0000
From: Chunjie Zhu <chunjie.zhu@...ud.com>
To: chunjie.zhu@...ud.com
Cc: agruenba@...hat.com,
	andreas.gruenbacher@...il.com,
	gfs2@...ts.linux.dev,
	linux-kernel@...r.kernel.org
Subject: Re: Re: Re: [PATCH] fix gfs2 umount timeout bug

The events, in time order:

IO app -> do_xmote -> gdlm_lock -> gfs2_glock_complete (ret is CANCEL) ->
__gfs2_glock_queue_work

kworker A -> glock_work_func -> finish_xmote -> gfs2_holder_wake ->
retry do_xmote (sets the GLF_LOCK flag) -> gdlm_lock (DLM does not invoke
GFS2 callbacks) -> run_queue (does nothing as the glock has GLF_LOCK set)

At this point the glock refcount is 1.

umount -> clear_glock (refcount +1) -> glock_work_func -> run_queue (does
nothing as the glock has GLF_LOCK set) -> refcount -1

The glock refcount is still 1, so the glock stays in memory and umount
blocks in gfs2_gl_hash_clear (see the trace quoted below).
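
To make the refcount arithmetic above concrete, here is a toy user-space
model of the umount step. It is only a sketch, not kernel code: struct
toy_glock and all toy_* helpers are hypothetical names, the work item is
run synchronously, and the glock is assumed to start with refcount 1 and
GLF_LOCK set, as left behind by kworker A.

```c
#include <stdbool.h>
#include <stdio.h>

/* Toy model only; every name here is hypothetical, not a GFS2 symbol. */
struct toy_glock {
	int  refcount;   /* stands in for gl->gl_lockref.count */
	bool glf_lock;   /* stands in for the GLF_LOCK bit */
	bool freed;      /* set when the last reference is dropped */
};

/* Models run_queue: makes no progress while GLF_LOCK is set. */
static bool toy_run_queue(struct toy_glock *gl)
{
	return !gl->glf_lock;
}

/* Models glock_work_func: run the queue, then drop the work's reference. */
static void toy_glock_work_func(struct toy_glock *gl)
{
	toy_run_queue(gl);
	if (--gl->refcount == 0)
		gl->freed = true;
}

/* Models clear_glock from umount: take a reference and queue the work
 * (run synchronously in this model). */
static void toy_clear_glock(struct toy_glock *gl)
{
	gl->refcount++;
	toy_glock_work_func(gl);
}

/* Replays the umount step on the stuck glock; returns the final refcount. */
static int toy_umount_stuck_glock(void)
{
	struct toy_glock gl = { .refcount = 1, .glf_lock = true, .freed = false };

	toy_clear_glock(&gl);
	return gl.refcount;
}

#ifdef TOY_DEMO_MAIN
int main(void)
{
	printf("refcount after umount: %d\n", toy_umount_stuck_glock());
	return 0;
}
#endif
```

Building with -DTOY_DEMO_MAIN gives a standalone program. In this model
the +1 and -1 cancel out, so the count never reaches zero and nothing
frees the glock.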

> 
> INFO: task umount:75342 blocked for more than 483 seconds.
>       Not tainted 6.6.22+0 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:umount          state:D stack:0     pid:75342 ppid:75335  flags:0x00004002
> Call Trace:
>  <TASK>
>  __schedule+0x3a0/0x1330
>  ? srso_alias_return_thunk+0x5/0x7f
>  ? srso_alias_return_thunk+0x5/0x7f
>  schedule+0x53/0xc0
>  schedule_timeout+0x76/0xf0
>  ? __pfx_process_timeout+0x10/0x10
>  gfs2_gl_hash_clear+0x135/0x140 [gfs2]
>  ? __pfx_autoremove_wake_function+0x10/0x10
>  gfs2_put_super+0x175/0x220 [gfs2]
>  generic_shutdown_super+0x7e/0x170
>  kill_block_super+0x16/0x40
>  deactivate_locked_super+0x2f/0xa0
>  cleanup_mnt+0xbd/0x150
>  task_work_run+0x60/0xa0
>  exit_to_user_mode_prepare+0x117/0x120
>  syscall_exit_to_user_mode+0x22/0x40
>  ? srso_alias_return_thunk+0x5/0x7f
>  do_syscall_64+0x67/0x80
>  ? srso_alias_return_thunk+0x5/0x7f
>  ? syscall_exit_to_user_mode+0x27/0x40
>  ? srso_alias_return_thunk+0x5/0x7f
>  ? do_syscall_64+0x67/0x80
>  ? srso_alias_return_thunk+0x5/0x7f
>  ? syscall_exit_to_user_mode+0x27/0x40
>  ? srso_alias_return_thunk+0x5/0x7f
>  ? do_syscall_64+0x67/0x80
>  ? syscall_exit_to_user_mode+0x27/0x40
>  ? srso_alias_return_thunk+0x5/0x7f
>  ? do_syscall_64+0x67/0x80
>  ? do_syscall_64+0x67/0x80
>  ? do_syscall_64+0x67/0x80
>  ? exc_page_fault+0x72/0x130
>  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> RIP: 0033:0x7fb0823ebeab
> RSP: 002b:00007ffcd0d45b68 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
> RAX: 0000000000000000 RBX: 00007fb081257000 RCX: 00007fb0823ebeab
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007fb081219980
> RBP: 00007fb081257118 R08: 0000000000000073 R09: 0000000000000001
> R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> R13: 00007fb081219980 R14: 0000000000000000 R15: 00007fb081257000
>  </TASK>
> 
> > 
> > > Signed-off-by: Chunjie Zhu <chunjie.zhu@...ud.com>
> > > ---
> > >  fs/gfs2/glock.c | 20 +++++++++++++++++++-
> > >  1 file changed, 19 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
> > > index 4a280be229a6..bf2445f0afa9 100644
> > > --- a/fs/gfs2/glock.c
> > > +++ b/fs/gfs2/glock.c
> > > @@ -2120,6 +2120,23 @@ static void thaw_glock(struct gfs2_glock *gl)
> > >         gfs2_glock_queue_work(gl, 0);
> > >  }
> > >
> > > +/**
> > > + * IOPEN glock might be a zombie glock instance due to lock contention
> > > + * between nodes in the cluster during fs umount, then it causes umount
> > > + * timeout
> > > + */
> > > +
> > > +static int is_zombie_glock(struct gfs2_glock *gl)
> > > +{
> > > +       if (test_bit(GLF_LOCK, &gl->gl_flags) &&
> > > +               test_bit(GLF_DEMOTE, &gl->gl_flags) &&
> > > +               test_bit(GLF_BLOCKING, &gl->gl_flags) &&
> > > +               (gl->gl_name.ln_type == LM_TYPE_IOPEN) &&
> > > +               list_empty(&gl->gl_holders))
> > > +               return 1;
> > > +       return 0;
> > > +}
> > > +
> > >  /**
> > >   * clear_glock - look at a glock and see if we can free it from glock cache
> > >   * @gl: the glock to look at
> > > @@ -2132,7 +2149,8 @@ static void clear_glock(struct gfs2_glock *gl)
> > >
> > >         spin_lock(&gl->gl_lockref.lock);
> > >         if (!__lockref_is_dead(&gl->gl_lockref)) {
> > > -               gl->gl_lockref.count++;
> > > +               if (!is_zombie_glock(gl))
> > > +                       gl->gl_lockref.count++;
> > >                 if (gl->gl_state != LM_ST_UNLOCKED)
> > >                         handle_callback(gl, LM_ST_UNLOCKED, 0, false);
> > >                 __gfs2_glock_queue_work(gl, 0);
> > > --
> > > 2.34.1
> > >
> > >
> > 
> 
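
For reference, the effect of the quoted is_zombie_glock() check can be
sketched in user space as follows. This is a simplified model under
stated assumptions, not the kernel implementation: struct zg_glock, the
ZG_* bits and the zg_* helpers are hypothetical stand-ins for the GLF_*
flags, the LM_TYPE_IOPEN test and the holder list, and the queued work is
reduced to the single reference drop it performs while GLF_LOCK is set.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for GLF_LOCK, GLF_DEMOTE and GLF_BLOCKING. */
enum { ZG_LOCK = 1 << 0, ZG_DEMOTE = 1 << 1, ZG_BLOCKING = 1 << 2 };

struct zg_glock {
	unsigned flags;      /* models gl->gl_flags */
	bool     is_iopen;   /* models ln_type == LM_TYPE_IOPEN */
	int      nr_holders; /* 0 models list_empty(&gl->gl_holders) */
	int      refcount;   /* models gl->gl_lockref.count */
};

/* Mirrors the conditions of the quoted is_zombie_glock(). */
static bool zg_is_zombie(const struct zg_glock *gl)
{
	const unsigned need = ZG_LOCK | ZG_DEMOTE | ZG_BLOCKING;

	return (gl->flags & need) == need && gl->is_iopen &&
	       gl->nr_holders == 0;
}

/* Models the patched clear_glock plus the work it queues: the extra
 * reference is skipped for a zombie glock, then the work drops one
 * reference. Returns the resulting refcount. */
static int zg_clear_glock(struct zg_glock *gl)
{
	if (!zg_is_zombie(gl))
		gl->refcount++;
	gl->refcount--;
	return gl->refcount;
}

#ifdef ZG_DEMO_MAIN
int main(void)
{
	struct zg_glock z = { ZG_LOCK | ZG_DEMOTE | ZG_BLOCKING, true, 0, 1 };

	printf("zombie refcount after clear: %d\n", zg_clear_glock(&z));
	return 0;
}
#endif
```

In this model a glock that matches all five conditions ends at refcount
0 after clear_glock, while any other glock keeps the count it started
with.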
