Message-Id: <20250217040928.34561-1-chunjie.zhu@cloud.com>
Date: Mon, 17 Feb 2025 04:09:27 +0000
From: Chunjie Zhu <chunjie.zhu@...ud.com>
To: Bob Peterson <rpeterso@...hat.com>,
	Andreas Gruenbacher <agruenba@...hat.com>
Cc: Chunjie Zhu <chunjie.zhu@...ud.com>,
	gfs2@...ts.linux.dev,
	linux-kernel@...r.kernel.org
Subject: [PATCH] fix gfs2 umount timeout bug

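  Under heavy lock contention between nodes in a cluster, the following
  sequence can occur at fs umount time. Node 1 is left with an iopen
  glock that cannot be freed, so its umount only completes after a
  timeout (*):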

          node 1                           node 2
            |                                |
            |                                |
     iopen glock lock    -->       iopen glock go_callback
            |                                |
            |                                |
         EAGAIN                       try evict failure
            |                                |
            |                                |
       DLM_ECANCEL                           |
            |                                |
            |                                |
      glock complete                         |
            |                                |
            |                                |
    umount(clear_glock)                      |
            |                                |
            |                                |
 cannot free iopen glock                     |
            |                                |
            |                                |
    umount timeout (*)                       |
            |                                |
            |                                |
      umount complete                        |
                                             |
                                             |
                                       umount succeeds

Signed-off-by: Chunjie Zhu <chunjie.zhu@...ud.com>
---
 fs/gfs2/glock.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 4a280be229a6..bf2445f0afa9 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -2120,6 +2120,23 @@ static void thaw_glock(struct gfs2_glock *gl)
 	gfs2_glock_queue_work(gl, 0);
 }
 
+/**
+ * IOPEN glock might be a zombie glock instance due to lock contention
+ * between nodes in the cluster during fs umount, then it causes umount
+ * timeout
+ */
+
+static int is_zombie_glock(struct gfs2_glock *gl)
+{
+	if (test_bit(GLF_LOCK, &gl->gl_flags) &&
+		test_bit(GLF_DEMOTE, &gl->gl_flags) &&
+		test_bit(GLF_BLOCKING, &gl->gl_flags) &&
+		(gl->gl_name.ln_type == LM_TYPE_IOPEN) &&
+		list_empty(&gl->gl_holders))
+		return 1;
+	return 0;
+}
+
 /**
  * clear_glock - look at a glock and see if we can free it from glock cache
  * @gl: the glock to look at
@@ -2132,7 +2149,8 @@ static void clear_glock(struct gfs2_glock *gl)
 
 	spin_lock(&gl->gl_lockref.lock);
 	if (!__lockref_is_dead(&gl->gl_lockref)) {
-		gl->gl_lockref.count++;
+		if (!is_zombie_glock(gl))
+			gl->gl_lockref.count++;
 		if (gl->gl_state != LM_ST_UNLOCKED)
 			handle_callback(gl, LM_ST_UNLOCKED, 0, false);
 		__gfs2_glock_queue_work(gl, 0);
-- 
2.34.1