lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-Id: <20240410184857.42754-1-dtiwari@cs.stonybrook.edu>
Date: Wed, 10 Apr 2024 14:48:55 -0400
From: Divyaank Tiwari <dtiwari@...stonybrook.edu>
To: shaggy@...nel.org,
	dave.kleikamp@...cle.com,
	jfs-discussion@...ts.sourceforge.net,
	linux-kernel@...r.kernel.org,
	brauner@...nel.org,
	jack@...e.cz,
	axboe@...nel.dk,
	jinpu.wang@...os.com,
	dchinner@...hat.com,
	lizhi.xu@...driver.com,
	johannes.thumshirn@....com
Cc: Divyaank Tiwari <dtiwari@...stonybrook.edu>,
	Yifei Liu <yifeliu@...stonybrook.edu>,
	Erez Zadok <ezk@...stonybrook.edu>,
	Scott Smolka <sas@...stonybrook.edu>,
	Geoff Kuenning <geoff@...hmc.edu>
Subject: [PATCH] fs/jfs: fix order of kfree() and mutex_unlock() within lmLogOpen() in jfs_logmgr.c

In the jfs_logmgr.c file, within lmLogOpen() under the “free” label,
mutex_unlock(&jfs_log_mutex) is called before kfree(log). Through our
model-checking tool Metis, we found that this is one of the potential
causes for nondeterministic kernel-hang bugs in JFS. Specifically, this
indirectly results in the “log” variable being NULL dereferenced in the
function txEnd() in jfs_txnmgr.c.

Fix: Swap the order of mutex_unlock(&jfs_log_mutex) and kfree(log).

We checked the entire JFS codebase, especially the file jfs_logmgr.c where
log is allocated and kfree’d multiple times, and found that every time,
except this buggy case, a call to kfree() was followed by a mutex_unlock().
This ensures that any shared log resources are protected by the
jfs_log_mutex lock.

The small patch given below fixes this bug and we are reasonably certain
that it is the correct fix.  Before this fix, we were able to trigger the
kernel hang bug fairly quickly through Metis. Through multiple experiments,
we found that we were able to cause the kernel to hang mostly within a few
minutes of execution. While we are fairly certain that the patch below
fixes *a* bug, we believe there’s another race/bug somewhere else that we
have yet to identify.  With this small fix, when we run Metis, we can still
trigger a NULL ptr deref of “log” in function txEnd() in jfs_txnmgr.c,
but now it takes anywhere from 6 to 137 hours (almost 6 days). We are
hoping that this fix will also help some of the JFS maintainers identify
other potential races.

Note: All of our experiments were performed on Linux kernel v6.6.1.

Signed-off-by: Divyaank Tiwari <dtiwari@...stonybrook.edu>
Signed-off-by: Yifei Liu <yifeliu@...stonybrook.edu>
Signed-off-by: Erez Zadok <ezk@...stonybrook.edu>
Signed-off-by: Scott Smolka <sas@...stonybrook.edu>
Signed-off-by: Geoff Kuenning <geoff@...hmc.edu>
---
 fs/jfs/jfs_logmgr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/jfs/jfs_logmgr.c b/fs/jfs/jfs_logmgr.c
index 9609349e92e5..ce9af4ef1e41 100644
--- a/fs/jfs/jfs_logmgr.c
+++ b/fs/jfs/jfs_logmgr.c
@@ -1144,8 +1144,8 @@ int lmLogOpen(struct super_block *sb)
 	bdev_fput(bdev_file);
 
       free:		/* free log descriptor */
-	mutex_unlock(&jfs_log_mutex);
 	kfree(log);
+	mutex_unlock(&jfs_log_mutex);
 
 	jfs_warn("lmLogOpen: exit(%d)", rc);
 	return rc;

base-commit: 2c71fdf02a95b3dd425b42f28fd47fb2b1d22702
-- 
2.34.1


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ