[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <8c7b1d37-758a-46b2-b8aa-afd333140bf4@oracle.com>
Date: Wed, 10 Apr 2024 14:24:13 -0500
From: Dave Kleikamp <dave.kleikamp@...cle.com>
To: Divyaank Tiwari <dtiwari@...stonybrook.edu>, shaggy@...nel.org,
jfs-discussion@...ts.sourceforge.net, linux-kernel@...r.kernel.org,
brauner@...nel.org, jack@...e.cz, axboe@...nel.dk,
jinpu.wang@...os.com, dchinner@...hat.com, lizhi.xu@...driver.com,
johannes.thumshirn@....com
Cc: Yifei Liu <yifeliu@...stonybrook.edu>, Erez Zadok
<ezk@...stonybrook.edu>,
Scott Smolka <sas@...stonybrook.edu>,
Geoff Kuenning <geoff@...hmc.edu>
Subject: Re: [PATCH] fs/jfs: fix order of kfree() and mutex_unlock() within
lmLogOpen() in jfs_logmgr.c
On 4/10/24 1:48PM, Divyaank Tiwari wrote:
> In the jfs_logmgr.c file, within lmLogOpen() under the “free” label,
> mutex_unlock(&jfs_log_mutex) is called before kfree(log). Through our
> model-checking tool Metis, we found that this is one of the potential
> causes for nondeterministic kernel-hang bugs in JFS. Specifically, this
> indirectly results in the “log” variable being NULL dereferenced in the
> function txEnd() in jfs_txnmgr.c.
>
> Fix: Swap the order of mutex_unlock(&jfs_log_mutex) and kfree(log).
>
> We checked the entire JFS codebase, especially the file jfs_logmgr.c where
> log is allocated and kfree’d multiple times, and found that every time,
> except this buggy case, a call to kfree() was followed by a mutex_unlock().
> This ensures that any shared log resources are protected by the
> jfs_log_mutex lock.
>
> The small patch given below fixes this bug and we are reasonably certain
> that it is the correct fix. Before this fix, we were able to trigger the
> kernel hang bug fairly quickly through Metis. Through multiple experiments,
> we found that we were able to cause the kernel to hang mostly within a few
> minutes of execution. While we are fairly certain that the patch below
> fixes *a* bug, we believe there’s another race/bug somewhere else that we
> have yet to identify. With this small fix, when we run Metis, we can still
> trigger a NULL ptr deref of “log” in function txEnd() in jfs_txnmgr.c,
> but now it takes anywhere from 6 to 137 hours (almost 6 days). We are
> hoping that this fix will also help some of the JFS maintainers identify
> other potential races.
I'm really not sure how this fixes anything. When the lock is released,
nothing should be referencing the data structure. Also, calling kfree()
doesn't have anything to do with the value of sbi->log, which would be
the cause of the NULL dereference. It may only affect the timing by
holding the lock a little longer.
Can you briefly explain the test case? It seems you are using an
external journal. (I don't think that's very common, which could help
explain why it hasn't been discovered.) Are more than one file system
sharing a journal? (I don't know if anyone actually does that, but this
was designed in from the beginning.)
That said, the patch looks completely safe. I'm not sure if it is
necessary since nothing should be referencing log any more and a
use-after-free is just as likely whether or not we hold the lock while
calling kfree().
Thanks,
Shaggy
>
> Note: All of our experiments were performed on Linux kernel v6.6.1.
>
> Signed-off-by: Divyaank Tiwari <dtiwari@...stonybrook.edu>
> Signed-off-by: Yifei Liu <yifeliu@...stonybrook.edu>
> Signed-off-by: Erez Zadok <ezk@...stonybrook.edu>
> Signed-off-by: Scott Smolka <sas@...stonybrook.edu>
> Signed-off-by: Geoff Kuenning <geoff@...hmc.edu>
> ---
> fs/jfs/jfs_logmgr.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/jfs/jfs_logmgr.c b/fs/jfs/jfs_logmgr.c
> index 9609349e92e5..ce9af4ef1e41 100644
> --- a/fs/jfs/jfs_logmgr.c
> +++ b/fs/jfs/jfs_logmgr.c
> @@ -1144,8 +1144,8 @@ int lmLogOpen(struct super_block *sb)
> bdev_fput(bdev_file);
>
> free: /* free log descriptor */
> - mutex_unlock(&jfs_log_mutex);
> kfree(log);
> + mutex_unlock(&jfs_log_mutex);
>
> jfs_warn("lmLogOpen: exit(%d)", rc);
> return rc;
>
> base-commit: 2c71fdf02a95b3dd425b42f28fd47fb2b1d22702
Powered by blists - more mailing lists