Message-ID: <ZpmMRzyE-mVrK74M@codewreck.org>
Date: Fri, 19 Jul 2024 06:42:31 +0900
From: Dominique Martinet <asmadeus@...ewreck.org>
To: Mateusz Guzik <mjguzik@...il.com>
Cc: brauner@...nel.org, viro@...iv.linux.org.uk, jack@...e.cz,
	linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
	Jakub Kicinski <kuba@...nel.org>, v9fs@...ts.linux.dev
Subject: Re: [PATCH] vfs: handle __wait_on_freeing_inode() and evict() race

Mateusz Guzik wrote on Thu, Jul 18, 2024 at 05:18:37PM +0200:
> Lockless hash lookup can find and lock the inode after it gets the
> I_FREEING flag set, at which point it blocks waiting for teardown in
> evict() to finish.
> 
> However, the flag is still set even after evict() wakes up all waiters.
> 
> This results in a race where if the inode lock is taken late enough, it
> can happen after both hash removal and wakeups, meaning there is nobody
> to wake the racing thread up.
> 
> This worked prior to RCU-based lookup because the entire ordeal was
> synchronized with the inode hash lock.
> 
> Since unhashing requires the inode lock, we can safely check whether it
> happened after acquiring it.
> 
> Link: https://lore.kernel.org/v9fs/20240717102458.649b60be@kernel.org/
> Reported-by: Dominique Martinet <asmadeus@...ewreck.org>
> Fixes: 7180f8d91fcb ("vfs: add rcu-based find_inode variants for iget ops")
> Signed-off-by: Mateusz Guzik <mjguzik@...il.com>
> ---
> 
> The 'fixes' tag is contingent on testing by someone else. :>

Thanks for the quick fix!

> I have 0 experience with 9pfs and the docs failed me vs getting it
> running on libvirt+qemu, so I gave up on trying to test it myself.

I hadn't used it until yesterday either, but virtme-ng[1] is easy enough
to get running: just clone it and run /path/to/virtme-ng/vng from a
built linux tree, which starts a VM with / mounted as 9p read-only
(--rwdir /foo for writing).
[1] https://github.com/arighi/virtme-ng

> Dominique, you offered to narrow things down here, assuming the offer
> stands I would appreciate if you got this sorted out :)

Unfortunately I haven't been able to reproduce this :/
I'm not running the exact same workload, but 9p should be instantiating
inodes from just a find in a large tree; I tried running finds in
parallel etc. to no avail.

You mentioned adding some sleep to make this easier to hit; should
something like this help, or did I get this wrong?
----
diff --git a/fs/inode.c b/fs/inode.c
index 54e0be80be14..c2991142a462 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -21,6 +21,7 @@
 #include <linux/list_lru.h>
 #include <linux/iversion.h>
 #include <linux/rw_hint.h>
+#include <linux/delay.h>
 #include <trace/events/writeback.h>
 #include "internal.h"
 
@@ -962,6 +963,7 @@ static struct inode *find_inode_fast(struct super_block *sb,
                        continue;
                if (inode->i_sb != sb)
                        continue;
+               usleep_range(10,100);
                spin_lock(&inode->i_lock);
                if (inode->i_state & (I_FREEING|I_WILL_FREE)) {
                        __wait_on_freeing_inode(inode, locked);
----
Unfortunately I've checked with a printk there too and I never get there
in the first place, so it probably needs to hit another race first, where
we get an inode that's about to be dropped or has just been dropped,
but none of my "9p stress" workloads seem to be hitting it either...
Could be some scheduling difference, or just that my workloads aren't
appropriate; I need to try running networking tests but ran out of time
for today.

> Even if the patch in the current form does not go in, it should be
> sufficient to confirm the problem diagnosis is correct.
> 
> A debug printk can be added to validate the problematic condition was
> encountered, for example:

That was helpful, thanks.

> > diff --git a/fs/inode.c b/fs/inode.c
> > index 54e0be80be14..8f61fad0bc69 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -2308,6 +2308,7 @@ static void __wait_on_freeing_inode(struct inode *inode, bool locked)
> >         if (unlikely(inode_unhashed(inode))) {
> >                 BUG_ON(locked);
> >                 spin_unlock(&inode->i_lock);
> > +               printk(KERN_EMERG "%s: got unhashed inode %p\n", __func__, inode);
> >                 return;
> >         }
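
As a reading aid, the race from the quoted commit message can be modelled
in a few lines of plain userspace C. This is only a toy sketch, not the
kernel code: struct toy_inode, toy_evict_finish() and
toy_wait_on_freeing() are hypothetical names, and the model assumes the
ordering described above (evict() unhashes the inode and wakes all
waiters before the racing lookup manages to take the inode lock).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the parts of struct inode that
 * matter here. The I_FREEING flag is not modelled: the caller is assumed
 * to have already seen it set, and it stays set after evict(). */
struct toy_inode {
	bool hashed;       /* false once evict() removed it from the hash */
	bool wakeup_sent;  /* true once evict() has woken all waiters */
};

enum wait_result {
	WAIT_RETRY_LOOKUP, /* return immediately, caller repeats the lookup */
	WAIT_SLEEPS,       /* go to sleep, a wakeup from evict() is still due */
	WAIT_STUCK,        /* go to sleep, but nobody is left to wake us */
};

/* Models the tail end of evict(): hash removal followed by the wakeup. */
void toy_evict_finish(struct toy_inode *inode)
{
	inode->hashed = false;
	inode->wakeup_sent = true;
}

/* Models the decision taken once the lookup holds the inode lock and has
 * seen the freeing flag. With check_unhashed set, it first tests whether
 * the inode was already unhashed, as the patch proposes. */
enum wait_result toy_wait_on_freeing(const struct toy_inode *inode,
				     bool check_unhashed)
{
	if (check_unhashed && !inode->hashed)
		return WAIT_RETRY_LOOKUP;
	return inode->wakeup_sent ? WAIT_STUCK : WAIT_SLEEPS;
}
```

In this model, calling toy_evict_finish() first and then
toy_wait_on_freeing() without the check yields WAIT_STUCK, which is the
missed wakeup; with the check it yields WAIT_RETRY_LOOKUP. An inode that
is still hashed still sleeps as before.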

-- 
Dominique Martinet | Asmadeus