lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-id: <173751770796.22054.11065694028641211869@noble.neil.brown.name>
Date: Wed, 22 Jan 2025 14:48:27 +1100
From: "NeilBrown" <neilb@...e.de>
To: "Jeff Layton" <jlayton@...nel.org>
Cc: "Chuck Lever" <chuck.lever@...cle.com>,
 "Li Lingfeng" <lilingfeng3@...wei.com>, okorniev@...hat.com,
 Dai.Ngo@...cle.com, tom@...pey.com, linux-nfs@...r.kernel.org,
 linux-kernel@...r.kernel.org, yukuai1@...weicloud.com, houtao1@...wei.com,
 yi.zhang@...wei.com, yangerkun@...wei.com, lilingfeng@...weicloud.com
Subject: Re: [PATCH] nfsd: free nfsd_file by gc after adding it to lru list

On Wed, 22 Jan 2025, NeilBrown wrote:
> On Wed, 22 Jan 2025, Jeff Layton wrote:
> > To be clear, I think we need to drop e57420be100ab from your nfsd-
> > testing branch. The race I identified above is quite likely to occur
> > and could lead to leaks.
> > 
> > If Li Lingfeng doesn't propose a patch, I'll spin one up tomorrow. I
> > think the RCU approach is safe.
> 
> I'm not convinced this is the right approach.
> I cannot see how nfsd_file_put() can race with unhashing.  If it cannot
> then we can simply unconditionally call nfsd_file_schedule_laundrette().
> 
> Can describe how the race can happen - if indeed it can.

I thought I should explore this more and explain what I think actually
happens ...

Certainly nfsd_file_unhash() might race with nfsd_file_put().  At this
point in nfsd_file_put() we have the only reference but a hash lookup
could gain another reference and the immediately unhash it.
nfsd_file_queue_for_close() can do this.  There might be other paths.

But why does this mean we need to remove it from the lru and free it
immediately?  If we leave it on the lru it will be freed in a couple of
seconds.

The reason might be nfsd_file_close_inode_sync().  This needs to close
files before returning.
But if nfsd_file_close_inode_sync() is called while some other thread
holds a reference to the file and might want to call nfsd_file_put(),
then it isn't going to succeed anyway so any race here doesn't make any
difference.

So I think the following might be the best fix

???

Thanks,
NeilBrown

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index fcd751cb7c76..773788a50e56 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -322,10 +322,13 @@ nfsd_file_check_writeback(struct nfsd_file *nf)
 static bool nfsd_file_lru_add(struct nfsd_file *nf)
 {
 	set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags);
+	rcu_read_lock();
 	if (list_lru_add_obj(&nfsd_file_lru, &nf->nf_lru)) {
 		trace_nfsd_file_lru_add(nf);
+		rcu_read_unlock();
 		return true;
 	}
+	rcu_read_unlock();
 	return false;
 }
 
@@ -371,19 +374,8 @@ nfsd_file_put(struct nfsd_file *nf)
 
 		/* Try to add it to the LRU.  If that fails, decrement. */
 		if (nfsd_file_lru_add(nf)) {
-			/* If it's still hashed, we're done */
-			if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
-				nfsd_file_schedule_laundrette();
-				return;
-			}
-
-			/*
-			 * We're racing with unhashing, so try to remove it from
-			 * the LRU. If removal fails, then someone else already
-			 * has our reference.
-			 */
-			if (!nfsd_file_lru_remove(nf))
-				return;
+			nfsd_file_schedule_laundrette();
+			return;
 		}
 	}
 	if (refcount_dec_and_test(&nf->nf_ref))


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ