[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-id: <173751770796.22054.11065694028641211869@noble.neil.brown.name>
Date: Wed, 22 Jan 2025 14:48:27 +1100
From: "NeilBrown" <neilb@...e.de>
To: "Jeff Layton" <jlayton@...nel.org>
Cc: "Chuck Lever" <chuck.lever@...cle.com>,
"Li Lingfeng" <lilingfeng3@...wei.com>, okorniev@...hat.com,
Dai.Ngo@...cle.com, tom@...pey.com, linux-nfs@...r.kernel.org,
linux-kernel@...r.kernel.org, yukuai1@...weicloud.com, houtao1@...wei.com,
yi.zhang@...wei.com, yangerkun@...wei.com, lilingfeng@...weicloud.com
Subject: Re: [PATCH] nfsd: free nfsd_file by gc after adding it to lru list
On Wed, 22 Jan 2025, NeilBrown wrote:
> On Wed, 22 Jan 2025, Jeff Layton wrote:
> > To be clear, I think we need to drop e57420be100ab from your nfsd-
> > testing branch. The race I identified above is quite likely to occur
> > and could lead to leaks.
> >
> > If Li Lingfeng doesn't propose a patch, I'll spin one up tomorrow. I
> > think the RCU approach is safe.
>
> I'm not convinced this is the right approach.
> I cannot see how nfsd_file_put() can race with unhashing. If it cannot
> then we can simply unconditionally call nfsd_file_schedule_laundrette().
>
> Can describe how the race can happen - if indeed it can.
I thought I should explore this more and explain what I think actually
happens ...
Certainly nfsd_file_unhash() might race with nfsd_file_put(). At this
point in nfsd_file_put() we have the only reference but a hash lookup
could gain another reference and the immediately unhash it.
nfsd_file_queue_for_close() can do this. There might be other paths.
But why does this mean we need to remove it from the lru and free it
immediately? If we leave it on the lru it will be freed in a couple of
seconds.
The reason might be nfsd_file_close_inode_sync(). This needs to close
files before returning.
But if nfsd_file_close_inode_sync() is called while some other thread
holds a reference to the file and might want to call nfsd_file_put(),
then it isn't going to succeed anyway so any race here doesn't make any
difference.
So I think the following might be the best fix
???
Thanks,
NeilBrown
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index fcd751cb7c76..773788a50e56 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -322,10 +322,13 @@ nfsd_file_check_writeback(struct nfsd_file *nf)
static bool nfsd_file_lru_add(struct nfsd_file *nf)
{
set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags);
+ rcu_read_lock();
if (list_lru_add_obj(&nfsd_file_lru, &nf->nf_lru)) {
trace_nfsd_file_lru_add(nf);
+ rcu_read_unlock();
return true;
}
+ rcu_read_unlock();
return false;
}
@@ -371,19 +374,8 @@ nfsd_file_put(struct nfsd_file *nf)
/* Try to add it to the LRU. If that fails, decrement. */
if (nfsd_file_lru_add(nf)) {
- /* If it's still hashed, we're done */
- if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) {
- nfsd_file_schedule_laundrette();
- return;
- }
-
- /*
- * We're racing with unhashing, so try to remove it from
- * the LRU. If removal fails, then someone else already
- * has our reference.
- */
- if (!nfsd_file_lru_remove(nf))
- return;
+ nfsd_file_schedule_laundrette();
+ return;
}
}
if (refcount_dec_and_test(&nf->nf_ref))
Powered by blists - more mailing lists