[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <fcfbf7c0-f65f-4bf8-a702-4dafc9c094c9@huaweicloud.com>
Date: Tue, 6 Jan 2026 12:11:10 +0800
From: Wang Zhaolong <wangzhaolong@...weicloud.com>
To: Trond Myklebust <trondmy@...nel.org>, anna@...nel.org, kolga@...app.com
Cc: linux-nfs@...r.kernel.org, linux-kernel@...r.kernel.org,
yi.zhang@...wei.com, yangerkun@...wei.com, chengzhihao1@...wei.com,
lilingfeng3@...wei.com, zhangjian496@...wei.com
Subject: Re: [PATCH] [RFC] NFSv4.1: slot table draining + memory reclaim can
deadlock state manager creation
Hi Trond,
Thanks a lot for taking the time to look into this and for the patch proposal.
>
> I think we need to treat nfs_release_folio() as being different from
> nfs_wb_folio() altogether. There are too many different ways in which
> waiting on I/O can cause deadlocks, including waiting on the writeback
> to complete. My suggestion is therefore that we just make this simple,
> and see it as a hint that we should kick off either writeback or a
> commit, but not that we should be waiting for it to complete.
> So how about the following?
>
I agree with the overall direction.Avoiding any synchronous waits from
->release_folio() makes a lot of sense
> 8<------------------------------------------------------------------
> From 3533434037066b610d50e7bd36f3525ace296928 Mon Sep 17 00:00:00 2001
> Message-ID: <3533434037066b610d50e7bd36f3525ace296928.1767204181.git.trond.myklebust@...merspace.com>
> From: Trond Myklebust <trond.myklebust@...merspace.com>
> Date: Wed, 31 Dec 2025 11:42:31 -0500
> Subject: [PATCH] NFS: Fix a deadlock involving nfs_release_folio()
>
> Wang Zhaolong reports a deadlock involving NFSv4.1 state recovery
> waiting on kthreadd, which is attempting to reclaim memory by calling
> nfs_release_folio(). The latter cannot make progress due to state
> recovery being needed.
>
> It seems that the only safe thing to do here is to kick off a writeback
> of the folio, without waiting for completion, or else kicking off an
> asynchronous commit.
>
> Reported-by: Wang Zhaolong <wangzhaolong@...weicloud.com>
> Fixes: 96780ca55e3c ("NFS: fix up nfs_release_folio() to try to release the page")
> Signed-off-by: Trond Myklebust <trond.myklebust@...merspace.com>
> ---
> fs/nfs/file.c | 3 ++-
> fs/nfs/nfstrace.h | 3 +++
> fs/nfs/write.c | 32 ++++++++++++++++++++++++++++++++
> include/linux/nfs_fs.h | 1 +
> 4 files changed, 38 insertions(+), 1 deletion(-)
>
> diff --git a/fs/nfs/file.c b/fs/nfs/file.c
> index d020aab40c64..d1c138a416cf 100644
> --- a/fs/nfs/file.c
> +++ b/fs/nfs/file.c
> @@ -511,7 +511,8 @@ static bool nfs_release_folio(struct folio *folio, gfp_t gfp)
> if ((current_gfp_context(gfp) & GFP_KERNEL) != GFP_KERNEL ||
> current_is_kswapd() || current_is_kcompactd())
> return false;
> - if (nfs_wb_folio(folio->mapping->host, folio) < 0)
> + if (nfs_wb_folio_reclaim(folio->mapping->host, folio) < 0 ||
> + folio_test_private(folio))
> return false;
> }
> return nfs_fscache_release_folio(folio, gfp);
> diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h
> index 6ce55e8e6b67..9f9ce4a565ea 100644
> --- a/fs/nfs/nfstrace.h
> +++ b/fs/nfs/nfstrace.h
> @@ -1062,6 +1062,9 @@ DECLARE_EVENT_CLASS(nfs_folio_event_done,
> DEFINE_NFS_FOLIO_EVENT(nfs_aop_readpage);
> DEFINE_NFS_FOLIO_EVENT_DONE(nfs_aop_readpage_done);
>
> +DEFINE_NFS_FOLIO_EVENT(nfs_writeback_folio_reclaim);
> +DEFINE_NFS_FOLIO_EVENT_DONE(nfs_writeback_folio_reclaim_done);
> +
> DEFINE_NFS_FOLIO_EVENT(nfs_writeback_folio);
> DEFINE_NFS_FOLIO_EVENT_DONE(nfs_writeback_folio_done);
>
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index 336c510f3750..5de9ec6c72a2 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -2024,6 +2024,38 @@ int nfs_wb_folio_cancel(struct inode *inode, struct folio *folio)
> return ret;
> }
>
> +/**
> + * nfs_wb_folio_reclaim - Write back all requests on one page
> + * @inode: pointer to page
> + * @folio: pointer to folio
> + *
> + * Assumes that the folio has been locked by the caller
> + */
> +int nfs_wb_folio_reclaim(struct inode *inode, struct folio *folio)
> +{
> + loff_t range_start = folio_pos(folio);
> + size_t len = folio_size(folio);
> + struct writeback_control wbc = {
> + .sync_mode = WB_SYNC_ALL,
> + .nr_to_write = 0,
> + .range_start = range_start,
> + .range_end = range_start + len - 1,
> + .for_sync = 1,
> + };
> + int ret;
One small thing: in nfs_wb_folio_reclaim(), the else
nfs_commit_inode(inode, 0); branch seems to return an
uninitialized ret (unless I missed something).
> +
> + if (folio_test_writeback(folio))
> + return -EBUSY;
> + if (folio_clear_dirty_for_io(folio)) {
> + trace_nfs_writeback_folio_reclaim(inode, range_start, len);
> + ret = nfs_writepage_locked(folio, &wbc);
> + trace_nfs_writeback_folio_reclaim_done(inode, range_start, len,
> + ret);
> + } else
> + nfs_commit_inode(inode, 0);
Maybe init ret = 0, or return the value from
nfs_commit_inode() for tracing.
> + return ret;
> +}
> +
> /**
> * nfs_wb_folio - Write back all requests on one page
> * @inode: pointer to page
> diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
> index a6624edb7226..8dd79a3f3d66 100644
> --- a/include/linux/nfs_fs.h
> +++ b/include/linux/nfs_fs.h
> @@ -637,6 +637,7 @@ extern int nfs_update_folio(struct file *file, struct folio *folio,
> extern int nfs_sync_inode(struct inode *inode);
> extern int nfs_wb_all(struct inode *inode);
> extern int nfs_wb_folio(struct inode *inode, struct folio *folio);
> +extern int nfs_wb_folio_reclaim(struct inode *inode, struct folio *folio);
> int nfs_wb_folio_cancel(struct inode *inode, struct folio *folio);
> extern int nfs_commit_inode(struct inode *, int);
> extern struct nfs_commit_data *nfs_commitdata_alloc(void);
Happy to test this with my reproducer once you post it.
Thanks,
Wang Zhaolong
Powered by blists - more mailing lists