linux-kernel - Re: [PATCH] [RFC] NFSv4.1: slot table draining + memory reclaim can deadlock state manager creation

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <fcfbf7c0-f65f-4bf8-a702-4dafc9c094c9@huaweicloud.com>
Date: Tue, 6 Jan 2026 12:11:10 +0800
From: Wang Zhaolong <wangzhaolong@...weicloud.com>
To: Trond Myklebust <trondmy@...nel.org>, anna@...nel.org, kolga@...app.com
Cc: linux-nfs@...r.kernel.org, linux-kernel@...r.kernel.org,
 yi.zhang@...wei.com, yangerkun@...wei.com, chengzhihao1@...wei.com,
 lilingfeng3@...wei.com, zhangjian496@...wei.com
Subject: Re: [PATCH] [RFC] NFSv4.1: slot table draining + memory reclaim can
 deadlock state manager creation

Hi Trond,

Thanks a lot for taking the time to look into this and for the patch proposal.

> 
> I think we need to treat nfs_release_folio() as being different from
> nfs_wb_folio() altogether. There are too many different ways in which
> waiting on I/O can cause deadlocks, including waiting on the writeback
> to complete. My suggestion is therefore that we just make this simple,
> and see it as a hint that we should kick off either writeback or a
> commit, but not that we should be waiting for it to complete.
> So how about the following?
> 

I agree with the overall direction.Avoiding any synchronous waits from
->release_folio() makes a lot of sense

> 8<------------------------------------------------------------------
>  From 3533434037066b610d50e7bd36f3525ace296928 Mon Sep 17 00:00:00 2001
> Message-ID: <3533434037066b610d50e7bd36f3525ace296928.1767204181.git.trond.myklebust@...merspace.com>
> From: Trond Myklebust <trond.myklebust@...merspace.com>
> Date: Wed, 31 Dec 2025 11:42:31 -0500
> Subject: [PATCH] NFS: Fix a deadlock involving nfs_release_folio()
> 
> Wang Zhaolong reports a deadlock involving NFSv4.1 state recovery
> waiting on kthreadd, which is attempting to reclaim memory by calling
> nfs_release_folio(). The latter cannot make progress due to state
> recovery being needed.
> 
> It seems that the only safe thing to do here is to kick off a writeback
> of the folio, without waiting for completion, or else kicking off an
> asynchronous commit.
> 
> Reported-by: Wang Zhaolong <wangzhaolong@...weicloud.com>
> Fixes: 96780ca55e3c ("NFS: fix up nfs_release_folio() to try to release the page")
> Signed-off-by: Trond Myklebust <trond.myklebust@...merspace.com>
> ---
>   fs/nfs/file.c          |  3 ++-
>   fs/nfs/nfstrace.h      |  3 +++
>   fs/nfs/write.c         | 32 ++++++++++++++++++++++++++++++++
>   include/linux/nfs_fs.h |  1 +
>   4 files changed, 38 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/nfs/file.c b/fs/nfs/file.c
> index d020aab40c64..d1c138a416cf 100644
> --- a/fs/nfs/file.c
> +++ b/fs/nfs/file.c
> @@ -511,7 +511,8 @@ static bool nfs_release_folio(struct folio *folio, gfp_t gfp)
>   		if ((current_gfp_context(gfp) & GFP_KERNEL) != GFP_KERNEL ||
>   		    current_is_kswapd() || current_is_kcompactd())
>   			return false;
> -		if (nfs_wb_folio(folio->mapping->host, folio) < 0)
> +		if (nfs_wb_folio_reclaim(folio->mapping->host, folio) < 0 ||
> +		    folio_test_private(folio))
>   			return false;
>   	}
>   	return nfs_fscache_release_folio(folio, gfp);
> diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h
> index 6ce55e8e6b67..9f9ce4a565ea 100644
> --- a/fs/nfs/nfstrace.h
> +++ b/fs/nfs/nfstrace.h
> @@ -1062,6 +1062,9 @@ DECLARE_EVENT_CLASS(nfs_folio_event_done,
>   DEFINE_NFS_FOLIO_EVENT(nfs_aop_readpage);
>   DEFINE_NFS_FOLIO_EVENT_DONE(nfs_aop_readpage_done);
>   
> +DEFINE_NFS_FOLIO_EVENT(nfs_writeback_folio_reclaim);
> +DEFINE_NFS_FOLIO_EVENT_DONE(nfs_writeback_folio_reclaim_done);
> +
>   DEFINE_NFS_FOLIO_EVENT(nfs_writeback_folio);
>   DEFINE_NFS_FOLIO_EVENT_DONE(nfs_writeback_folio_done);
>   
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index 336c510f3750..5de9ec6c72a2 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -2024,6 +2024,38 @@ int nfs_wb_folio_cancel(struct inode *inode, struct folio *folio)
>   	return ret;
>   }
>   
> +/**
> + * nfs_wb_folio_reclaim - Write back all requests on one page
> + * @inode: pointer to page
> + * @folio: pointer to folio
> + *
> + * Assumes that the folio has been locked by the caller
> + */
> +int nfs_wb_folio_reclaim(struct inode *inode, struct folio *folio)
> +{
> +	loff_t range_start = folio_pos(folio);
> +	size_t len = folio_size(folio);
> +	struct writeback_control wbc = {
> +		.sync_mode = WB_SYNC_ALL,
> +		.nr_to_write = 0,
> +		.range_start = range_start,
> +		.range_end = range_start + len - 1,
> +		.for_sync = 1,
> +	};
> +	int ret;

One small thing: in nfs_wb_folio_reclaim(), the else
nfs_commit_inode(inode, 0); branch seems to return an
uninitialized ret (unless I missed something).

> +
> +	if (folio_test_writeback(folio))
> +		return -EBUSY;
> +	if (folio_clear_dirty_for_io(folio)) {
> +		trace_nfs_writeback_folio_reclaim(inode, range_start, len);
> +		ret = nfs_writepage_locked(folio, &wbc);
> +		trace_nfs_writeback_folio_reclaim_done(inode, range_start, len,
> +						       ret);
> +	} else
> +		nfs_commit_inode(inode, 0);

Maybe init ret = 0, or return the value from
nfs_commit_inode() for tracing.

> +	return ret;
> +}
> +
>   /**
>    * nfs_wb_folio - Write back all requests on one page
>    * @inode: pointer to page
> diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
> index a6624edb7226..8dd79a3f3d66 100644
> --- a/include/linux/nfs_fs.h
> +++ b/include/linux/nfs_fs.h
> @@ -637,6 +637,7 @@ extern int  nfs_update_folio(struct file *file, struct folio *folio,
>   extern int nfs_sync_inode(struct inode *inode);
>   extern int nfs_wb_all(struct inode *inode);
>   extern int nfs_wb_folio(struct inode *inode, struct folio *folio);
> +extern int nfs_wb_folio_reclaim(struct inode *inode, struct folio *folio);
>   int nfs_wb_folio_cancel(struct inode *inode, struct folio *folio);
>   extern int  nfs_commit_inode(struct inode *, int);
>   extern struct nfs_commit_data *nfs_commitdata_alloc(void);

Happy to test this with my reproducer once you post it.

Thanks,
Wang Zhaolong