Message-ID: <a93e72cfc812a117166c0b20e9cca4e5f8d43393.camel@redhat.com>
Date: Tue, 08 Jul 2025 12:50:38 -0400
From: Laurence Oberman <loberman@...hat.com>
To: Trond Myklebust <trondmy@...nel.org>, Benjamin Coddington
<bcodding@...hat.com>, Anna Schumaker <anna@...nel.org>, Tejun Heo
<tj@...nel.org>, Lai Jiangshan <jiangshanlai@...il.com>
Cc: linux-nfs@...r.kernel.org, linux-kernel@...r.kernel.org,
djeffery@...hat.com
Subject: Re: [PATCH 2/2] NFS: Improve nfsiod workqueue detection for
allocation flags
On Mon, 2025-07-07 at 16:28 -0400, Laurence Oberman wrote:
> On Mon, 2025-07-07 at 12:25 -0700, Trond Myklebust wrote:
> > On Mon, 2025-07-07 at 14:46 -0400, Benjamin Coddington wrote:
> > > The NFS client writeback paths change which flags are passed to
> > > their memory allocation calls based on whether the current task is
> > > running from within a workqueue or not. More specifically, it
> > > appears that during writeback, allocations with PF_WQ_WORKER set
> > > on current->flags will add __GFP_NORETRY | __GFP_NOWARN.
> > > Presumably this is because nfsiod can simply fail quickly and
> > > later retry to write back that specific page should the
> > > allocation fail.
> > >
> > > However, the check for PF_WQ_WORKER is too general because tasks
> > > can enter NFS writeback paths from other workqueues. Specifically,
> > > the loopback driver tends to perform writeback into backing files
> > > on NFS with PF_WQ_WORKER set, and additionally sets
> > > PF_MEMALLOC_NOIO. The combination of PF_MEMALLOC_NOIO with
> > > __GFP_NORETRY can easily result in allocation failures, and the
> > > loopback driver has no retry functionality. As a result, after
> > > commit 0bae835b63c5 ("NFS: Avoid writeback threads getting stuck
> > > in mempool_alloc()") users are seeing corrupted loop-mounted
> > > filesystems backed by image files on NFS.
> > >
> > > In a preceding patch, we introduced a function to allow NFS to
> > > detect if the task is executing within a specific workqueue. Here
> > > we use that helper to set __GFP_NORETRY | __GFP_NOWARN only if
> > > the workqueue is nfsiod.
> > >
> > > Fixes: 0bae835b63c5 ("NFS: Avoid writeback threads getting stuck
> > > in mempool_alloc()")
> > > Signed-off-by: Benjamin Coddington <bcodding@...hat.com>
> > > ---
> > > fs/nfs/internal.h | 12 +++++++++++-
> > > 1 file changed, 11 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> > > index 69c2c10ee658..173172afa3f5 100644
> > > --- a/fs/nfs/internal.h
> > > +++ b/fs/nfs/internal.h
> > > @@ -12,6 +12,7 @@
> > >  #include <linux/nfs_page.h>
> > >  #include <linux/nfslocalio.h>
> > >  #include <linux/wait_bit.h>
> > > +#include <linux/workqueue.h>
> > >  
> > >  #define NFS_SB_MASK (SB_NOSUID|SB_NODEV|SB_NOEXEC|SB_SYNCHRONOUS)
> > >  
> > > @@ -669,9 +670,18 @@ nfs_write_match_verf(const struct nfs_writeverf *verf,
> > >  		!nfs_write_verifier_cmp(&req->wb_verf, &verf->verifier);
> > >  }
> > >  
> > > +static inline bool is_nfsiod(void)
> > > +{
> > > +	struct workqueue_struct *current_wq = current_workqueue();
> > > +
> > > +	if (current_wq)
> > > +		return current_wq == nfsiod_workqueue;
> > > +	return false;
> > > +}
> > > +
> > >  static inline gfp_t nfs_io_gfp_mask(void)
> > >  {
> > > -	if (current->flags & PF_WQ_WORKER)
> > > +	if (is_nfsiod())
> > >  		return GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN;
> > >  	return GFP_KERNEL;
> > >  }
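
As an aside, the difference between the old and new checks can be modeled in a small userspace sketch (the structs and workqueue pointers below are simplified stand-ins, not kernel code; only the PF_WQ_WORKER value mirrors include/linux/sched.h): a loop-driver worker and an nfsiod worker both run with PF_WQ_WORKER set, but only the latter should get the fail-fast GFP flags.

```c
/* Userspace model, not kernel code: shows why PF_WQ_WORKER alone is
 * too broad a test for "am I nfsiod?". */
#include <assert.h>
#include <stdbool.h>

#define PF_WQ_WORKER 0x00000020	/* "I'm a workqueue worker" */

struct task {
	unsigned int flags;	/* models current->flags */
	const void *wq;		/* models the worker's workqueue */
};

/* Stand-ins for struct workqueue_struct pointers. */
static const int nfsiod_wq, loop_wq;

/* Old check: any workqueue worker gets __GFP_NORETRY | __GFP_NOWARN. */
static bool old_check(const struct task *t)
{
	return (t->flags & PF_WQ_WORKER) != 0;
}

/* New check: only a task actually running on nfsiod does. */
static bool new_check(const struct task *t)
{
	return t->wq == &nfsiod_wq;
}
```

A loop-driver worker passes old_check() but not new_check(), which is exactly the behavioral change the patch makes.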
> >
> >
> > Instead of trying to identify the nfsiod_workqueue, why not apply
> > current_gfp_context() in order to weed out callers that set
> > PF_MEMALLOC_NOIO and PF_MEMALLOC_NOFS?
> >
> > i.e.
> >
> >
> > static inline gfp_t nfs_io_gfp_mask(void)
> > {
> > 	gfp_t ret = current_gfp_context(GFP_KERNEL);
> > 
> > 	if ((current->flags & PF_WQ_WORKER) && ret == GFP_KERNEL)
> > 		ret |= __GFP_NORETRY | __GFP_NOWARN;
> > 	return ret;
> > }
> >
> >
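
For what it's worth, a quick userspace model of why this version also covers the loop-driver case (a sketch only: gfp_context() below is a hand-rolled, simplified stand-in for the kernel's current_gfp_context(), and the GFP bit values are illustrative, not the real kernel constants): a task that set PF_MEMALLOC_NOIO gets __GFP_IO and __GFP_FS masked off, so the `ret == GFP_KERNEL` test never adds __GFP_NORETRY for it.

```c
/* Userspace model of current_gfp_context() masking.  Flag values are
 * simplified placeholders; the masking logic follows the kernel's
 * PF_MEMALLOC_NOIO / PF_MEMALLOC_NOFS handling. */
#include <assert.h>

#define __GFP_IO          0x40u
#define __GFP_FS          0x80u
#define __GFP_RECLAIM     0x400u	/* simplified */
#define GFP_KERNEL        (__GFP_RECLAIM | __GFP_IO | __GFP_FS)
#define PF_MEMALLOC_NOFS  0x40000u
#define PF_MEMALLOC_NOIO  0x80000u

static unsigned int gfp_context(unsigned int task_flags, unsigned int gfp)
{
	/* NOIO forbids both I/O and filesystem recursion; NOFS only
	 * filesystem recursion. */
	if (task_flags & PF_MEMALLOC_NOIO)
		gfp &= ~(__GFP_IO | __GFP_FS);
	else if (task_flags & PF_MEMALLOC_NOFS)
		gfp &= ~__GFP_FS;
	return gfp;
}
```

With no PF_MEMALLOC_* flag set, gfp_context() returns GFP_KERNEL unchanged and the fail-fast flags are added; under PF_MEMALLOC_NOIO the result is no longer GFP_KERNEL, so the loop worker keeps a retrying allocation.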
>
> I am testing both patch options to see if both prevent the failed
> write with no other impact and will report back.
>
> The test is confined to the use case of an XFS filesystem served by
> an image located on NFS, as that is where the failed writes were
> seen.
>
>
>
Both Ben's patch and Trond's fix the failing write issue, so I guess we
need to decide what the final fix will be.

For both solutions:
Tested-by: Laurence Oberman <loberman@...hat.com>