Message-ID: <a93e72cfc812a117166c0b20e9cca4e5f8d43393.camel@redhat.com>
Date: Tue, 08 Jul 2025 12:50:38 -0400
From: Laurence Oberman <loberman@...hat.com>
To: Trond Myklebust <trondmy@...nel.org>, Benjamin Coddington
<bcodding@...hat.com>, Anna Schumaker <anna@...nel.org>, Tejun Heo
<tj@...nel.org>, Lai Jiangshan <jiangshanlai@...il.com>
Cc: linux-nfs@...r.kernel.org, linux-kernel@...r.kernel.org,
djeffery@...hat.com
Subject: Re: [PATCH 2/2] NFS: Improve nfsiod workqueue detection for
allocation flags
On Mon, 2025-07-07 at 16:28 -0400, Laurence Oberman wrote:
> On Mon, 2025-07-07 at 12:25 -0700, Trond Myklebust wrote:
> > On Mon, 2025-07-07 at 14:46 -0400, Benjamin Coddington wrote:
> > > The NFS client writeback paths change which flags are passed to
> > > their memory allocation calls based on whether the current task is
> > > running from within a workqueue or not. More specifically, it
> > > appears that during writeback, allocations with PF_WQ_WORKER set
> > > on current->flags will add __GFP_NORETRY | __GFP_NOWARN.
> > > Presumably this is because nfsiod can simply fail quickly and
> > > later retry to write back that specific page should the
> > > allocation fail.
> > >
> > > However, the check for PF_WQ_WORKER is too general because tasks
> > > can enter NFS writeback paths from other workqueues. Specifically,
> > > the loopback driver tends to perform writeback into backing files
> > > on NFS with PF_WQ_WORKER set, and additionally sets
> > > PF_MEMALLOC_NOIO. The combination of PF_MEMALLOC_NOIO with
> > > __GFP_NORETRY can easily result in allocation failures, and the
> > > loopback driver has no retry functionality. As a result, after
> > > commit 0bae835b63c5 ("NFS: Avoid writeback threads getting stuck
> > > in mempool_alloc()") users are seeing corrupted loop-mounted
> > > filesystems backed by image files on NFS.
> > >
> > > In a preceding patch, we introduced a function to allow NFS to
> > > detect if the task is executing within a specific workqueue. Here
> > > we use that helper to set __GFP_NORETRY | __GFP_NOWARN only if
> > > the workqueue is nfsiod.
> > >
> > > Fixes: 0bae835b63c5 ("NFS: Avoid writeback threads getting stuck
> > > in mempool_alloc()")
> > > Signed-off-by: Benjamin Coddington <bcodding@...hat.com>
> > > ---
> > > fs/nfs/internal.h | 12 +++++++++++-
> > > 1 file changed, 11 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> > > index 69c2c10ee658..173172afa3f5 100644
> > > --- a/fs/nfs/internal.h
> > > +++ b/fs/nfs/internal.h
> > > @@ -12,6 +12,7 @@
> > >  #include <linux/nfs_page.h>
> > >  #include <linux/nfslocalio.h>
> > >  #include <linux/wait_bit.h>
> > > +#include <linux/workqueue.h>
> > >  
> > >  #define NFS_SB_MASK (SB_NOSUID|SB_NODEV|SB_NOEXEC|SB_SYNCHRONOUS)
> > >  
> > > @@ -669,9 +670,18 @@ nfs_write_match_verf(const struct nfs_writeverf *verf,
> > >  		!nfs_write_verifier_cmp(&req->wb_verf, &verf->verifier);
> > >  }
> > >  
> > > +static inline bool is_nfsiod(void)
> > > +{
> > > +	struct workqueue_struct *current_wq = current_workqueue();
> > > +
> > > +	if (current_wq)
> > > +		return current_wq == nfsiod_workqueue;
> > > +	return false;
> > > +}
> > > +
> > >  static inline gfp_t nfs_io_gfp_mask(void)
> > >  {
> > > -	if (current->flags & PF_WQ_WORKER)
> > > +	if (is_nfsiod())
> > >  		return GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN;
> > >  	return GFP_KERNEL;
> > >  }
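
As an aside, the difference between the old and new checks can be modeled in a small userspace sketch (the structs and workqueue pointers below are simplified stand-ins, not kernel code; only the PF_WQ_WORKER value mirrors include/linux/sched.h): a loop-driver worker and an nfsiod worker both run with PF_WQ_WORKER set, but only the latter should get the fail-fast GFP flags.

```c
/* Userspace model, not kernel code: shows why PF_WQ_WORKER alone is
 * too broad a test for "am I nfsiod?". */
#include <assert.h>
#include <stdbool.h>

#define PF_WQ_WORKER 0x00000020	/* "I'm a workqueue worker" */

struct task {
	unsigned int flags;	/* models current->flags */
	const void *wq;		/* models the worker's workqueue */
};

/* Stand-ins for struct workqueue_struct pointers. */
static const int nfsiod_wq, loop_wq;

/* Old check: any workqueue worker gets __GFP_NORETRY | __GFP_NOWARN. */
static bool old_check(const struct task *t)
{
	return (t->flags & PF_WQ_WORKER) != 0;
}

/* New check: only a task actually running on nfsiod does. */
static bool new_check(const struct task *t)
{
	return t->wq == &nfsiod_wq;
}
```

A loop-driver worker passes old_check() but not new_check(), which is exactly the behavioral change the patch makes.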
> >
> >
> > Instead of trying to identify the nfsiod_workqueue, why not apply
> > current_gfp_context() in order to weed out callers that set
> > PF_MEMALLOC_NOIO and PF_MEMALLOC_NOFS?
> >
> > i.e.
> >
> >
> > static inline gfp_t nfs_io_gfp_mask(void)
> > {
> > 	gfp_t ret = current_gfp_context(GFP_KERNEL);
> > 
> > 	if ((current->flags & PF_WQ_WORKER) && ret == GFP_KERNEL)
> > 		ret |= __GFP_NORETRY | __GFP_NOWARN;
> > 	return ret;
> > }
> >
> >
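
For what it's worth, a quick userspace model of why this version also covers the loop-driver case (a sketch only: gfp_context() below is a hand-rolled, simplified stand-in for the kernel's current_gfp_context(), and the GFP bit values are illustrative, not the real kernel constants): a task that set PF_MEMALLOC_NOIO gets __GFP_IO and __GFP_FS masked off, so the `ret == GFP_KERNEL` test never adds __GFP_NORETRY for it.

```c
/* Userspace model of current_gfp_context() masking.  Flag values are
 * simplified placeholders; the masking logic follows the kernel's
 * PF_MEMALLOC_NOIO / PF_MEMALLOC_NOFS handling. */
#include <assert.h>

#define __GFP_IO          0x40u
#define __GFP_FS          0x80u
#define __GFP_RECLAIM     0x400u	/* simplified */
#define GFP_KERNEL        (__GFP_RECLAIM | __GFP_IO | __GFP_FS)
#define PF_MEMALLOC_NOFS  0x40000u
#define PF_MEMALLOC_NOIO  0x80000u

static unsigned int gfp_context(unsigned int task_flags, unsigned int gfp)
{
	/* NOIO forbids both I/O and filesystem recursion; NOFS only
	 * filesystem recursion. */
	if (task_flags & PF_MEMALLOC_NOIO)
		gfp &= ~(__GFP_IO | __GFP_FS);
	else if (task_flags & PF_MEMALLOC_NOFS)
		gfp &= ~__GFP_FS;
	return gfp;
}
```

With no PF_MEMALLOC_* flag set, gfp_context() returns GFP_KERNEL unchanged and the fail-fast flags are added; under PF_MEMALLOC_NOIO the result is no longer GFP_KERNEL, so the loop worker keeps a retrying allocation.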
>
> I am testing both patch options to see if both prevent the failed
> write with no other impact and will report back.
>
> The test is confined to the use case of an XFS filesystem served by
> an image located on NFS, as that is where the failed writes were
> seen.
>
>
>
Both Ben's patch and Trond's fix the failing write issue, so I guess we
need to decide what the final fix will be.

For both solutions:
Tested-by: Laurence Oberman <loberman@...hat.com>