[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20241210A1712306692debd.njha@janestreet.com>
Date: Tue, 10 Dec 2024 12:12:30 -0500
From: Nikhil Jha <njha@...estreet.com>
To: Benjamin Coddington <bcodding@...hat.com>
Cc: Trond Myklebust <trondmy@...nel.org>, Anna Schumaker <anna@...nel.org>,
linux-nfs@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] nfs: propagate fileid changed errors back to syscall
On Tue, Dec 10, 2024 at 07:11:43AM -0500, Benjamin Coddington wrote:
> On 9 Dec 2024, at 12:39, Nikhil Jha wrote:
>
> > Hello! This is the first kernel patch I have tried to upstream. I'm
> > following along with the kernel newbies guide but apologies if I got
> > anything wrong.
> >
> > Currently, if there is a mismatch in the request and response fileids in
> > an NFS request, the kernel logs an error and attempts to return ESTALE.
> > However, this error is currently dropped before it makes it all the way
> > to userspace. This appears to be a mistake, since as far as I can tell
> > that ESTALE value is never consumed from anywhere.
> >
> > Callstack for async NFS write, at time of error:
> >
> > nfs_update_inode <- returns -ESTALE
> > nfs_refresh_inode_locked
> > nfs_writeback_update_inode <- error is dropped here
> > nfs3_write_done
> > nfs_writeback_done
> > nfs_pgio_result <- other errors are collected here
> > rpc_exit_task
> > __rpc_execute
> > rpc_async_schedule
> > process_one_work
> > worker_thread
> > kthread
> > ret_from_fork
> >
> > We ran into this issue ourselves, and seeing the -ESTALE in the kernel
> > source code but not from userspace was surprising.
> >
> > I tested a rebased version of this patch on an el8 kernel (v6.1.114),
> > and it seems to correctly propagate this error.
> >
> >> 8------------------------------------------------------8<
> >
> > If an NFS server returns a response with a different file id to the
> > response, the kernel currently prints out an error and attempts to
> > return -ESTALE. However, this -ESTALE value is never surfaced anywhere.
>
> Hi Nikhil Jha,
>
> Will this cause us to return -ESTALE to the application even if the WRITE
> was successful?
>
> Ben
>
Hi Ben,
Hmm.. I'm not sure how to answer that question exactly. A fileid is only
mismatched between a request RPC and response RPC when something really
weird is going on (i.e. a bug in the NFS server), so it's hard to reason
about if a WRITE was successful or not.
The `return -ESTALE` was already in the kernel code, but this particular
codepath seems to have accidentally dropped that error.
Nikhil
Powered by blists - more mailing lists