lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <16e5f609599e29b49c81a1a8edd721a5daabc1bb.camel@hammerspace.com>
Date: Tue, 10 Dec 2024 17:33:14 +0000
From: Trond Myklebust <trondmy@...merspace.com>
To: "njha@...estreet.com" <njha@...estreet.com>, "bcodding@...hat.com"
	<bcodding@...hat.com>
CC: "anna@...nel.org" <anna@...nel.org>, "linux-nfs@...r.kernel.org"
	<linux-nfs@...r.kernel.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] nfs: propagate fileid changed errors back to syscall

On Tue, 2024-12-10 at 12:12 -0500, Nikhil Jha wrote:
> On Tue, Dec 10, 2024 at 07:11:43AM -0500, Benjamin Coddington wrote:
> > On 9 Dec 2024, at 12:39, Nikhil Jha wrote:
> > 
> > > Hello! This is the first kernel patch I have tried to upstream.
> > > I'm
> > > following along with the kernel newbies guide but apologies if I
> > > got
> > > anything wrong.
> > > 
> > > Currently, if there is a mismatch in the request and response
> > > fileids in
> > > an NFS request, the kernel logs an error and attempts to return
> > > ESTALE.
> > > However, this error is currently dropped before it makes it all
> > > the way
> > > to userspace. This appears to be a mistake, since as far as I can
> > > tell
> > > that ESTALE value is never consumed from anywhere.
> > > 
> > > Callstack for async NFS write, at time of error:
> > > 
> > >         nfs_update_inode <- returns -ESTALE
> > >         nfs_refresh_inode_locked
> > >         nfs_writeback_update_inode <- error is dropped here
> > >         nfs3_write_done
> > >         nfs_writeback_done
> > >         nfs_pgio_result <- other errors are collected here
> > >         rpc_exit_task
> > >         __rpc_execute
> > >         rpc_async_schedule
> > >         process_one_work
> > >         worker_thread
> > >         kthread
> > >         ret_from_fork
> > > 
> > > We ran into this issue ourselves, and seeing the -ESTALE in the
> > > kernel
> > > source code but not from userspace was surprising.
> > > 
> > > I tested a rebased version of this patch on an el8 kernel
> > > (v6.1.114),
> > > and it seems to correctly propagate this error.
> > > 
> > > > 8------------------------------------------------------8<
> > > 
> > > If an NFS server returns a response with a different file id to
> > > the
> > > response, the kernel currently prints out an error and attempts
> > > to
> > > return -ESTALE. However, this -ESTALE value is never surfaced
> > > anywhere.
> > 
> > Hi Nikhil Jha,
> > 
> > Will this cause us to return -ESTALE to the application even if the
> > WRITE
> > was successful?
> > 
> > Ben
> > 
> 
> Hi Ben,
> 
> Hmm.. I'm not sure how to answer that question exactly. A fileid is
> only
> mismatched between a request RPC and response RPC when something
> really
> weird is going on (i.e. a bug in the NFS server), so it's hard to
> reason
> about if a WRITE was successful or not.
> 
> The `return -ESTALE` was already in the kernel code, but this
> particular
> codepath seems to have accidentally dropped that error.
> 
> Nikhil
> 

I'm not keen on taking any client changes to paper over what appears to
be a major server bug. We simply can't support broken servers that
reuse filehandles across files.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@...merspace.com


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ