[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20240527-paare-holzhammer-8c3a32eaf6b1@brauner>
Date: Mon, 27 May 2024 15:39:55 +0200
From: Christian Brauner <brauner@...nel.org>
To: Dave Chinner <david@...morbit.com>
Cc: Aleksa Sarai <cyphar@...har.com>, 
	Alexander Viro <viro@...iv.linux.org.uk>, Jan Kara <jack@...e.cz>, Chuck Lever <chuck.lever@...cle.com>, 
	Jeff Layton <jlayton@...nel.org>, Amir Goldstein <amir73il@...il.com>, 
	Alexander Aring <alex.aring@...il.com>, linux-fsdevel@...r.kernel.org, linux-nfs@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC] fhandle: expose u64 mount id to name_to_handle_at(2)
On Mon, May 27, 2024 at 10:06:15AM +1000, Dave Chinner wrote:
> On Fri, May 24, 2024 at 02:25:30PM +0200, Christian Brauner wrote:
> > On Thu, May 23, 2024 at 09:52:20AM -0600, Aleksa Sarai wrote:
> > > On 2024-05-21, Christian Brauner <brauner@...nel.org> wrote:
> > > > On Mon, May 20, 2024 at 05:35:49PM -0400, Aleksa Sarai wrote:
> > > > > Now that we have stabilised the unique 64-bit mount ID interface in
> > > > > statx, we can now provide a race-free way for name_to_handle_at(2) to
> > > > > provide a file handle and corresponding mount without needing to worry
> > > > > about racing with /proc/mountinfo parsing.
> > > > > 
> > > > > As with AT_HANDLE_FID, AT_HANDLE_UNIQUE_MNT_ID reuses a statx AT_* bit
> > > > > that doesn't make sense for name_to_handle_at(2).
> > > > > 
> > > > > Signed-off-by: Aleksa Sarai <cyphar@...har.com>
> > > > > ---
> > > > 
> > > > So I think overall this is probably fine (famous last words). If it's
> > > > just about being able to retrieve the new mount id without having to
> > > > take the hit of another statx system call it's indeed a bit much to
> > > > add a revised system call for this. Althoug I did say earlier that I
> > > > wouldn't rule that out.
> > > > 
> > > > But if we'd that then it'll be a long discussion on the form of the new
> > > > system call and the information it exposes.
> > > > 
> > > > For example, I lack the grey hair needed to understand why
> > > > name_to_handle_at() returns a mount id at all. The pitch in commit
> > > > 990d6c2d7aee ("vfs: Add name to file handle conversion support") is that
> > > > the (old) mount id can be used to "lookup file system specific
> > > > information [...] in /proc/<pid>/mountinfo".
> > > 
> > > The logic was presumably to allow you to know what mount the resolved
> > > file handle came from. If you use AT_EMPTY_PATH this is not needed
> > > because you could just fstatfs (and now statx(AT_EMPTY_PATH)), but if
> > > you just give name_to_handle_at() almost any path, there is no race-free
> > > way to make sure that you know which filesystem the file handle came
> > > from.
> > > 
> > > I don't know if that could lead to security issues (I guess an attacker
> > > could find a way to try to manipulate the file handle you get back, and
> > > then try to trick you into operating on the wrong filesystem with
> > > open_by_handle_at()) but it is definitely something you'd want to avoid.
> > 
> > So the following paragraphs are prefaced with: I'm not an expert on file
> > handle encoding and so might be totally wrong.
> > 
> > Afaiu, the uniqueness guarantee of the file handle mostly depends on:
> > 
> > (1) the filesystem
> > (2) the actual file handling encoding
> > 
> > Looking at file handle encoding to me it looks like it's fairly easy to
> > fake them in userspace (I guess that's ok if you think about them like a
> > path but with a weird permission model built around them.) for quite a
> > few filesystems.
> 
> This is a feature specifically used by XFS utilities like xfs_fsr
> and xfsdump to take pathless inode information retrieved by bulkstat
> operations to a file handle to enable arbitrary inodes in the
> filesystem to be opened.
> 
> i.e. they scan the filesystem by walking the filesystem's internal
> inode index, not by walking paths to scan the filesytsem. This is
> *much* faster than path walking, especially on seek-limited storage
> hardware.
> 
> Knowing the inode number, generation and fs uuid, we can
> construct a valid filehandle for that inode, and then do whatever
> operations we need to do (defrag, backup, move offline (HSM), etc)
> without needing to know the path to that inode....
Yeah, I think you mentioned this in another context before.
> > The problem is with what name_to_handle_at() returns imho. A mnt_id
> > doesn't pin the filesystem and the old mnt_id isn't unique. That means
> > the filesystem can be unmounted and go away and the mnt_id can be
> > recycled almost immediately for another mount but the file handle is
> > still there.
> 
> This is no different to the NFS server unexporting a filesystem -
> all attempts to resolve the file handle the client holds for that
> export must now fail. Hence the filehandle itself must have a
> superblock specific identifier in it.
> 
> > So to guarantee that a (weakly encoded) file handle is interpreted in
> > the right filesystem the file handle must either be accompanied by a
> > file descriptor that pins the relevant mount or have any other guarantee
> > that the filesystem doesn't go away (Or of course, the file handle
> > encodes the uuid of the filesystem or something or uses some sort of
> > hashing scheme.).
> 
> Yes, it does, and that's the superblock based identifier, not
> something that is mount based. i.e.  the handle is valid regardless
> of where the filesystem is mounted, even if the application does not
> have visibility or permitted access to the given mount. And the
> filehandle is persistent - it must work across unmount/mount,
> reboots, change of mount location, etc.
Agreed, and no one is disputing that. The old mount id as exposed by
name_to_handle_at() is one means to get to a persisent identifier and
that is racy. But we do have a 64bit non-recyled mount id and
statmount() since v6.7 now which allow to get a persistent identifier
for the filesystem in a race-free manner.
Powered by blists - more mailing lists
 
