[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZlPOd0p7AUn7JqLu@dread.disaster.area>
Date: Mon, 27 May 2024 10:06:15 +1000
From: Dave Chinner <david@...morbit.com>
To: Christian Brauner <brauner@...nel.org>
Cc: Aleksa Sarai <cyphar@...har.com>,
Alexander Viro <viro@...iv.linux.org.uk>, Jan Kara <jack@...e.cz>,
Chuck Lever <chuck.lever@...cle.com>,
Jeff Layton <jlayton@...nel.org>,
Amir Goldstein <amir73il@...il.com>,
Alexander Aring <alex.aring@...il.com>,
linux-fsdevel@...r.kernel.org, linux-nfs@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC] fhandle: expose u64 mount id to name_to_handle_at(2)
On Fri, May 24, 2024 at 02:25:30PM +0200, Christian Brauner wrote:
> On Thu, May 23, 2024 at 09:52:20AM -0600, Aleksa Sarai wrote:
> > On 2024-05-21, Christian Brauner <brauner@...nel.org> wrote:
> > > On Mon, May 20, 2024 at 05:35:49PM -0400, Aleksa Sarai wrote:
> > > > Now that we have stabilised the unique 64-bit mount ID interface in
> > > > statx, we can now provide a race-free way for name_to_handle_at(2) to
> > > > provide a file handle and corresponding mount without needing to worry
> > > > about racing with /proc/mountinfo parsing.
> > > >
> > > > As with AT_HANDLE_FID, AT_HANDLE_UNIQUE_MNT_ID reuses a statx AT_* bit
> > > > that doesn't make sense for name_to_handle_at(2).
> > > >
> > > > Signed-off-by: Aleksa Sarai <cyphar@...har.com>
> > > > ---
> > >
> > > So I think overall this is probably fine (famous last words). If it's
> > > just about being able to retrieve the new mount id without having to
> > > take the hit of another statx system call it's indeed a bit much to
> > > add a revised system call for this. Althoug I did say earlier that I
> > > wouldn't rule that out.
> > >
> > > But if we'd that then it'll be a long discussion on the form of the new
> > > system call and the information it exposes.
> > >
> > > For example, I lack the grey hair needed to understand why
> > > name_to_handle_at() returns a mount id at all. The pitch in commit
> > > 990d6c2d7aee ("vfs: Add name to file handle conversion support") is that
> > > the (old) mount id can be used to "lookup file system specific
> > > information [...] in /proc/<pid>/mountinfo".
> >
> > The logic was presumably to allow you to know what mount the resolved
> > file handle came from. If you use AT_EMPTY_PATH this is not needed
> > because you could just fstatfs (and now statx(AT_EMPTY_PATH)), but if
> > you just give name_to_handle_at() almost any path, there is no race-free
> > way to make sure that you know which filesystem the file handle came
> > from.
> >
> > I don't know if that could lead to security issues (I guess an attacker
> > could find a way to try to manipulate the file handle you get back, and
> > then try to trick you into operating on the wrong filesystem with
> > open_by_handle_at()) but it is definitely something you'd want to avoid.
>
> So the following paragraphs are prefaced with: I'm not an expert on file
> handle encoding and so might be totally wrong.
>
> Afaiu, the uniqueness guarantee of the file handle mostly depends on:
>
> (1) the filesystem
> (2) the actual file handling encoding
>
> Looking at file handle encoding to me it looks like it's fairly easy to
> fake them in userspace (I guess that's ok if you think about them like a
> path but with a weird permission model built around them.) for quite a
> few filesystems.
This is a feature specifically used by XFS utilities like xfs_fsr
and xfsdump to take pathless inode information retrieved by bulkstat
operations to a file handle to enable arbitrary inodes in the
filesystem to be opened.
i.e. they scan the filesystem by walking the filesystem's internal
inode index, not by walking paths to scan the filesytsem. This is
*much* faster than path walking, especially on seek-limited storage
hardware.
Knowing the inode number, generation and fs uuid, we can
construct a valid filehandle for that inode, and then do whatever
operations we need to do (defrag, backup, move offline (HSM), etc)
without needing to know the path to that inode....
> The problem is with what name_to_handle_at() returns imho. A mnt_id
> doesn't pin the filesystem and the old mnt_id isn't unique. That means
> the filesystem can be unmounted and go away and the mnt_id can be
> recycled almost immediately for another mount but the file handle is
> still there.
This is no different to the NFS server unexporting a filesystem -
all attempts to resolve the file handle the client holds for that
export must now fail. Hence the filehandle itself must have a
superblock specific identifier in it.
> So to guarantee that a (weakly encoded) file handle is interpreted in
> the right filesystem the file handle must either be accompanied by a
> file descriptor that pins the relevant mount or have any other guarantee
> that the filesystem doesn't go away (Or of course, the file handle
> encodes the uuid of the filesystem or something or uses some sort of
> hashing scheme.).
Yes, it does, and that's the superblock based identifier, not
something that is mount based. i.e. the handle is valid regardless
of where the filesystem is mounted, even if the application does not
have visibility or permitted access to the given mount. And the
filehandle is persistent - it must work across unmount/mount,
reboots, change of mount location, etc.
That means that if you are using file handles, any filesystem that
is shared across multiple containers can by fully accessed by user
generated file handles from any container if no special permissions
are required. i.e. there are no real path or mount based security
boundaries for filehandles in existence righ now.
If filehandles are going to be restricted to a specific container
(e.g. a single mount) so that less permissions are needed to open
the filehandle, then the filehandle itself needs to encode those
restrictions. Decoding the filehandle needs to ensure that the
opening context has permission to access whatever the filehandle
points to in a restricted environment. This will prevent existing
persistent, global filehandles from being used as restricted
filehandles and vice versa.
Restricting filehandles for use as persistent filehandles is going
to be tricky - mount IDs are ephermeral and only valid while a mount
is active. I'd suggest that restricted filehandles should only be
ephemeral, too. That would also allow restricted filehandles to use
kernel side crypto based encoding so nobody can ever construct them
from userspace.
Hence I think we have to look at what we are encoding in the
filehandle itself to solve the "shared access from restricted
environments" problem, not try to add stuff around the outside of
the filehandle API to paper over the fact that filehandles
themselves have no permission/restriction model built into them.
This would also avoid the whole problem of "users can access any
file in the underlying filesystem by constructing their own handles" problem that
relaxed permissions on filehandle decoding creates, hence opening
the door for more extensive use of filehandles in untrusted
environments.
> One of the features of file handles is that they're globally usable so
> they're interesting to use as handles that can be shared. IOW, one can
> send around a file handle to another process without having to pin
> anything or have a file descriptor open that needs to be sent via
> AF_UNIX.
>
> But as stated above that's potentially risky so one might still have to
> send around an fd together with the file handle if sender and receiver
> don't share the filesystem for the handle.
Adding a trust context around the outside of an untrusted handle
isn't the right way to address this problem - define a filehandle
format that can be trusted and constrained within specific security
domains and the need for external permission contexts (whatever they
look like) to be passed with the filehandle to make it valid go away
entirely.
-Dave.
--
Dave Chinner
david@...morbit.com
Powered by blists - more mailing lists