linux-kernel - Re: [PATCH RFC] fhandle: expose u64 mount id to name_to_handle

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240524.154429-smoked.node.sleepy.dragster-w2EokFBsl7RC@cyphar.com>
Date: Fri, 24 May 2024 08:58:55 -0700
From: Aleksa Sarai <cyphar@...har.com>
To: Christian Brauner <brauner@...nel.org>
Cc: Alexander Viro <viro@...iv.linux.org.uk>, Jan Kara <jack@...e.cz>, 
	Chuck Lever <chuck.lever@...cle.com>, Jeff Layton <jlayton@...nel.org>, 
	Amir Goldstein <amir73il@...il.com>, Alexander Aring <alex.aring@...il.com>, 
	linux-fsdevel@...r.kernel.org, linux-nfs@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC] fhandle: expose u64 mount id to name_to_handle_at(2)

On 2024-05-24, Christian Brauner <brauner@...nel.org> wrote:
> On Thu, May 23, 2024 at 09:52:20AM -0600, Aleksa Sarai wrote:
> > On 2024-05-21, Christian Brauner <brauner@...nel.org> wrote:
> > > On Mon, May 20, 2024 at 05:35:49PM -0400, Aleksa Sarai wrote:
> > > > Now that we have stabilised the unique 64-bit mount ID interface in
> > > > statx, we can now provide a race-free way for name_to_handle_at(2) to
> > > > provide a file handle and corresponding mount without needing to worry
> > > > about racing with /proc/mountinfo parsing.
> > > > 
> > > > As with AT_HANDLE_FID, AT_HANDLE_UNIQUE_MNT_ID reuses a statx AT_* bit
> > > > that doesn't make sense for name_to_handle_at(2).
> > > > 
> > > > Signed-off-by: Aleksa Sarai <cyphar@...har.com>
> > > > ---
> > > 
> > > So I think overall this is probably fine (famous last words). If it's
> > > just about being able to retrieve the new mount id without having to
> > > take the hit of another statx system call it's indeed a bit much to
> > > add a revised system call for this. Althoug I did say earlier that I
> > > wouldn't rule that out.
> > > 
> > > But if we'd that then it'll be a long discussion on the form of the new
> > > system call and the information it exposes.
> > > 
> > > For example, I lack the grey hair needed to understand why
> > > name_to_handle_at() returns a mount id at all. The pitch in commit
> > > 990d6c2d7aee ("vfs: Add name to file handle conversion support") is that
> > > the (old) mount id can be used to "lookup file system specific
> > > information [...] in /proc/<pid>/mountinfo".
> > 
> > The logic was presumably to allow you to know what mount the resolved
> > file handle came from. If you use AT_EMPTY_PATH this is not needed
> > because you could just fstatfs (and now statx(AT_EMPTY_PATH)), but if
> > you just give name_to_handle_at() almost any path, there is no race-free
> > way to make sure that you know which filesystem the file handle came
> > from.
> > 
> > I don't know if that could lead to security issues (I guess an attacker
> > could find a way to try to manipulate the file handle you get back, and
> > then try to trick you into operating on the wrong filesystem with
> > open_by_handle_at()) but it is definitely something you'd want to avoid.
> 
> So the following paragraphs are prefaced with: I'm not an expert on file
> handle encoding and so might be totally wrong.
> 
> Afaiu, the uniqueness guarantee of the file handle mostly depends on:
> 
> (1) the filesystem
> (2) the actual file handling encoding
> 
> Looking at file handle encoding to me it looks like it's fairly easy to
> fake them in userspace (I guess that's ok if you think about them like a
> path but with a weird permission model built around them.) for quite a
> few filesystems.

The old Docker breakout attack did brute-force the fhandle for the root
directory of the host filesystem, so it is entirely possible.

However, the attack I was thinking of was whether a directory tree that
an attacker had mount permissions over could be manipulated such that a
privileged process doing name_to_handle_at() on a path within the tree
would get a file handle that open_by_handle_at() on a different
filesystem would result in a potentially dangerous path being opened.

For instance (M is management process, C is the malicious container
process):

 C: Bind-mounts the root of the container filesystem at /foo.
 M: name_to_handle_at($CONTAINER/foo)
     -> gets an fhandle of / of the container filesystem
     -> stores a copy of the (recycled) mount id
 C: Swaps /foo with a bind-mount of the host root filesystem such that
    the mount id is recycled, before M can scan /proc/self/mountinfo.
 C: Triggers M to try to use the filehandle for some administrative
    process.
 M: open_by_handle_at(...) on the wrong mount id, getting an fd of the
    host / directory. Possibly does something bad to this directory
    (deleting files, passing the fd back to the container, etc).

It seems possible that this could happen, so having a unique mount id is
kind of important if you plan to use open_by_handle_at() with malicious
actors in control of the target filesystem tree.

Though, regardless of the attack you are worried about, I guess we are
in agreement that a unique mount id from name_to_handle_at() would be a
good idea if we are planning for userspace to use file handles for
everything.

> For example, for anything that uses fs/libfs.c:generic_encode_ino32_fh()
> it's easy to construct a file handle by retrieving the inode number via
> stat and the generation number via FS_IOC_GETVERSION.
> 
> Encoding using the inode number and the inode generation number is
> probably not very strong so it's not impossible to generate a file
> handle that is not unique without knowing in which filesystem to
> interpret it in.
> 
> The problem is with what name_to_handle_at() returns imho. A mnt_id
> doesn't pin the filesystem and the old mnt_id isn't unique. That means
> the filesystem can be unmounted and go away and the mnt_id can be
> recycled almost immediately for another mount but the file handle is
> still there.
> 
> So to guarantee that a (weakly encoded) file handle is interpreted in
> the right filesystem the file handle must either be accompanied by a
> file descriptor that pins the relevant mount or have any other guarantee
> that the filesystem doesn't go away (Or of course, the file handle
> encodes the uuid of the filesystem or something or uses some sort of
> hashing scheme.).
> 
> One of the features of file handles is that they're globally usable so
> they're interesting to use as handles that can be shared. IOW, one can
> send around a file handle to another process without having to pin
> anything or have a file descriptor open that needs to be sent via
> AF_UNIX.
> 
> But as stated above that's potentially risky so one might still have to
> send around an fd together with the file handle if sender and receiver
> don't share the filesystem for the handle.
> 
> However, with the unique mount id things improve quite a bit. Because
> now it's possible to send around the unique mount id and the file
> handle. Then one can use statmount() to figure out which filesystem this
> file handle needs to be interpreted in.

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

Download attachment "signature.asc" of type "application/pgp-signature" (229 bytes)