[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <37d39ff38dfef8bae73c5bc6a8593784bb2d3774.camel@kernel.org>
Date: Thu, 04 Dec 2025 09:11:24 -0500
From: Jeff Layton <jlayton@...nel.org>
To: Bhavik Sachdev <b.sachdev1904@...il.com>, Alexander Viro
<viro@...iv.linux.org.uk>, Christian Brauner <brauner@...nel.org>, Shuah
Khan <shuah@...nel.org>
Cc: linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
linux-kselftest@...r.kernel.org, criu@...ts.linux.dev, Jan Kara
<jack@...e.cz>, Aleksa Sarai <cyphar@...har.com>, Miklos Szeredi
<miklos@...redi.hu>, Pavel Tikhomirov <ptikhomirov@...tuozzo.com>, Andrei
Vagin <avagin@...il.com>, Alexander Mikhalitsyn <alexander@...alicyn.com>,
John Hubbard <jhubbard@...dia.com>, Amir Goldstein <amir73il@...il.com>,
"Martin K . Petersen" <martin.petersen@...cle.com>, Andrew Donnellan
<ajd@...ux.ibm.com>
Subject: Re: [PATCH v7 0/3] statmount: accept fd as a parameter
On Sat, 2025-11-29 at 14:41 +0530, Bhavik Sachdev wrote:
> We would like to add support for checkpoint/restoring file descriptors
> open on these "unmounted" mounts to CRIU (Checkpoint/Restore in
> Userspace) [1].
>
> Currently, we have no way to get mount info for these "unmounted" mounts
> since they do appear in /proc/<pid>/mountinfo and statmount does not
> work on them, since they do not belong to any mount namespace.
>
> This patch helps us by providing a way to get mountinfo for these
> "unmounted" mounts by using a fd on the mount.
>
> Changes from v6 [2] to v7:
> * Add kselftests for STATMOUNT_BY_FD flag.
>
> * Instead of renaming mnt_id_req.mnt_ns_fd to mnt_id_req.fd introduce a
> union so struct mnt_id_req looks like this:
>
> struct mnt_id_req {
> __u32 size;
> union {
> __u32 mnt_ns_fd;
> __u32 mnt_fd;
> };
> __u64 mnt_id;
> __u64 param;
> __u64 mnt_ns_id;
> };
>
> * In case of STATMOUNT_BY_FD grab mnt_ns inside of do_statmount(),
> since we get mnt_ns from mnt, which should happen under namespace lock.
>
> * Remove the modifications made to grab_requested_mnt_ns, those were
> never needed.
>
> Changes from v5 [3] to v6:
> * Instead of returning "[unmounted]" as the mount point for "unmounted"
> mounts, we unset the STATMOUNT_MNT_POINT flag in statmount.mask.
>
> * Instead of returning 0 as the mnt_ns_id for "unmounted" mounts, we
> unset the STATMOUNT_MNT_NS_ID flag in statmount.mask.
>
> * Added comment in `do_statmount` clarifying that the caller sets s->mnt
> in case of STATMOUNT_BY_FD.
>
> * In `do_statmount` move the mnt_ns_id and mnt_ns_empty() check just
> before lookup_mnt_in_ns().
>
> * We took another look at the capability checks for getting information
> for "unmounted" mounts using an fd and decided to remove them for the
> following reasons:
>
> - All fs related information is available via fstatfs() without any
> capability check.
>
> - Mount information is also available via /proc/pid/mountinfo (without
> any capability check).
>
> - Given that we have access to a fd on the mount which tells us that
> we had access to the mount at some point (or someone that had access
> gave us the fd). So, we should be able to access mount info.
>
> Changes from v4 [4] to v5:
> Check only for s->root.mnt to be NULL instead of checking for both
> s->root.mnt and s->root.dentry (I did not find a case where only one of
> them would be NULL).
>
> * Only allow system root (CAP_SYS_ADMIN in init_user_ns) to call
> statmount() on fd's on "unmounted" mounts. We (mostly Pavel) spent some
> time thinking about how our previous approach (of checking the opener's
> file credentials) caused problems.
>
> Please take a look at the linked pictures they describe everything more
> clearly.
>
> Case 1: A fd is on a normal mount (Link to Picture: [5])
> Consider, a situation where we have two processes P1 and P2 and a file
> F1. F1 is opened on mount ns M1 by P1. P1 is nested inside user
> namespace U1 and U2. P2 is also in U1. P2 is also in a pid namespace and
> mount namespace separate from M1.
>
> P1 sends F1 to P2 (using a unix socket). But, P2 is unable to call
> statmount() on F1 because since it is a separate pid and mount
> namespace. This is good and expected.
>
> Case 2: A fd is on a "unmounted" mount (Link to Picture: [6])
> Consider a similar situation as Case 1. But now F1 is on a mounted that
> has been "unmounted". Now, since we used openers credentials to check
> for permissions P2 ends up having the ability call statmount() and get
> mount info for this "unmounted" mount.
>
> Hence, It is better to restrict the ability to call statmount() on fds
> on "unmounted" mounts to system root only (There could also be other
> cases than the one described above).
>
> Changes from v3 [7] to v4:
> * Change the string returned when there is no mountpoint to be
> "[unmounted]" instead of "[detached]".
> * Remove the new DEFINE_FREE put_file and use the one already present in
> include/linux/file.h (fput) [8].
> * Inside listmount consistently pass 0 in flags to copy_mnt_id_req and
> prepare_klistmount()->grab_requested_mnt_ns() and remove flags from the
> prepare_klistmount prototype.
> * If STATMOUNT_BY_FD is set, check for mnt_ns_id == 0 && mnt_id == 0.
>
> Changes from v2 [9] to v3:
> * Rename STATMOUNT_FD flag to STATMOUNT_BY_FD.
> * Fixed UAF bug caused by the reference to fd_mount being bound by scope
> of CLASS(fd_raw, f)(kreq.fd) by using fget_raw instead.
> * Reused @spare parameter in mnt_id_req instead of adding new fields to
> the struct.
>
> Changes from v1 [10] to v2:
> v1 of this patchset, took a different approach and introduced a new
> umount_mnt_ns, to which "unmounted" mounts would be moved to (instead of
> their namespace being NULL) thus allowing them to be still available via
> statmount.
>
> Introducing umount_mnt_ns complicated namespace locking and modified
> performance sensitive code [11] and it was agreed upon that fd-based
> statmount would be better.
>
> This code is also available on github [12].
>
> [1]: https://github.com/checkpoint-restore/criu/pull/2754
> [2]: https://lore.kernel.org/all/20251118084836.2114503-1-b.sachdev1904@gmail.com/
> [3]: https://lore.kernel.org/criu/20251109053921.1320977-2-b.sachdev1904@gmail.com/T/#u
> [4]: https://lore.kernel.org/all/20251029052037.506273-2-b.sachdev1904@gmail.com/
> [5]: https://github.com/bsach64/linux/blob/statmount-fd-v5/fd_on_normal_mount.png
> [6]: https://github.com/bsach64/linux/blob/statmount-fd-v5/file_on_unmounted_mount.png
> [7]: https://lore.kernel.org/all/20251024181443.786363-1-b.sachdev1904@gmail.com/
> [8]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/file.h#n97
> [9]: https://lore.kernel.org/linux-fsdevel/20251011124753.1820802-1-b.sachdev1904@gmail.com/
> [10]: https://lore.kernel.org/linux-fsdevel/20251002125422.203598-1-b.sachdev1904@gmail.com/
> [11]: https://lore.kernel.org/linux-fsdevel/7e4d9eb5-6dde-4c59-8ee3-358233f082d0@virtuozzo.com/
> [12]: https://github.com/bsach64/linux/tree/statmount-fd-v7
>
> Bhavik Sachdev (3):
> statmount: permission check should return EPERM
> statmount: accept fd as a parameter
> selftests: statmount: tests for STATMOUNT_BY_FD
>
> fs/namespace.c | 102 ++++---
> include/uapi/linux/mount.h | 10 +-
> .../filesystems/statmount/statmount.h | 15 +-
> .../filesystems/statmount/statmount_test.c | 261 +++++++++++++++++-
> .../filesystems/statmount/statmount_test_ns.c | 101 ++++++-
> 5 files changed, 430 insertions(+), 59 deletions(-)
This looks useful.
Reviewed-by: Jeff Layton <jlayton@...nel.org>
Powered by blists - more mailing lists