[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <45e2da5392c07cfc139a014fbac512bfe14113a7.camel@kernel.org>
Date: Tue, 12 Nov 2024 08:57:20 -0500
From: Jeff Layton <jlayton@...nel.org>
To: Christian Brauner <brauner@...nel.org>, Erin Shepherd
<erin.shepherd@....eu>, Amir Goldstein <amir73il@...il.com>
Cc: linux-kernel@...r.kernel.org, linux-fsdevel@...r.kernel.org,
christian@...uner.io, paul@...l-moore.com, bluca@...ian.org, Chuck Lever
<chuck.lever@...cle.com>
Subject: Re: [PATCH 0/4] pidfs: implement file handle support
On Tue, 2024-11-12 at 14:10 +0100, Christian Brauner wrote:
> On Fri, Nov 01, 2024 at 01:54:48PM +0000, Erin Shepherd wrote:
> > Since the introduction of pidfs, we have had 64-bit process identifiers
> > that will not be reused for the entire uptime of the system. This greatly
> > facilitates process tracking in userspace.
> >
> > There are two limitations at present:
> >
> > * These identifiers are currently only exposed to processes on 64-bit
> > systems. On 32-bit systems, inode space is also limited to 32 bits and
> > therefore is subject to the same reuse issues.
We should really just move to storing 64-bit inode numbers internally
on 32-bit machines. That would at least make statx() give you all 64
bits on 32-bit host.
> > * There is no way to go from one of these unique identifiers to a pid or
> > pidfd.
> >
> > Patch 1 & 2 in this stack implements fh_export for pidfs. This means
> > userspace can retrieve a unique process identifier even on 32-bit systems
> > via name_to_handle_at.
> >
> > Patch 3 & 4 in this stack implement fh_to_dentry for pidfs. This means
> > userspace can convert back from a file handle to the corresponding pidfd.
> > To support us going from a file handle to a pidfd, we have to store a pid
> > inside the file handle. To ensure file handles are invariant and can move
> > between pid namespaces, we stash a pid from the initial namespace inside
> > the file handle.
> >
> > I'm not quite sure if stashing an initial-namespace pid inside the file
> > handle is the right approach here; if not, I think that patch 1 & 2 are
> > useful on their own.
Hmm... I guess pid namespaces don't have a convenient 64-bit ID like
mount namespaces do? In that case, stashing the pid from init_ns is
probably the next best thing.
>
> Sorry for the delayed reply (I'm recovering from a lengthy illness.).
>
> I like the idea in general. I think this is really useful. A few of my
> thoughts but I need input from Amir and Jeff:
>
> * In the last patch of the series you already implement decoding of
> pidfd file handles by adding a .fh_to_dentry export_operations method.
>
> There are a few things to consider because of how open_by_handle_at()
> works.
>
> - open_by_handle_at() needs to be restricted so it only creates pidfds
> from pidfs file handles that resolve to a struct pid that is
> reachable in the caller's pid namespace. In other words, it should
> mirror pidfd_open().
>
> Put another way, open_by_handle_at() must not be usable to open
> arbitrary pids to prevent a container from constructing a pidfd file
> handle for a process that lives outside it's pid namespace
> hierarchy.
>
> With this restriction in place open_by_handle_at() can be available
> to let unprivileged processes open pidfd file handles.
>
> Related to that, I don't think we need to make open_by_handle_at()
> open arbitrary pidfd file handles via CAP_DAC_READ_SEARCH. Simply
> because any process in the initial pid namespace can open any other
> process via pidfd_open() anyway because pid namespaces are
> hierarchical.
>
> IOW, CAP_DAC_READ_SEARCH must not override the restriction that the
> provided pidfs file handle must be reachable from the caller's pid
> namespace.
>
> - open_by_handle_at() uses may_decode_fh() to determine whether it's
> possible to decode a file handle as an unprivileged user. The
> current checks don't make sense for pidfs. Conceptually, I think
> there don't need to place any restrictions based on global
> CAP_DAC_READ_SEARCH, owning user namespace of the superblock or
> mount on pidfs file handles.
>
> The only restriction that matters is that the requested pidfs file
> handle is reachable from the caller's pid namespace.
>
> - A pidfd always has exactly a single inode and a single dentry.
> There's no aliases.
>
> - Generally, in my naive opinion, I think that decoding pidfs file
> handles should be a lot simpler than decoding regular path based
> file handles. Because there should be no need to verify any
> ancestors, or reconnect paths. Pidfs also doesn't have directory
> inodes, only regular inodes. In other words, any dentry is
> acceptable.
>
> Essentially, the only thing we need is for exportfs_decode_fh_raw()
> to verify that the provided pidfs file handle is resolvable in the
> caller's pid namespace. If so we're done. The challenge is how to
> nicely plumb this into the code without it sticking out like a sore
> thumb.
>
> - Pidfs should not be exportable via NFS. It doesn't make sense.
I haven't looked over the patchset yet, but those restrictions all
sound pretty reasonable to me. Special casing the may_decode_fh
permission checks may be the tricky bit. I'm not sure what that should
look like, tbqh.
--
Jeff Layton <jlayton@...nel.org>
Powered by blists - more mailing lists