Message-ID: <20250118-vfs-pidfs-5921bfa5632a@brauner>
Date: Sat, 18 Jan 2025 14:00:30 +0100
From: Christian Brauner <brauner@...nel.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Christian Brauner <brauner@...nel.org>,
linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: [GIT PULL] vfs pidfs
Hey Linus,
/* Summary */
This contains pidfs updates for this cycle:
- Rework inode number allocation
Recently we received a patchset that aims to enable file handle
encoding and decoding via name_to_handle_at(2) and
open_by_handle_at(2).
A crucial step in the patch series is how to go from inode number to
struct pid without leaking information into unprivileged contexts. The
issue is that in order to find a struct pid the pid number in the
initial pid namespace must be encoded into the file handle via
name_to_handle_at(2).
Containers using a separate pid namespace can use this to learn the
pid number of a given process in the initial pid namespace. While this
is a weak information leak, it could be used in various exploits and is
in general an ugly wart in the design.
To solve this problem a new way is needed to look up a struct pid based
on the inode number allocated for that struct pid. The other part is
to remove the custom inode number allocation on 32bit systems, which is
also an ugly wart that should go away.
Allocate unique identifiers for struct pid by simply incrementing a 64
bit counter and insert each struct pid into the rbtree so it can be
looked up to decode file handles. This avoids leaking actual pids across
pid namespaces in file handles.
On both 64 bit and 32 bit the same 64 bit identifier is used to look up
struct pid in the rbtree. On 64 bit the unique identifier for struct pid
simply becomes the inode number. Comparing two pidfds continues to be as
simple as comparing inode numbers.
On 32 bit the 64 bit number assigned to struct pid is split into two 32
bit numbers. The lower 32 bits are used as the inode number and the
upper 32 bits are used as the inode generation number. Whenever a
wraparound happens on 32 bit the 64 bit number will be incremented by 2
so inode numbering starts at 2 again.
When a wraparound happens on 32 bit, multiple pidfds with the same inode
number are likely to exist. This isn't a problem since before pidfs
pidfds used the anonymous inode, meaning all pidfds had the same inode
number. On 32 bit userspace can thus reconstruct the 64 bit identifier
by retrieving both the inode number and the inode generation number to
compare, or use file handles. This gives the same guarantees on both 32
bit and 64 bit.
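
The split described above can be sketched in plain C as follows. This is
only an illustration of the arithmetic, not the code in fs/pidfs.c: the
helper names (pidfs_id_ino32, pidfs_id_gen32, pidfs_id_join,
pidfs_id_next) are made up for this example, and the exact point at which
the counter is bumped by 2 is an assumption based on the description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper names; this is not the code in fs/pidfs.c. */

/* Lower 32 bits: what a 32 bit system would report as the inode number. */
static inline uint32_t pidfs_id_ino32(uint64_t id)
{
	return (uint32_t)id;
}

/* Upper 32 bits: what would be reported as the inode generation number. */
static inline uint32_t pidfs_id_gen32(uint64_t id)
{
	return (uint32_t)(id >> 32);
}

/* Userspace side: rebuild the 64 bit identifier from the two halves. */
static inline uint64_t pidfs_id_join(uint32_t ino, uint32_t gen)
{
	return ((uint64_t)gen << 32) | ino;
}

/*
 * Hand out the next identifier from a 64 bit counter. When the lower
 * half has wrapped around to 0, skip ahead by 2 so that the 32 bit
 * inode number starts at 2 again (assumed placement of the check).
 */
static uint64_t pidfs_id_next(uint64_t *counter)
{
	assert(counter != NULL);
	if (pidfs_id_ino32(*counter) == 0)
		*counter += 2;
	return (*counter)++;
}
```

With this layout, two pidfds on 32 bit would refer to the same struct pid
only if both the inode number and the generation number match, which is
the same comparison as a single 64 bit inode number on 64 bit.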
- Implement file handle support
This is based on custom export operation methods which allow pidfs to
implement permission checking and opening of pidfs file handles
cleanly without hacking around in the core file handle code too much.
- Support bind-mounts
Allow bind-mounting pidfds. Similar to nsfs let's allow bind-mounts
for pidfds. This allows pidfds to be safely recovered and checked for
process recycling.
Instead of checking d_ops for both nsfs and pidfs we could in a
follow-up patch add a flag argument to struct dentry_operations that
functions similarly to file_operations->fop_flags.
/* Testing */
gcc version 14.2.0 (Debian 14.2.0-6)
Debian clang version 16.0.6 (27+b1)
No build failures or warnings were observed.
/* Conflicts */
Merge conflicts with mainline
=============================
No known conflicts.
Merge conflicts with other trees
================================
No known conflicts.
The following changes since commit 40384c840ea1944d7c5a392e8975ed088ecf0b37:
Linux 6.13-rc1 (2024-12-01 14:28:56 -0800)
are available in the Git repository at:
git@...olite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/vfs-6.14-rc1.pidfs
for you to fetch changes up to 3781680fba3eab0b34b071cb9443fd5ad92d23cf:
Merge patch series "pidfs: support bind-mounts" (2024-12-22 11:03:19 +0100)
Please consider pulling these changes from the signed vfs-6.14-rc1.pidfs tag.
Thanks!
Christian
----------------------------------------------------------------
vfs-6.14-rc1.pidfs
----------------------------------------------------------------
Christian Brauner (16):
pidfs: rework inode number allocation
pidfs: remove 32bit inode number handling
pidfs: support FS_IOC_GETVERSION
Merge patch series "pidfs: file handle preliminaries"
fhandle: simplify error handling
exportfs: add open method
fhandle: pull CAP_DAC_READ_SEARCH check into may_decode_fh()
exportfs: add permission method
pidfs: implement file handle support
Merge patch series "pidfs: implement file handle support"
pidfs: check for valid ioctl commands
selftests/pidfd: add pidfs file handle selftests
pidfs: lookup pid through rbtree
pidfs: allow bind-mounts
selftests: add pidfd bind-mount tests
Merge patch series "pidfs: support bind-mounts"
Erin Shepherd (1):
pseudofs: add support for export_ops
fs/fhandle.c | 115 +++--
fs/libfs.c | 1 +
fs/namespace.c | 10 +-
fs/pidfs.c | 298 ++++++++++--
include/linux/exportfs.h | 20 +
include/linux/pid.h | 2 +
include/linux/pidfs.h | 3 +
include/linux/pseudo_fs.h | 1 +
kernel/pid.c | 14 +-
tools/testing/selftests/pidfd/.gitignore | 2 +
tools/testing/selftests/pidfd/Makefile | 3 +-
tools/testing/selftests/pidfd/pidfd.h | 39 ++
tools/testing/selftests/pidfd/pidfd_bind_mount.c | 188 ++++++++
.../selftests/pidfd/pidfd_file_handle_test.c | 503 +++++++++++++++++++++
tools/testing/selftests/pidfd/pidfd_setns_test.c | 47 +-
tools/testing/selftests/pidfd/pidfd_wait.c | 47 +-
16 files changed, 1110 insertions(+), 183 deletions(-)
create mode 100644 tools/testing/selftests/pidfd/pidfd_bind_mount.c
create mode 100644 tools/testing/selftests/pidfd/pidfd_file_handle_test.c