Message-ID: <20250523-vfs-misc-bd367f758841@brauner>
Date: Fri, 23 May 2025 14:40:22 +0200
From: Christian Brauner <brauner@...nel.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Christian Brauner <brauner@...nel.org>,
linux-fsdevel@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: [GIT PULL for v6.16] vfs misc
Hey Linus,
/* Summary */
This contains the usual selections of misc updates for this cycle.
Features:
- Use folios for symlinks in the page cache
FUSE already uses folios for its symlinks. Mirror that conversion in
the generic code and the NFS code. That lets us get rid of a few
folio->page->folio conversions in this path, and some of the few
remaining users of read_cache_page() / read_mapping_page().
- Try and make a few filesystem operations killable on the VFS
inode->i_mutex level.
- Add sysctl vfs_cache_pressure_denom for bulk file operations
Some workloads need to preserve more dentries than we currently allow
through our sysctl interface.
On HDFS servers with 12 HDDs per server, HDFS datanode startup
involves scanning all files and caching their metadata (including
dentries and inodes) in memory. Each HDD contains approximately 2
million files, resulting in a total of ~20 million cached dentries
after initialization.
To minimize dentry reclamation, they set vfs_cache_pressure to 1.
Despite this configuration, memory pressure conditions can still
trigger reclamation of up to 50% of cached dentries, reducing the
cache from 20 million to approximately 10 million entries. During the
subsequent cache rebuild period, any HDFS datanode restart operation
incurs substantial latency penalties until full cache recovery
completes.
To maintain service stability, more dentries need to be preserved
during memory reclamation. The current minimum reclaim ratio (1/100 of
total dentries) remains too aggressive for such workloads. This patch
introduces vfs_cache_pressure_denom for more granular cache pressure
control. The configuration [vfs_cache_pressure=1,
vfs_cache_pressure_denom=10000] effectively maintains the full 20
million dentry cache under memory pressure, preventing datanode
restart performance degradation.
- Avoid some jumps in inode_permission() using likely()/unlikely().
- Avoid a memory access which is most likely a cache miss when descending
into devcgroup_inode_permission().
- Add fastpath predicts for stat() and fdput().
- Anonymous inodes currently don't come with a proper mode, causing
issues in the kernel when we want to add useful VFS debug asserts. Fix
that by giving them a proper mode and masking it off when we report it
to userspace which relies on them not having any mode.
- Anonymous inodes currently allow inode attributes to be changed because
the VFS falls back to simple_setattr() if i_op->setattr isn't
implemented. This means the ownership and mode for every single user
of anon_inode_inode can be changed. Block that as it's either useless
or actively harmful. If specific ownership is needed the respective
subsystem should allocate anonymous inodes from their own private
superblock.
- Raise SB_I_NODEV and SB_I_NOEXEC on the anonymous inode superblock.
- Add proper tests for anonymous inode behavior.
- Make it easy to detect proper anonymous inodes and to ensure that we
can detect them in codepaths such as readahead().
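The arithmetic behind the vfs_cache_pressure_denom example above can be
sketched in userspace C. This is a minimal sketch, assuming the number of
dentries offered for reclaim scales as
total * vfs_cache_pressure / vfs_cache_pressure_denom, which matches the
1/100 minimum ratio described above. The helper names scaled_fraction()
and reclaimable_dentries() are hypothetical and are not kernel functions.

```c
#include <assert.h>

/*
 * Hypothetical helper: compute val * numer / denom by splitting val into
 * quotient and remainder, which avoids overflowing the intermediate
 * product when val is large. Requires denom != 0.
 */
static unsigned long scaled_fraction(unsigned long val, unsigned long numer,
				     unsigned long denom)
{
	unsigned long quot = val / denom;
	unsigned long rem = val % denom;

	return quot * numer + (rem * numer) / denom;
}

/*
 * Hypothetical model (an assumption, not the kernel code): number of
 * cached dentries treated as reclaim candidates for a given
 * pressure/denominator pair.
 */
static unsigned long reclaimable_dentries(unsigned long total,
					  unsigned long pressure,
					  unsigned long denom)
{
	assert(denom != 0);
	return scaled_fraction(total, pressure, denom);
}
```

Under this model, 20 million cached dentries with pressure 1 and a
denominator of 100 yield 200000 candidates, while pressure 1 and a
denominator of 10000 yield 2000.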
Cleanups:
- Port pidfs to the new anon_inode_{g,s}etattr() helpers.
- Try to remove the uselib() system call.
- Add unlikely branch hint on return path for poll.
- Add unlikely branch hint on return path for core_sys_select.
- Don't allow signals to interrupt getdents copying for fuse.
- Provide a size hint to dir_context during readdir().
- Use writeback_iter directly in mpage_writepages.
- Update compression and mtime descriptions in initramfs documentation.
- Update main netfs API document.
- Remove useless plus one in super_cache_scan().
- Remove unnecessary NULL-check guards during setns().
- Add separate {get,put}_cgroup_ns no-op cases.
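Several items above add likely()/unlikely() branch hints. As a rough
userspace illustration of the pattern, the sketch below defines the two
macros on top of __builtin_expect(), a GCC/Clang builtin. The function
sum_bytes() is a hypothetical example and is not taken from any of the
patches in this pull.

```c
#include <stddef.h>

/* Branch hints built on the GCC/Clang __builtin_expect() builtin. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/*
 * Hypothetical example: sum a buffer, marking the NULL check as the rare
 * case. The hint never changes the result; it only tells the compiler
 * which branch to favor when laying out the code.
 */
static long sum_bytes(const unsigned char *buf, size_t len)
{
	long sum = 0;
	size_t i;

	if (unlikely(buf == NULL))
		return -1;	/* rare error path */

	for (i = 0; i < len; i++)
		sum += buf[i];

	return sum;
}
```

The hints only pay off when the predicted direction matches real
workloads, which is why such changes are usually justified per call site.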
Fixes:
- Fix typo in root= kernel parameter description.
- Use KERN_INFO for infof()|info_plog()|infofc().
- Correct comments of fs_validate_description().
- Mark an unlikely if condition with unlikely() in vfs_parse_monolithic_sep().
- Delete macro fsparam_u32hex().
- Remove unused and problematic validate_constant_table().
- Fix potential unsigned integer underflow in fs_name().
- Make file-nr output the total allocated file handles.
/* Testing */
gcc (Debian 14.2.0-19) 14.2.0
Debian clang version 19.1.7 (3)
No build failures or warnings were observed.
/* Conflicts */
Merge conflicts with mainline
=============================
No known conflicts.
Merge conflicts with other trees
================================
This will have a merge conflict with the vfs freeze pull request sent as:
https://lore.kernel.org/20250523-vfs-freeze-8e3934479cba@brauner
that can be resolved as follows:
diff --cc fs/internal.h
index 8800e1bb23e3,f545400ce607..000000000000
--- a/fs/internal.h
+++ b/fs/internal.h
@@@ -344,4 -343,8 +344,9 @@@ static inline bool path_mounted(const s
  void file_f_owner_release(struct file *file);
  bool file_seek_cur_needs_f_lock(struct file *file);
  int statmount_mnt_idmap(struct mnt_idmap *idmap, struct seq_file *seq, bool uid_map);
 +struct dentry *find_next_child(struct dentry *parent, struct dentry *prev);
+ int anon_inode_getattr(struct mnt_idmap *idmap, const struct path *path,
+                        struct kstat *stat, u32 request_mask,
+                        unsigned int query_flags);
+ int anon_inode_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
+                        struct iattr *attr);
The following changes since commit 0af2f6be1b4281385b618cb86ad946eded089ac8:
Linux 6.15-rc1 (2025-04-06 13:11:33 -0700)
are available in the Git repository at:
git@...olite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/vfs-6.16-rc1.misc
for you to fetch changes up to 76145cb37ff0636fdf2a15320b2c2421915df32b:
Merge patch series "Use folios for symlinks in the page cache" (2025-05-15 12:14:34 +0200)
Please consider pulling these changes from the signed vfs-6.16-rc1.misc tag.
Thanks!
Christian
----------------------------------------------------------------
vfs-6.16-rc1.misc
----------------------------------------------------------------
Christian Brauner (17):
anon_inode: use a proper mode internally
pidfs: use anon_inode_getattr()
anon_inode: explicitly block ->setattr()
pidfs: use anon_inode_setattr()
anon_inode: raise SB_I_NODEV and SB_I_NOEXEC
selftests/filesystems: add chown() test for anonymous inodes
selftests/filesystems: add chmod() test for anonymous inodes
selftests/filesystems: add exec() test for anonymous inodes
selftests/filesystems: add open() test for anonymous inodes
Merge patch series "fs: harden anon inodes"
Merge patch series "fs: sort out cosmetic differences between stat funcs and add predicts"
fs: remove uselib() system call
Merge patch series "two nits for path lookup"
fs: add S_ANON_INODE
Merge patch series "Minor namespace code simplication"
Merge patch series "include/linux/fs.h: add inode_lock_killable()"
Merge patch series "Use folios for symlinks in the page cache"
David Disseldorp (1):
docs: initramfs: update compression and mtime descriptions
David Howells (1):
netfs: Update main API document
Jinliang Zheng (1):
fs: remove useless plus one in super_cache_scan()
Joel Savitz (2):
kernel/nsproxy: remove unnecessary guards
include/cgroup: separate {get,put}_cgroup_ns no-op case
Li RongQing (1):
fs: Make file-nr output the total allocated file handles
Mateusz Guzik (6):
fs: sort out cosmetic differences between stat funcs and add predicts
fs: predict not having to do anything in fdput()
fs: unconditionally use atime_needs_update() in pick_link()
fs: improve codegen in link_path_walk()
fs: touch up predicts in inode_permission()
device_cgroup: avoid access to ->i_rdev in the common case in devcgroup_inode_permission()
Matthew Wilcox (Oracle) (3):
fs: Convert __page_get_link() to use a folio
nfs: Use a folio in nfs_get_link()
fs: Pass a folio to page_put_link()
Max Kellermann (4):
include/linux/fs.h: add inode_lock_killable()
fs/open: make chmod_common() and chown_common() killable
fs/open: make do_truncate() killable
fs/read_write: make default_llseek() killable
Miklos Szeredi (2):
fuse: don't allow signals to interrupt getdents copying
readdir: supply dir_context.count as readdir buffer size hint
Petr Vaněk (1):
Documentation: fix typo in root= kernel parameter description
Yafang Shao (1):
vfs: Add sysctl vfs_cache_pressure_denom for bulk file operations
Zijun Hu (6):
fs/fs_context: Use KERN_INFO for infof()|info_plog()|infofc()
fs/fs_parse: Correct comments of fs_validate_description()
fs/fs_context: Mark an unlikely if condition with unlikely() in vfs_parse_monolithic_sep()
fs/filesystems: Fix potential unsigned integer underflow in fs_name()
fs/fs_parse: Delete macro fsparam_u32hex()
fs/fs_parse: Remove unused and problematic validate_constant_table()
Documentation/admin-guide/kernel-parameters.txt | 2 +-
Documentation/admin-guide/sysctl/vm.rst | 32 +-
.../driver-api/early-userspace/buffer-format.rst | 34 +-
Documentation/filesystems/mount_api.rst | 16 -
Documentation/filesystems/netfs_library.rst | 1016 ++++++++++++++------
arch/m68k/configs/amcore_defconfig | 1 -
arch/x86/configs/i386_defconfig | 1 -
arch/xtensa/configs/cadence_csp_defconfig | 1 -
fs/anon_inodes.c | 45 +
fs/binfmt_elf.c | 76 --
fs/dcache.c | 11 +-
fs/exec.c | 60 --
fs/exportfs/expfs.c | 1 +
fs/file_table.c | 2 +-
fs/filesystems.c | 14 +-
fs/fs_context.c | 6 +-
fs/fs_parser.c | 55 +-
fs/fuse/dir.c | 2 +-
fs/fuse/readdir.c | 4 +-
fs/internal.h | 5 +
fs/ioctl.c | 7 +-
fs/libfs.c | 10 +-
fs/mpage.c | 13 +-
fs/namei.c | 79 +-
fs/nfs/symlink.c | 20 +-
fs/open.c | 14 +-
fs/overlayfs/readdir.c | 12 +-
fs/pidfs.c | 28 +-
fs/read_write.c | 4 +-
fs/readdir.c | 47 +-
fs/select.c | 4 +-
fs/stat.c | 35 +-
fs/super.c | 2 +-
include/linux/binfmts.h | 1 -
include/linux/cgroup.h | 26 +-
include/linux/device_cgroup.h | 7 +-
include/linux/file.h | 2 +-
include/linux/fs.h | 22 +
include/linux/fs_parser.h | 7 -
init/Kconfig | 10 -
kernel/nsproxy.c | 30 +-
mm/readahead.c | 20 +-
tools/testing/selftests/bpf/config.aarch64 | 1 -
tools/testing/selftests/bpf/config.s390x | 1 -
tools/testing/selftests/filesystems/.gitignore | 1 +
tools/testing/selftests/filesystems/Makefile | 2 +-
.../selftests/filesystems/anon_inode_test.c | 69 ++
47 files changed, 1164 insertions(+), 694 deletions(-)
create mode 100644 tools/testing/selftests/filesystems/anon_inode_test.c