Message-ID: <20250523-vfs-misc-bd367f758841@brauner>
Date: Fri, 23 May 2025 14:40:22 +0200
From: Christian Brauner <brauner@...nel.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Christian Brauner <brauner@...nel.org>,
	linux-fsdevel@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: [GIT PULL for v6.16] vfs misc

Hey Linus,

/* Summary */

This contains the usual selection of misc updates for this cycle.

Features:

- Use folios for symlinks in the page cache

  FUSE already uses folios for its symlinks. Mirror that conversion in
  the generic code and the NFS code. That lets us get rid of a few
  folio->page->folio conversions in this path, and some of the few
  remaining users of read_cache_page() / read_mapping_page().

- Make a few filesystem operations (chmod/chown, truncate and the
  default llseek) killable when waiting for the VFS inode->i_mutex lock.

- Add sysctl vfs_cache_pressure_denom for bulk file operations

  Some workloads need to preserve more dentries than we currently allow
  through our sysctl interface.

  On HDFS servers with 12 HDDs each, datanode startup involves scanning
  all files and caching their metadata (including dentries and inodes)
  in memory. Each HDD contains approximately 2 million files, resulting
  in a total of ~20 million cached dentries after initialization.

  To minimize dentry reclamation, these servers set vfs_cache_pressure to 1.
  Despite this configuration, memory pressure conditions can still
  trigger reclamation of up to 50% of cached dentries, reducing the
  cache from 20 million to approximately 10 million entries. During the
  subsequent cache rebuild period, any HDFS datanode restart operation
  incurs substantial latency penalties until full cache recovery
  completes.

  To maintain service stability, more dentries need to be preserved
  during memory reclamation. The current minimum reclaim ratio (1/100 of
  total dentries) remains too aggressive for such workloads. This patch
  introduces vfs_cache_pressure_denom for more granular cache pressure
  control. The configuration [vfs_cache_pressure=1,
  vfs_cache_pressure_denom=10000] effectively maintains the full 20
  million dentry cache under memory pressure, preventing datanode
  restart performance degradation.

- Avoid some jumps in inode_permission() using likely()/unlikely().

- Avoid a memory access which is most likely a cache miss when descending
  into devcgroup_inode_permission().

- Add fastpath predicts for stat() and fdput().

- Anonymous inodes currently don't come with a proper mode, which causes
  issues in the kernel when we want to add useful VFS debug asserts. Fix
  that by giving them a proper mode internally and masking it off when we
  report it to userspace, which relies on them not having any mode.

- Anonymous inodes currently allow changing inode attributes because
  the VFS falls back to simple_setattr() if i_op->setattr isn't
  implemented. This means the ownership and mode of every single user
  of anon_inode_inode can be changed. Block that, as it's either useless
  or actively harmful. If specific ownership is needed, the respective
  subsystem should allocate anonymous inodes from its own private
  superblock.

- Raise SB_I_NODEV and SB_I_NOEXEC on the anonymous inode superblock.

- Add proper tests for anonymous inode behavior.

- Make it easy to detect proper anonymous inodes (via the new
  S_ANON_INODE flag) and ensure that we can detect them in codepaths
  such as readahead().

Cleanups:

- Port pidfs to the new anon_inode_{g,s}etattr() helpers.

- Try to remove the uselib() system call.

- Add unlikely branch hint on return path for poll.

- Add unlikely branch hint on return path for core_sys_select.

- Don't allow signals to interrupt getdents copying for fuse.

- Provide a size hint to dir_context during readdir().

- Use writeback_iter directly in mpage_writepages.

- Update compression and mtime descriptions in initramfs documentation.

- Update main netfs API document.

- Remove useless plus one in super_cache_scan().

- Remove unnecessary NULL-check guards during setns().

- Add separate {get,put}_cgroup_ns no-op cases.

Fixes:

- Fix typo in root= kernel parameter description.

- Use KERN_INFO for infof()|info_plog()|infofc().

- Correct comments of fs_validate_description().

- Mark an unlikely if condition with unlikely() in vfs_parse_monolithic_sep().

- Delete the fsparam_u32hex() macro.

- Remove unused and problematic validate_constant_table().

- Fix potential unsigned integer underflow in fs_name().

- Make file-nr output the total number of allocated file handles.

/* Testing */

gcc (Debian 14.2.0-19) 14.2.0
Debian clang version 19.1.7 (3)

No build failures or warnings were observed.

/* Conflicts */

Merge conflicts with mainline
=============================

No known conflicts.

Merge conflicts with other trees
================================

This will have a merge conflict with the vfs freeze pull request sent as:

https://lore.kernel.org/20250523-vfs-freeze-8e3934479cba@brauner

that can be resolved as follows:

diff --cc fs/internal.h
index 8800e1bb23e3,f545400ce607..000000000000
--- a/fs/internal.h
+++ b/fs/internal.h
@@@ -344,4 -343,8 +344,9 @@@ static inline bool path_mounted(const s
  void file_f_owner_release(struct file *file);
  bool file_seek_cur_needs_f_lock(struct file *file);
  int statmount_mnt_idmap(struct mnt_idmap *idmap, struct seq_file *seq, bool uid_map);
 +struct dentry *find_next_child(struct dentry *parent, struct dentry *prev);
+ int anon_inode_getattr(struct mnt_idmap *idmap, const struct path *path,
+                      struct kstat *stat, u32 request_mask,
+                      unsigned int query_flags);
+ int anon_inode_setattr(struct mnt_idmap *idmap, struct dentry *dentry,
+                      struct iattr *attr);

The following changes since commit 0af2f6be1b4281385b618cb86ad946eded089ac8:

  Linux 6.15-rc1 (2025-04-06 13:11:33 -0700)

are available in the Git repository at:

  git@...olite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/vfs-6.16-rc1.misc

for you to fetch changes up to 76145cb37ff0636fdf2a15320b2c2421915df32b:

  Merge patch series "Use folios for symlinks in the page cache" (2025-05-15 12:14:34 +0200)

Please consider pulling these changes from the signed vfs-6.16-rc1.misc tag.

Thanks!
Christian

----------------------------------------------------------------
vfs-6.16-rc1.misc

----------------------------------------------------------------
Christian Brauner (17):
      anon_inode: use a proper mode internally
      pidfs: use anon_inode_getattr()
      anon_inode: explicitly block ->setattr()
      pidfs: use anon_inode_setattr()
      anon_inode: raise SB_I_NODEV and SB_I_NOEXEC
      selftests/filesystems: add chown() test for anonymous inodes
      selftests/filesystems: add chmod() test for anonymous inodes
      selftests/filesystems: add exec() test for anonymous inodes
      selftests/filesystems: add open() test for anonymous inodes
      Merge patch series "fs: harden anon inodes"
      Merge patch series "fs: sort out cosmetic differences between stat funcs and add predicts"
      fs: remove uselib() system call
      Merge patch series "two nits for path lookup"
      fs: add S_ANON_INODE
      Merge patch series "Minor namespace code simplication"
      Merge patch series "include/linux/fs.h: add inode_lock_killable()"
      Merge patch series "Use folios for symlinks in the page cache"

David Disseldorp (1):
      docs: initramfs: update compression and mtime descriptions

David Howells (1):
      netfs: Update main API document

Jinliang Zheng (1):
      fs: remove useless plus one in super_cache_scan()

Joel Savitz (2):
      kernel/nsproxy: remove unnecessary guards
      include/cgroup: separate {get,put}_cgroup_ns no-op case

Li RongQing (1):
      fs: Make file-nr output the total allocated file handles

Mateusz Guzik (6):
      fs: sort out cosmetic differences between stat funcs and add predicts
      fs: predict not having to do anything in fdput()
      fs: unconditionally use atime_needs_update() in pick_link()
      fs: improve codegen in link_path_walk()
      fs: touch up predicts in inode_permission()
      device_cgroup: avoid access to ->i_rdev in the common case in devcgroup_inode_permission()

Matthew Wilcox (Oracle) (3):
      fs: Convert __page_get_link() to use a folio
      nfs: Use a folio in nfs_get_link()
      fs: Pass a folio to page_put_link()

Max Kellermann (4):
      include/linux/fs.h: add inode_lock_killable()
      fs/open: make chmod_common() and chown_common() killable
      fs/open: make do_truncate() killable
      fs/read_write: make default_llseek() killable

Miklos Szeredi (2):
      fuse: don't allow signals to interrupt getdents copying
      readdir: supply dir_context.count as readdir buffer size hint

Petr Vaněk (1):
      Documentation: fix typo in root= kernel parameter description

Yafang Shao (1):
      vfs: Add sysctl vfs_cache_pressure_denom for bulk file operations

Zijun Hu (6):
      fs/fs_context: Use KERN_INFO for infof()|info_plog()|infofc()
      fs/fs_parse: Correct comments of fs_validate_description()
      fs/fs_context: Mark an unlikely if condition with unlikely() in vfs_parse_monolithic_sep()
      fs/filesystems: Fix potential unsigned integer underflow in fs_name()
      fs/fs_parse: Delete macro fsparam_u32hex()
      fs/fs_parse: Remove unused and problematic validate_constant_table()

 Documentation/admin-guide/kernel-parameters.txt    |    2 +-
 Documentation/admin-guide/sysctl/vm.rst            |   32 +-
 .../driver-api/early-userspace/buffer-format.rst   |   34 +-
 Documentation/filesystems/mount_api.rst            |   16 -
 Documentation/filesystems/netfs_library.rst        | 1016 ++++++++++++++------
 arch/m68k/configs/amcore_defconfig                 |    1 -
 arch/x86/configs/i386_defconfig                    |    1 -
 arch/xtensa/configs/cadence_csp_defconfig          |    1 -
 fs/anon_inodes.c                                   |   45 +
 fs/binfmt_elf.c                                    |   76 --
 fs/dcache.c                                        |   11 +-
 fs/exec.c                                          |   60 --
 fs/exportfs/expfs.c                                |    1 +
 fs/file_table.c                                    |    2 +-
 fs/filesystems.c                                   |   14 +-
 fs/fs_context.c                                    |    6 +-
 fs/fs_parser.c                                     |   55 +-
 fs/fuse/dir.c                                      |    2 +-
 fs/fuse/readdir.c                                  |    4 +-
 fs/internal.h                                      |    5 +
 fs/ioctl.c                                         |    7 +-
 fs/libfs.c                                         |   10 +-
 fs/mpage.c                                         |   13 +-
 fs/namei.c                                         |   79 +-
 fs/nfs/symlink.c                                   |   20 +-
 fs/open.c                                          |   14 +-
 fs/overlayfs/readdir.c                             |   12 +-
 fs/pidfs.c                                         |   28 +-
 fs/read_write.c                                    |    4 +-
 fs/readdir.c                                       |   47 +-
 fs/select.c                                        |    4 +-
 fs/stat.c                                          |   35 +-
 fs/super.c                                         |    2 +-
 include/linux/binfmts.h                            |    1 -
 include/linux/cgroup.h                             |   26 +-
 include/linux/device_cgroup.h                      |    7 +-
 include/linux/file.h                               |    2 +-
 include/linux/fs.h                                 |   22 +
 include/linux/fs_parser.h                          |    7 -
 init/Kconfig                                       |   10 -
 kernel/nsproxy.c                                   |   30 +-
 mm/readahead.c                                     |   20 +-
 tools/testing/selftests/bpf/config.aarch64         |    1 -
 tools/testing/selftests/bpf/config.s390x           |    1 -
 tools/testing/selftests/filesystems/.gitignore     |    1 +
 tools/testing/selftests/filesystems/Makefile       |    2 +-
 .../selftests/filesystems/anon_inode_test.c        |   69 ++
 47 files changed, 1164 insertions(+), 694 deletions(-)
 create mode 100644 tools/testing/selftests/filesystems/anon_inode_test.c
