Message-ID: <CAGudoHFA04dBDDP9bOD9kg2zW46ufJ8aBXjzM+gv5MU-gTVm2Q@mail.gmail.com>
Date: Mon, 10 Nov 2025 13:42:23 +0100
From: Mateusz Guzik <mjguzik@...il.com>
To: Jan Kara <jack@...e.cz>
Cc: brauner@...nel.org, viro@...iv.linux.org.uk, linux-kernel@...r.kernel.org, 
	linux-fsdevel@...r.kernel.org, linux-ext4@...r.kernel.org, tytso@....edu, 
	torvalds@...ux-foundation.org, josef@...icpanda.com, 
	linux-btrfs@...r.kernel.org
Subject: Re: [PATCH v3 1/3] fs: speed up path lookup with cheaper handling of MAY_EXEC

On Mon, Nov 10, 2025 at 11:13 AM Jan Kara <jack@...e.cz> wrote:
> OK, the path lookup is really light

I would not go that far ;)

The current code has function calls which can be either inlined or elided.

More importantly it is a massive branch-fest, notably with repeated
LOOKUP_RCU checks.

Based on my work on the same stuff $elsewhere, most of the time the
entry in the cache is there and is a directory you can traverse
through and which is not mounted on.

While there is a bunch of likely/unlikely usage to help out, the code
is not structured in a way which allows for easy use of it. Instead
some of the branches are repeated or have to be present to begin with.

Ideally lookup could roll forward over a pathname without function
calls as long as fast path conditions hold. You would still need to
pay to check permissions and that this is a non-mounted directory for
every path component, but some of this can be combined. Per the above,
the repeated LOOKUP_RCU checks would be whacked. Checking if this is a
directory which got mounted on *OR* a symlink could be one
branch and so on.
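
To illustrate the "one branch" idea, here is a minimal userspace sketch. It assumes each cached entry carries a flags word; the flag names and helpers are hypothetical and are not the kernel's actual dentry flags or API. The point is only that "mounted on" and "symlink" can share a mask, so the common case costs a single compare.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-entry flags; illustrative names, not the kernel's. */
enum {
    ENT_DIR     = 1u << 0,  /* entry is a directory */
    ENT_MOUNTED = 1u << 1,  /* something is mounted on it */
    ENT_SYMLINK = 1u << 2,  /* entry is a symlink */
};

/* Anything in this mask forces the walk off the fast path. */
#define ENT_SLOW_MASK (ENT_MOUNTED | ENT_SYMLINK)

static_assert((ENT_DIR & ENT_SLOW_MASK) == 0,
              "the directory bit must not be part of the slow mask");

/* One test covers both "mounted on" and "is a symlink". */
static inline bool entry_needs_slow_path(uint32_t flags)
{
    return (flags & ENT_SLOW_MASK) != 0;
}

/* Fast path: a plain directory with none of the slow bits set. */
static inline bool entry_fast_traversable(uint32_t flags)
{
    return (flags & (ENT_DIR | ENT_SLOW_MASK)) == ENT_DIR;
}
```

With this layout, entry_fast_traversable() folds the directory check and both exclusions into one masked compare per path component.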

On path parsing side, userspace could have passed something fucky like
foo/////bar and this of course needs to be handled but it does not
require the current ugliness to do so. This does happen with real
programs (typically two slashes in a row), but it also constitutes a
small minority of paths. The current code makes sure to skip the
spurious slashes before looking up the name.

My code $elsewhere instead notes it is an invariant that a name
containing a slash cannot appear in the cache so it just goes forward
with the lookup. If an entry is found, the name could not have started
with / and the check is elided (common case). Should the entry be
missing then indeed we check if slashes need to get rolled over.
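
A toy userspace sketch of that ordering follows. It is my reading of the description above, not the actual code from $elsewhere: the cache is a fixed table and every name in it (toy_cache, toy_cache_find, toy_lookup_next) is hypothetical. The lookup is attempted first, and leading slashes are only inspected after a miss.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a name cache. Invariant: no entry is empty or contains '/'. */
static const char *const toy_cache[] = { "usr", "lib", "foo", "bar" };

/* Return the cached name equal to the first len bytes of s, or NULL. */
static const char *toy_cache_find(const char *s, size_t len)
{
    for (size_t i = 0; i < sizeof(toy_cache) / sizeof(toy_cache[0]); i++) {
        if (strlen(toy_cache[i]) == len && memcmp(toy_cache[i], s, len) == 0)
            return toy_cache[i];
    }
    return NULL;
}

/*
 * Look up the next component of *pathp and advance past it plus one
 * separator. Returns the cached name, or NULL on a genuine miss
 * (in which case *pathp is left unchanged).
 */
static const char *toy_lookup_next(const char **pathp)
{
    const char *s = *pathp;
    size_t len = strcspn(s, "/");
    const char *hit = toy_cache_find(s, len);

    if (hit == NULL) {
        /* Miss: only now pay for the spurious-slash check. */
        if (*s != '/')
            return NULL;
        s += strspn(s, "/");          /* roll over the run of slashes */
        len = strcspn(s, "/");
        hit = toy_cache_find(s, len);
        if (hit == NULL)
            return NULL;
    }

    /* A hit implies the name was non-empty and slash-free. */
    assert(len > 0 && memchr(s, '/', len) == NULL);

    s += len;
    if (*s == '/')
        s++;                          /* consume a single separator */
    *pathp = s;
    return hit;
}
```

For "foo/////bar" the first call hits "foo" without looking at slashes at all; the second call misses on the empty component, skips the extra slashes and then finds "bar".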

And so on.

I think I can incrementally reduce a bunch of overhead, but it will
always be leaving some perf on the table unless restructured.

As for some profiling of the state, I booted up a kernel with all of
my patches (including an extra to elide security_inode_permission) +
sheaves and perf top'ed over a testcase which consists of a series of
access(2) calls lifted from strace on gcc and the linker. To the tune
of 205 paths, some of them repeated and several deranged -- for
example:
        access("/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/lib/x86_64-linux-gnu/12/Scrt1.o", R_OK);
        access("/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/lib/x86_64-linux-gnu/Scrt1.o", R_OK);
        access("/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/lib/../lib/Scrt1.o", R_OK);

The file is attached for those interested.
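
For readers without the attachment, a rough sketch of the general shape of such a testcase is below. It is not the attached file: it uses only the three paths quoted above, and the helper names (sample_paths, access_pass, access_bench) are made up for illustration. Return values of access(2) are ignored since most of these paths are not expected to exist.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Only the three example paths from this mail; the real list has 205. */
static const char *const sample_paths[] = {
    "/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/lib/x86_64-linux-gnu/12/Scrt1.o",
    "/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/lib/x86_64-linux-gnu/Scrt1.o",
    "/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/lib/../lib/Scrt1.o",
};
#define SAMPLE_COUNT (sizeof(sample_paths) / sizeof(sample_paths[0]))

/* Issue one access(2) per path; returns the number of calls made. */
static size_t access_pass(const char *const *paths, size_t n)
{
    size_t calls = 0;

    assert(paths != NULL);
    for (size_t i = 0; i < n; i++) {
        (void)access(paths[i], R_OK);  /* the result is irrelevant here */
        calls++;
    }
    return calls;
}

/* Repeat the pass so that a profiler has something to sample. */
static size_t access_bench(unsigned long iterations)
{
    size_t total = 0;

    for (unsigned long i = 0; i < iterations; i++)
        total += access_pass(sample_paths, SAMPLE_COUNT);
    return total;
}
```

Calling access_bench() with a large iteration count from main() while perf top runs would give a profile of the same general kind as the one below, though the numbers will differ.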

The profile:
  20.43%  [kernel]                  [k] __d_lookup_rcu
  10.66%  [kernel]                  [k] entry_SYSCALL_64
   9.50%  [kernel]                  [k] link_path_walk
   6.98%  libc.so.6                 [.] __GI___access
   6.04%  [kernel]                  [k] strncpy_from_user
   4.81%  [kernel]                  [k] step_into
   3.36%  [kernel]                  [k] kmem_cache_alloc_noprof
   2.80%  [kernel]                  [k] kmem_cache_free
   2.77%  [kernel]                  [k] walk_component
   2.18%  [kernel]                  [k] lookup_fast
   1.83%  [kernel]                  [k] set_root
   1.83%  [kernel]                  [k] do_syscall_64
   1.65%  [kernel]                  [k] getname_flags.part.0
   1.57%  [kernel]                  [k] entry_SYSCALL_64_safe_stack
   1.52%  [kernel]                  [k] nd_jump_root
   1.48%  [kernel]                  [k] filename_lookup
   1.34%  [kernel]                  [k] path_init
   1.33%  [kernel]                  [k] do_faccessat
   1.23%  [kernel]                  [k] __legitimize_mnt
   1.23%  [kernel]                  [k] lockref_get_not_dead
   0.96%  [kernel]                  [k] path_lookupat
   0.92%  [kernel]                  [k] lockref_put_return
   0.86%  [kernel]                  [k] its_return_thunk
   0.83%  [kernel]                  [k] entry_SYSCALL_64_after_hwframe
   0.80%  [kernel]                  [k] map_id_range_down
   0.68%  [kernel]                  [k] user_path_at

View attachment "access_compile.c" of type "text/x-csrc" (12408 bytes)
