Message-ID: <CAGudoHFA04dBDDP9bOD9kg2zW46ufJ8aBXjzM+gv5MU-gTVm2Q@mail.gmail.com>
Date: Mon, 10 Nov 2025 13:42:23 +0100
From: Mateusz Guzik <mjguzik@...il.com>
To: Jan Kara <jack@...e.cz>
Cc: brauner@...nel.org, viro@...iv.linux.org.uk, linux-kernel@...r.kernel.org,
linux-fsdevel@...r.kernel.org, linux-ext4@...r.kernel.org, tytso@....edu,
torvalds@...ux-foundation.org, josef@...icpanda.com,
linux-btrfs@...r.kernel.org
Subject: Re: [PATCH v3 1/3] fs: speed up path lookup with cheaper handling of MAY_EXEC
On Mon, Nov 10, 2025 at 11:13 AM Jan Kara <jack@...e.cz> wrote:
> OK, the path lookup is really light
I would not go that far ;)
The current code has function calls which can be either inlined or elided.
More importantly it is a massive branch-fest, notably with repeated
LOOKUP_RCU checks.
Based on my work on the same stuff $elsewhere, most of the time the
entry in the cache is there and is a directory you can traverse
through and which is not mounted on.
While there is a bunch of likely/unlikely usage to help out, the code
is not structured in a way which allows for easy use of it. Instead
some of the branches are repeated or have to be present to begin with.
Ideally lookup could roll forward over a pathname without function
calls as long as fast path conditions hold. You would still need to
pay to check permissions and that this is a non-mounted directory for
every path component, but some of this can be combined. Per the above,
the repeated LOOKUP_RCU checks would be whacked. Checking whether this
is a directory which got mounted on *OR* is a symlink could be one
branch, and so on.
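
To illustrate the "one branch" idea, here is a minimal userspace sketch. The flag names and struct entry are made up for the example and are not the kernel's dentry flags; the point is only that two conditions living in the same flags word can be tested with a single mask. A compiler may well merge the naive form on its own, so treat this as an illustration of the structure rather than a measured win.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-entry flag bits; illustrative names, not the kernel's. */
enum {
	ENT_DIRECTORY = 1u << 0,
	ENT_MOUNTED   = 1u << 1,	/* something is mounted on this entry */
	ENT_SYMLINK   = 1u << 2,
};

/* Any of these bits kicks the walk off the fast path. */
#define ENT_SLOW_MASK (ENT_MOUNTED | ENT_SYMLINK)

/* Hypothetical cache entry carrying only the flags word. */
struct entry {
	uint32_t flags;
};

/* Naive form: two separate tests, so up to two conditional branches. */
static bool needs_slow_path_naive(const struct entry *e)
{
	if (e->flags & ENT_MOUNTED)
		return true;
	if (e->flags & ENT_SYMLINK)
		return true;
	return false;
}

/* Combined form: one masked test covers "mounted on OR symlink". */
static bool needs_slow_path(const struct entry *e)
{
	return (e->flags & ENT_SLOW_MASK) != 0;
}
```

A plain directory (only ENT_DIRECTORY set) stays on the fast path in both variants; the slow path then sorts out which of the two conditions actually fired.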
On the path parsing side, userspace could have passed something fucky
like foo/////bar and this of course needs to be handled, but it does
not require the current ugliness to do so. This does happen with real
programs (typically two slashes in a row), but it also constitutes a
small minority of paths. The current code makes sure to skip the
spurious slashes before looking up the name.
My code $elsewhere instead notes it is an invariant that a name
containing a slash cannot appear in the cache so it just goes forward
with the lookup. If an entry is found, the name could not have started
with / and the check is elided (common case). Should the entry be
missing then indeed we check if slashes need to get rolled over.
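
A toy userspace sketch of that ordering, assuming a flat array as a stand-in for the cache (cached_names, cache_find and lookup_component are all hypothetical names, and this is not the code from $elsewhere): look the component up first, and only on a miss check whether leading slashes have to be skipped.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for the name cache. Invariant: no cached name contains '/'. */
static const char *const cached_names[] = { "usr", "lib", "bar" };

/* Return the index of the cached name matching name[0..len), or -1. */
static int cache_find(const char *name, size_t len)
{
	size_t n = sizeof(cached_names) / sizeof(cached_names[0]);

	for (size_t i = 0; i < n; i++) {
		if (strlen(cached_names[i]) == len &&
		    memcmp(cached_names[i], name, len) == 0)
			return (int)i;
	}
	return -1;
}

/* Length of the component starting at s, up to the next '/' or NUL. */
static size_t component_len(const char *s)
{
	size_t n = 0;

	while (s[n] != '\0' && s[n] != '/')
		n++;
	return n;
}

/*
 * Look up the component at *pos and advance *pos past it and past one
 * separator. Returns the cache index or -1 on a miss.
 */
static int lookup_component(const char **pos)
{
	const char *s = *pos;
	size_t len = component_len(s);
	int idx = cache_find(s, len);	/* common case: hit, no slash check */

	if (idx < 0 && *s == '/') {
		/* Miss: only now pay for rolling over the spurious slashes. */
		while (*s == '/')
			s++;
		len = component_len(s);
		idx = cache_find(s, len);
	}

	s += len;
	if (*s == '/')
		s++;	/* consume one separator; extras are caught on a miss */
	*pos = s;
	return idx;
}
```

With "usr/////bar" the first call hits "usr" without ever looking for extra slashes; the second call starts at "////bar", misses on the empty name, skips the slashes and then finds "bar".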
And so on.
I think I can incrementally reduce a bunch of overhead, but it will
always be leaving some perf on the table unless restructured.
As for some profiling of the state, I booted up a kernel with all of
my patches (including an extra to elide security_inode_permission) +
sheaves and perf top'ed over a testcase which consists of a series of
access(2) calls lifted from strace on gcc and the linker. To the tune
of 205 paths, some of them repeated and several deranged -- for
example:
access("/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/lib/x86_64-linux-gnu/12/Scrt1.o", R_OK);
access("/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/lib/x86_64-linux-gnu/Scrt1.o", R_OK);
access("/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/lib/../lib/Scrt1.o", R_OK);
The file is attached for those interested.
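
For a rough idea of the shape of such a testcase, a minimal sketch follows. This is not the attached file: the path list is cut down to the three examples above, and probe_paths and its rounds parameter are made up for illustration. The return value of access(2) is deliberately not treated as an error, since most of these probes are expected to fail on a given system; only the lookup work matters.

```c
#include <stddef.h>
#include <unistd.h>

/* The three example paths from above; the real list has 205 entries. */
static const char *const sample_paths[] = {
	"/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/lib/x86_64-linux-gnu/12/Scrt1.o",
	"/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/lib/x86_64-linux-gnu/Scrt1.o",
	"/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/lib/../lib/Scrt1.o",
};

/*
 * Issue access(2) with R_OK on every path, 'rounds' times over.
 * Returns how many calls succeeded, mostly so the loop has an
 * observable result.
 */
static size_t probe_paths(const char *const *paths, size_t n, unsigned rounds)
{
	size_t ok = 0;

	for (unsigned r = 0; r < rounds; r++) {
		for (size_t i = 0; i < n; i++) {
			if (access(paths[i], R_OK) == 0)
				ok++;
		}
	}
	return ok;
}
```

Calling probe_paths(sample_paths, 3, rounds) with a large rounds value from main() while running perf top in another terminal gives a profile of the same general kind as the one below.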
The profile:
20.43% [kernel] [k] __d_lookup_rcu
10.66% [kernel] [k] entry_SYSCALL_64
9.50% [kernel] [k] link_path_walk
6.98% libc.so.6 [.] __GI___access
6.04% [kernel] [k] strncpy_from_user
4.81% [kernel] [k] step_into
3.36% [kernel] [k] kmem_cache_alloc_noprof
2.80% [kernel] [k] kmem_cache_free
2.77% [kernel] [k] walk_component
2.18% [kernel] [k] lookup_fast
1.83% [kernel] [k] set_root
1.83% [kernel] [k] do_syscall_64
1.65% [kernel] [k] getname_flags.part.0
1.57% [kernel] [k] entry_SYSCALL_64_safe_stack
1.52% [kernel] [k] nd_jump_root
1.48% [kernel] [k] filename_lookup
1.34% [kernel] [k] path_init
1.33% [kernel] [k] do_faccessat
1.23% [kernel] [k] __legitimize_mnt
1.23% [kernel] [k] lockref_get_not_dead
0.96% [kernel] [k] path_lookupat
0.92% [kernel] [k] lockref_put_return
0.86% [kernel] [k] its_return_thunk
0.83% [kernel] [k] entry_SYSCALL_64_after_hwframe
0.80% [kernel] [k] map_id_range_down
0.68% [kernel] [k] user_path_at
View attachment "access_compile.c" of type "text/x-csrc" (12408 bytes)