Message-ID: <CAGudoHGz6PXi+DLiWjzwLuYq=c+oiA1cWTUt1RmHw5QOt6DAsA@mail.gmail.com>
Date: Mon, 10 Nov 2025 10:46:38 +0100
From: Mateusz Guzik <mjguzik@...il.com>
To: Jan Kara <jack@...e.cz>
Cc: brauner@...nel.org, viro@...iv.linux.org.uk, linux-kernel@...r.kernel.org, 
	linux-fsdevel@...r.kernel.org, linux-ext4@...r.kernel.org, tytso@....edu, 
	torvalds@...ux-foundation.org, josef@...icpanda.com, 
	linux-btrfs@...r.kernel.org
Subject: Re: [PATCH v3 1/3] fs: speed up path lookup with cheaper handling of MAY_EXEC

On Mon, Nov 10, 2025 at 10:32 AM Jan Kara <jack@...e.cz> wrote:
>
> On Fri 07-11-25 15:21:47, Mateusz Guzik wrote:
> > The generic inode_permission() routine does work which is known to be of
> > no significance for lookup. There are checks for MAY_WRITE, while the
> > requested permission is MAY_EXEC. Additionally devcgroup_inode_permission()
> > is called to check for devices, but it is an invariant the inode is a
> > directory.
> >
> > Absent a ->permission func, execution lands in generic_permission()
> > which checks upfront if the requested permission is granted for
> > everyone.
> >
> > We can elide the branches which are guaranteed to be false and cut
> > straight to the check if everyone happens to be allowed MAY_EXEC on the
> > inode (which holds true most of the time).
> >
> > Moreover, filesystems which provide their own ->permission routine can
> > take advantage of the optimization by setting the IOP_FASTPERM_MAY_EXEC
> > flag on their inodes, which they can legitimately do if their MAY_EXEC
> > handling matches generic_permission().
> >
> > As a simple benchmark, as part of compilation gcc issues access(2) on
> > numerous long paths, for example /usr/lib/gcc/x86_64-linux-gnu/12/crtendS.o
> >
> > Issuing access(2) on it in a loop on ext4 on Sapphire Rapids (ops/s):
> > before: 3797556
> > after:  3987789 (+5%)
> >
> > Note: this depends on the not-yet-landed ext4 patch to mark inodes with
> > cache_no_acl()
> >
> > Signed-off-by: Mateusz Guzik <mjguzik@...il.com>
>
> The gain is nice. I'm just wondering where exactly is it coming from? I
> don't see that we'd be saving some memory load or significant amount of
> work. So is it really coming from the more compact code and saved several
> unlikely branches and function calls?
>

That's several branches and 2 function calls per path component on the
way to the terminal inode. In the path at hand, that's 10 function
calls elided.
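
To make the arithmetic concrete, here is a rough userspace sketch (not kernel
code) which counts the components preceding the terminal one and multiplies
by the 2 calls saved per component. The helper names and the
CALLS_ELIDED_PER_COMPONENT constant are made up for illustration; the "2" is
just the figure quoted above, not something the code measures.

```c
#include <assert.h>
#include <stddef.h>

/* Figure taken from the message above; an assumption, not a measurement. */
#define CALLS_ELIDED_PER_COMPONENT 2

/*
 * Hypothetical helper: count the components of a path which precede the
 * terminal one, e.g. "/usr/lib/foo.o" has 2 ("usr" and "lib").
 * Repeated slashes are treated as a single separator.
 */
static size_t count_nonterminal_components(const char *path)
{
	size_t components = 0;
	const char *p = path;

	assert(path != NULL);

	while (*p) {
		/* Skip any run of separators. */
		while (*p == '/')
			p++;
		if (!*p)
			break;
		components++;
		/* Skip the component itself. */
		while (*p && *p != '/')
			p++;
	}

	/* The last component is the terminal one, so leave it out. */
	return components ? components - 1 : 0;
}

/* Hypothetical helper: estimated function calls elided for the whole walk. */
static size_t calls_elided(const char *path)
{
	return count_nonterminal_components(path) * CALLS_ELIDED_PER_COMPONENT;
}
```

For /usr/lib/gcc/x86_64-linux-gnu/12/crtendS.o the first helper returns 5
(usr, lib, gcc, x86_64-linux-gnu, 12), so calls_elided() gives 10.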

>                                                                 Honza
>
> > ---
> >  fs/namei.c         | 43 +++++++++++++++++++++++++++++++++++++++++--
> >  include/linux/fs.h | 13 +++++++------
> >  2 files changed, 48 insertions(+), 8 deletions(-)
> >
> > diff --git a/fs/namei.c b/fs/namei.c
> > index a9f9d0453425..6b2a5a5478e7 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -540,6 +540,9 @@ static inline int do_inode_permission(struct mnt_idmap *idmap,
> >   * @mask: Right to check for (%MAY_READ, %MAY_WRITE, %MAY_EXEC)
> >   *
> >   * Separate out file-system wide checks from inode-specific permission checks.
> > + *
> > + * Note: lookup_inode_permission_may_exec() does not call here. If you add
> > + * MAY_EXEC checks, adjust it.
> >   */
> >  static int sb_permission(struct super_block *sb, struct inode *inode, int mask)
> >  {
> > @@ -602,6 +605,42 @@ int inode_permission(struct mnt_idmap *idmap,
> >  }
> >  EXPORT_SYMBOL(inode_permission);
> >
> > +/**
> > + * lookup_inode_permission_may_exec - Check traversal right for given inode
> > + *
> > + * This is a special case routine for may_lookup() making assumptions specific
> > + * to path traversal. Use inode_permission() if you are doing something else.
> > + *
> > + * Work is shaved off compared to inode_permission() as follows:
> > + * - we know for a fact there is no MAY_WRITE to worry about
> > + * - it is an invariant the inode is a directory
> > + *
> > + * Since majority of real-world traversal happens on inodes which grant it for
> > + * everyone, we check it upfront and only resort to more expensive work if it
> > + * fails.
> > + *
> > + * Filesystems which have their own ->permission hook and consequently miss out
> > + * on IOP_FASTPERM can still get the optimization if they set IOP_FASTPERM_MAY_EXEC
> > + * on their directory inodes.
> > + */
> > +static __always_inline int lookup_inode_permission_may_exec(struct mnt_idmap *idmap,
> > +     struct inode *inode, int mask)
> > +{
> > +     /* Lookup already checked this to return -ENOTDIR */
> > +     VFS_BUG_ON_INODE(!S_ISDIR(inode->i_mode), inode);
> > +     VFS_BUG_ON((mask & ~MAY_NOT_BLOCK) != 0);
> > +
> > +     mask |= MAY_EXEC;
> > +
> > +     if (unlikely(!(inode->i_opflags & (IOP_FASTPERM | IOP_FASTPERM_MAY_EXEC))))
> > +             return inode_permission(idmap, inode, mask);
> > +
> > +     if (unlikely(((inode->i_mode & 0111) != 0111) || !no_acl_inode(inode)))
> > +             return inode_permission(idmap, inode, mask);
> > +
> > +     return security_inode_permission(inode, mask);
> > +}
> > +
> >  /**
> >   * path_get - get a reference to a path
> >   * @path: path to get the reference to
> > @@ -1855,7 +1894,7 @@ static inline int may_lookup(struct mnt_idmap *idmap,
> >       int err, mask;
> >
> >       mask = nd->flags & LOOKUP_RCU ? MAY_NOT_BLOCK : 0;
> > -     err = inode_permission(idmap, nd->inode, mask | MAY_EXEC);
> > +     err = lookup_inode_permission_may_exec(idmap, nd->inode, mask);
> >       if (likely(!err))
> >               return 0;
> >
> > @@ -1870,7 +1909,7 @@ static inline int may_lookup(struct mnt_idmap *idmap,
> >       if (err != -ECHILD)     // hard error
> >               return err;
> >
> > -     return inode_permission(idmap, nd->inode, MAY_EXEC);
> > +     return lookup_inode_permission_may_exec(idmap, nd->inode, 0);
> >  }
> >
> >  static int reserve_stack(struct nameidata *nd, struct path *link)
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 03e450dd5211..7d5de647ac7b 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -647,13 +647,14 @@ is_uncached_acl(struct posix_acl *acl)
> >       return (long)acl & 1;
> >  }
> >
> > -#define IOP_FASTPERM 0x0001
> > -#define IOP_LOOKUP   0x0002
> > -#define IOP_NOFOLLOW 0x0004
> > -#define IOP_XATTR    0x0008
> > +#define IOP_FASTPERM         0x0001
> > +#define IOP_LOOKUP           0x0002
> > +#define IOP_NOFOLLOW         0x0004
> > +#define IOP_XATTR            0x0008
> >  #define IOP_DEFAULT_READLINK 0x0010
> > -#define IOP_MGTIME   0x0020
> > -#define IOP_CACHED_LINK      0x0040
> > +#define IOP_MGTIME           0x0020
> > +#define IOP_CACHED_LINK              0x0040
> > +#define IOP_FASTPERM_MAY_EXEC        0x0080
> >
> >  /*
> >   * Inode state bits.  Protected by inode->i_lock
> > --
> > 2.48.1
> >
> --
> Jan Kara <jack@...e.com>
> SUSE Labs, CR
