Message-ID: <20250414-anomalie-abpfiff-9f293dce366b@brauner>
Date: Mon, 14 Apr 2025 12:21:28 +0200
From: Christian Brauner <brauner@...nel.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Mateusz Guzik <mjguzik@...il.com>, Al Viro <viro@...iv.linux.org.uk>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>, Jan Kara <jack@...e.cz>,
Ext4 Developers List <linux-ext4@...r.kernel.org>
Subject: Re: generic_permission() optimization
On Sat, Apr 12, 2025 at 01:22:38PM -0700, Linus Torvalds wrote:
> On Sat, 12 Apr 2025 at 09:26, Mateusz Guzik <mjguzik@...il.com> wrote:
> >
> > I plopped your snippet towards the end of __ext4_iget:
>
> That's literally where I did the same thing, except I put it right after the
>
>         brelse(iloc.bh);
>
> line, rather than before as you did.
>
> And it made no difference for me, but I didn't try to figure out why.
> Maybe some environment differences? Or maybe I just screwed up my
> testing...
>
> As mentioned earlier in the thread, I had this bi-modal distribution
> of results, because if I had a load where the *non*-owner of the inode
> looked up the pathnames, then the ACL information would get filled in
> when the VFS layer would do the lookup, and then once the ACLs were
> cached, everything worked beautifully.
>
> But if the only lookups of a path were done by the owner of the inodes
> (which is typical for at least my normal kernel build tree - nothing
> but my build will look at the files, and they are obviously always
> owned by me) then the ACL caches will never be filled because there
> will never be any real ACL lookups.
>
> And then rather than doing the nice efficient "no ACLs anywhere, no
> need to even look", it ends up having to actually do the vfsuid
> comparison for the UID equality check.
>
> Which then does the extra accesses to look up the idmap etc, and is
> visible in the profiles due to that whole dance:
>
>         /* Are we the owner? If so, ACL's don't matter */
>         vfsuid = i_uid_into_vfsuid(idmap, inode);
>         if (likely(vfsuid_eq_kuid(vfsuid, current_fsuid()))) {
>
> even when idmap is 'nop_mnt_idmap' and it is reasonably cheap. Just
> because it ends up calling out to different functions and does extra
> D$ accesses to the inode and the superblock (ie i_user_ns() is this
>
>         return inode->i_sb->s_user_ns;
I think we can improve this. Right now multiple mounts from different
superblocks can share the same struct mnt_idmap. But I can change the
code so that struct mnt_idmap can only be shared between mounts from the
same superblock. With that we could do:

diff --git a/fs/mnt_idmapping.c b/fs/mnt_idmapping.c
index a37991fdb194..a5ec15c8c754 100644
--- a/fs/mnt_idmapping.c
+++ b/fs/mnt_idmapping.c
@@ -20,6 +20,7 @@
 struct mnt_idmap {
        struct uid_gid_map uid_map;
        struct uid_gid_map gid_map;
+       struct user_namespace *s_user_ns;
        refcount_t count;
 };

And then stuff like:

static inline vfsuid_t i_uid_into_vfsuid(struct mnt_idmap *idmap,
                                         const struct inode *inode)
{
        return make_vfsuid(idmap, i_user_ns(inode), inode->i_uid);
}

just becomes:

static inline vfsuid_t i_uid_into_vfsuid(struct mnt_idmap *idmap,
                                         const struct inode *inode)
{
        return make_vfsuid(idmap, inode->i_uid);
}

which means:

vfsuid_t make_vfsuid(struct mnt_idmap *idmap,
                     kuid_t kuid)
{
        uid_t uid;

        if (idmap == &nop_mnt_idmap)
                return VFSUIDT_INIT(kuid);
        <snip>
}

will only have to verify nop_mnt_idmap and we never have to access the
inode->i_sb->s_user_ns at all.
I'll whip up a patch for this.
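
To make the shape of that change concrete, here is a rough userspace model of it, not kernel code. Every name ending in _model is made up for illustration, the types are toy stand-ins, and the slow-path arithmetic is a placeholder assumption rather than the real uid_gid_map lookup. The only point is that once the idmap carries the filesystem's user namespace, the fast path is a single pointer comparison and the inode's superblock is never touched:

```c
#include <stdbool.h>

/* Toy stand-ins for kernel types; these are NOT the real definitions. */
typedef struct { unsigned int val; } kuid_model_t;
typedef struct { unsigned int val; } vfsuid_model_t;

/* Hypothetical: a namespace reduced to a single base offset. */
struct user_namespace_model {
	unsigned int base;
};

struct mnt_idmap_model {
	unsigned int uid_offset;                 /* placeholder for uid_map */
	struct user_namespace_model *s_user_ns;  /* the field proposed above */
};

static struct user_namespace_model init_user_ns_model = { 0 };
static struct mnt_idmap_model nop_mnt_idmap_model = { 0, &init_user_ns_model };

/*
 * Proposed shape: no filesystem namespace argument. The fast path only
 * compares the idmap pointer; the slow path reads the namespace from the
 * idmap itself instead of from inode->i_sb.
 */
static vfsuid_model_t make_vfsuid_model(const struct mnt_idmap_model *idmap,
					kuid_model_t kuid)
{
	vfsuid_model_t vfsuid;

	if (idmap == &nop_mnt_idmap_model) {
		vfsuid.val = kuid.val;
		return vfsuid;
	}

	/* Placeholder arithmetic standing in for the real mapping steps. */
	vfsuid.val = kuid.val - idmap->s_user_ns->base + idmap->uid_offset;
	return vfsuid;
}
```

In this model a caller passes only the idmap and the raw inode uid, which mirrors the two-argument make_vfsuid() sketched above.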
>
> so just to *see* that it's nop_mnt_idmap takes effort.
>
> One improvement might be to cache that 'nop_mnt_idmap' thing in the
> inode as a flag.
>
> But it would be even better if the filesystem just initializes the
> inode at inode read time to say "I have no ACL's for this inode" and
> none of this code will even trigger.
Yes, let's please do this.
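
For anyone following along, the idea can be modelled in userspace roughly like this. It is only a sketch: the _model names are invented here, and it assumes the usual scheme where a sentinel pointer means "ACL state not cached yet" and NULL means "known to have no ACL". It is not the snippet that was tested earlier in the thread:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for an ACL object; contents are irrelevant here. */
struct posix_acl_model {
	int unused;
};

/* Sentinel meaning "nobody has looked up the ACL for this inode yet". */
#define ACL_NOT_CACHED_MODEL ((struct posix_acl_model *)(intptr_t)-1)

struct inode_model {
	struct posix_acl_model *i_acl;
	struct posix_acl_model *i_default_acl;
};

/* Generic inode setup: ACL state starts out unknown. */
static void inode_init_model(struct inode_model *inode)
{
	inode->i_acl = ACL_NOT_CACHED_MODEL;
	inode->i_default_acl = ACL_NOT_CACHED_MODEL;
}

/*
 * What the filesystem would do at inode read time when it can tell
 * the inode carries no ACL: record "none" instead of "unknown".
 */
static void cache_no_acl_model(struct inode_model *inode)
{
	inode->i_acl = NULL;
	inode->i_default_acl = NULL;
}

/* Permission-check fast path: true only if "no ACL" is already known. */
static bool known_no_acl_model(const struct inode_model *inode)
{
	return inode->i_acl == NULL;
}
```

With that, an inode that was only ever looked up by its owner still reaches the "no ACLs, no need to look" fast path, because the filesystem filled in the answer up front rather than waiting for a real ACL lookup.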