[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=whw936qzDLBQdUz-He5WK_0fRSWwKAjtbVsMGfX70Nf_Q@mail.gmail.com>
Date: Sat, 20 Jan 2024 14:18:36 -0800
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Al Viro <viro@...iv.linux.org.uk>
Cc: Josh Triplett <josh@...htriplett.org>, Kees Cook <kees@...nel.org>,
Kees Cook <keescook@...omium.org>, linux-kernel@...r.kernel.org,
Alexey Dobriyan <adobriyan@...il.com>
Subject: Re: [GIT PULL] execve updates for v6.8-rc1
On Thu, 11 Jan 2024 at 09:42, Linus Torvalds
<torvalds@...ux-foundation.org> wrote:
>
> It's the "don't allocate filp until you actually need it" that looks
> nasty. And yes, atomic_open() is part of the problem, but so is the
> fact that wee end up saving some flags in the filp early.
So just an update on this, since I came back to it.
It turns out that the bulk of the cost of the 'struct file *'
allocation is actually the exact same thing that was discussed the
other day about file locking: it's the fact that the file allocation
is done with SLAB_ACCOUNT. See
https://lore.kernel.org/all/CAHk-=wg_CoTOfkREgaQQA6oJ5nM9ZKYrTn=E1r-JnvmQcgWpSg@mail.gmail.com/
and that thread on the recent file locking accounting discussion.
Now, the allocation itself isn't free either, but the SLAB_ACCOUNT
really does make it noticeable more expensive than it should be.
It's a bit more spread out: the cost of the slab allocation itself is
mainly the (optimized) path that does a cmpxchg and the memset, but
the SLAB_ACCOUNT cost is spread out in mod_objcg_state,
__memcg_slab_post_alloc_hook, obj_cgroup_charge,
__memcg_slab_free_hook).
And that actually opens the door up for a _somewhat_ simple partial
workaround: instead of using SLAB_ACCOUNT, we could just do the memcg
accounting when we set FMODE_OPEN instead, and de-account it when we
free the filp (which checks FMODE_OPEN since other cleanup is
dependent on that anyway).
That would help not just the execve() case, but the whole "open
non-existent file" case too.
And I suspect "open()" with ENOENT is actually way more common than
execve() is. All those open->fstat paths for various perfectly normal
loads.
Anyway, I didn't actually write that patch, but I did want to mention
it as a smaller-scale fix (because getting rid of the 'struct file'
allocation entirely does look somewhat painful).
End result: I committed my "move do_open_execat() to the beginning of
execve()" patch, since it's clearly an improvement on the existing
behavior, and that whole "struct file allocations are unnecessarily
expensive" issue is a separate thing.
Linus
Powered by blists - more mailing lists