[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6d5bb852-8703-4abf-a52b-90816bccbd7f@suse.cz>
Date: Fri, 26 Jan 2024 10:50:44 +0100
From: Vlastimil Babka <vbabka@...e.cz>
To: Linus Torvalds <torvalds@...ux-foundation.org>,
Shakeel Butt <shakeelb@...gle.com>
Cc: Roman Gushchin <roman.gushchin@...ux.dev>,
Josh Poimboeuf <jpoimboe@...nel.org>, Jeff Layton <jlayton@...nel.org>,
Chuck Lever <chuck.lever@...cle.com>, Johannes Weiner <hannes@...xchg.org>,
Michal Hocko <mhocko@...nel.org>, linux-kernel@...r.kernel.org,
Jens Axboe <axboe@...nel.dk>, Tejun Heo <tj@...nel.org>,
Vasily Averin <vasily.averin@...ux.dev>, Michal Koutny <mkoutny@...e.com>,
Waiman Long <longman@...hat.com>, Muchun Song <muchun.song@...ux.dev>,
Jiri Kosina <jikos@...nel.org>, cgroups@...r.kernel.org, linux-mm@...ck.org,
Howard McLauchlan <hmclauchlan@...com>, bpf <bpf@...r.kernel.org>,
Jens Axboe <axboe@...nel.dk>
Subject: Re: [PATCH RFC 1/4] fs/locks: Fix file lock cache accounting, again
On 1/22/24 06:10, Linus Torvalds wrote:
> On Wed, 17 Jan 2024 at 14:56, Shakeel Butt <shakeelb@...gle.com> wrote:
>> >
>> > So I don't see how we can make it really cheap (say, less than 5% overhead)
>> > without caching pre-accounted objects.
>>
>> Maybe this is what we want. Now we are down to just SLUB, maybe such
>> caching of pre-accounted objects can be in SLUB layer and we can
>> decide to keep this caching per-kmem-cache opt-in or always on.
>
> So it turns out that we have another case of SLAB_ACCOUNT being quite
> a big expense, and it's actually the normal - but failed - open() or
> execve() case.
>
> See the thread at
>
> https://lore.kernel.org/all/CAHk-=whw936qzDLBQdUz-He5WK_0fRSWwKAjtbVsMGfX70Nf_Q@mail.gmail.com/
>
> and to see the effect in profiles, you can use this EXTREMELY stupid
> test program:
>
> #include <fcntl.h>
>
> int main(int argc, char **argv)
> {
> for (int i = 0; i < 10000000; i++)
> open("nonexistent", O_RDONLY);
> }
This reminded me I can see should_failslab() in the profiles (1.43% plus the
overhead in its caller) even if it does nothing at all, and it's completely
unconditional since commit 4f6923fbb352 ("mm: make should_failslab always
available for fault injection").
We discussed it briefly when Jens tried to change it in [1] to depend on
CONFIG_FAILSLAB again. But now I think it should be even possible to leave
it always available, but behind a static key. BPF or whoever else uses these
error injection hooks would have to track how many users of them are active
and manage the static key accordingly. Then it could be always available,
but have no overhead when there's no active user? Other similars hooks could
benefit from such an approach too?
[1]
https://lore.kernel.org/all/e01e5e40-692a-519c-4cba-e3331f173c82@kernel.dk/#t
> where the point of course is that the "nonexistent" pathname doesn't
> actually exist (so don't create a file called that for the test).
>
> What happens is that open() allocates a 'struct file *' early from the
> filp kmem_cache, which has SLAB_ACCOUNT set. So we'll do accounting
> for it, failt the pathname open, and free it again, which uncharges
> the accounting.
>
> Now, in this case, I actually have a suggestion: could we please just
> make SLAB_ACCOUNT be something that we do *after* the allocation, kind
> of the same way the zeroing works?
>
> IOW, I'd love to get rid of slab_pre_alloc_hook() entirely, and make
> slab_post_alloc_hook() do all the "charge the memcg if required".
>
> Obviously that means that now a failure to charge the memcg would have
> to then de-allocate things, but that's an uncommon path and would be
> marked unlikely and not be in the hot path at all.
>
> Now, the reason I would prefer that is that the *second* step would be to
>
> (a) expose a "kmem_cache_charge()" function that takes a
> *non*-accounted slab allocation, and turns it into an accounted one
> (and obviously this is why you want to do everything in the post-alloc
> hook: just try to share this code)
>
> (b) remote the SLAB_ACCOUNT from the filp_cachep, making all file
> allocations start out unaccounted.
>
> (c) when we have *actually* looked up the pathname and open the file
> successfully, at *that* point we'd do a
>
> error = kmem_cache_charge(filp_cachep, file);
>
> in do_dentry_open() to turn the unaccounted file pointer into an
> accounted one (and if that fails, we do the cleanup and free it, of
> course, exactly like we do when file_get_write_access() fails)
>
> which means that now the failure case doesn't unnecessarily charge the
> allocation that never ends up being finalized.
>
> NOTE! I think this would clean up mm/slub.c too, simply because it
> would get rid of that memcg_slab_pre_alloc_hook() entirely, and get
> rid of the need to carry the "struct obj_cgroup **objcgp" pointer
> along until the post-alloc hook: everything would be done post-alloc.
>
> The actual kmem_cache_free() code already deals with "this slab hasn't
> been accounted" because it obviously has to deal with allocations that
> were done without __GFP_ACCOUNT anyway. So there's no change needed on
> the freeing path, it already has to handle this all gracefully.
>
> I may be missing something, but it would seem to have very little
> downside, and fix a case that actually is visible right now.
>
> Linus
Powered by blists - more mailing lists