Message-ID: <CAJuCfpGvpTKAevVkBUy==vEC+x+=Rn9SBE8=8TjbtLqnHHby1Q@mail.gmail.com>
Date: Wed, 29 Jan 2025 09:26:27 -0800
From: Suren Baghdasaryan <surenb@...gle.com>
To: Vlastimil Babka <vbabka@...e.cz>
Cc: Steven Rostedt <rostedt@...dmis.org>, akpm@...ux-foundation.org,
Peter Zijlstra <peterz@...radead.org>, kent.overstreet@...ux.dev, yuzhao@...gle.com,
minchan@...gle.com, shakeel.butt@...ux.dev, souravpanda@...gle.com,
pasha.tatashin@...een.com, 00107082@....com, quic_zhenhuah@...cinc.com,
linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/3] alloc_tag: uninline code gated by mem_alloc_profiling_key
in slab allocator
On Wed, Jan 29, 2025 at 1:50 AM Vlastimil Babka <vbabka@...e.cz> wrote:
>
> On 1/29/25 01:03, Steven Rostedt wrote:
> > On Tue, 28 Jan 2025 15:43:13 -0800
> > Suren Baghdasaryan <surenb@...gle.com> wrote:
> >
> >> > How slow is it to always do the call instead of inlining?
> >>
> >> Let's see... The additional overhead if we always call is:
> >>
> >> Little core: 2.42%
> >> Middle core: 1.23%
> >> Big core: 0.66%
> >>
> >> Not a huge deal because the overhead of memory profiling when enabled
> >> is much higher. So, maybe for simplicity I should indeed always call?
> >
> > That's what I was thinking, unless the other maintainers are OK with this
> > special logic.
>
> If it's acceptable, I would prefer to always call.
Ok, I'll post that version. If this becomes an issue we can reconsider later.
> But at the same time make
> sure the static key test is really inlined, i.e. force inline
> alloc_tagging_slab_alloc_hook() (see my other reply looking at the disassembly).
Sorry, I should have made it clear that I uninlined
alloc_tagging_slab_alloc_hook() only to localize the relevant code. In
reality it is inlined. Since the inlined outputs are quite big, I'm
attaching the disassembly of kmem_cache_alloc_noprof(), which has
alloc_tagging_slab_alloc_hook() inlined into it.
>
> Well or rather just open-code the contents of the
> alloc_tagging_slab_alloc_hook and alloc_tagging_slab_free_hook (as they look
> after this patch) into the callers. It's just two lines. The extra layer is
> just unnecessary distraction.
alloc_tagging_slab_alloc_hook() is inlined, no need to open-code.
>
> Then it's probably inevitable the actual hook content after the static key
> test should not be inline even with
> CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT as the result would be inlined
> into too many places. But since we remove one call layer anyway thanks to
> above, even without the full inlining the resulting performance could
> hopefully be fine (compared to the state before your series).
Agree. Thanks for the feedback!
I'll prepare v2 that always calls the hook body, with no dependency on
CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT for the inlining
decision.
>
> > -- Steve
>
View attachment "noinline.txt" of type "text/plain" (14234 bytes)
View attachment "inline.txt" of type "text/plain" (28223 bytes)