Message-ID: <93f8ac7a-26a8-49c1-9fa3-3a27b0000123@suse.cz>
Date: Wed, 29 Jan 2025 10:50:43 +0100
From: Vlastimil Babka <vbabka@...e.cz>
To: Steven Rostedt <rostedt@...dmis.org>,
 Suren Baghdasaryan <surenb@...gle.com>
Cc: akpm@...ux-foundation.org, Peter Zijlstra <peterz@...radead.org>,
 kent.overstreet@...ux.dev, yuzhao@...gle.com, minchan@...gle.com,
 shakeel.butt@...ux.dev, souravpanda@...gle.com, pasha.tatashin@...een.com,
 00107082@....com, quic_zhenhuah@...cinc.com, linux-mm@...ck.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/3] alloc_tag: uninline code gated by
 mem_alloc_profiling_key in slab allocator

On 1/29/25 01:03, Steven Rostedt wrote:
> On Tue, 28 Jan 2025 15:43:13 -0800
> Suren Baghdasaryan <surenb@...gle.com> wrote:
> 
>> > How slow is it to always do the call instead of inlining?  
>> 
>> Let's see... The additional overhead if we always call is:
>> 
>> Little core: 2.42%
>> Middle core: 1.23%
>> Big core: 0.66%
>> 
>> Not a huge deal because the overhead of memory profiling when enabled
>> is much higher. So, maybe for simplicity I should indeed always call?
> 
> That's what I was thinking, unless the other maintainers are OK with this
> special logic.

If it's acceptable, I would prefer to always call. But at the same time, make
sure the static key test itself is really inlined, i.e. force-inline
alloc_tagging_slab_alloc_hook() (see my other reply looking at the disassembly).

Or rather, just open-code the contents of alloc_tagging_slab_alloc_hook()
and alloc_tagging_slab_free_hook() (as they look after this patch) into the
callers. It's just two lines. The extra layer is just an unnecessary
distraction.

Then it's probably inevitable that the actual hook content after the static
key test should not be inlined, even with
CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT, as it would otherwise be
inlined into too many places. But since we remove one call layer anyway
thanks to the above, the resulting performance could hopefully be fine even
without the full inlining (compared to the state before your series).
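
To illustrate the shape I mean, here is a rough userspace sketch, not the
actual slab code. A plain bool stands in for the static key (in the kernel
the test would be a static branch on mem_alloc_profiling_key), and every
name below (mem_profiling_enabled, slab_alloc_tag_slow(),
slab_free_tag_slow(), demo_alloc(), demo_free()) is made up for the example.
The check is open-coded in the callers, and the hook body is kept out of
line. The noinline attribute and __builtin_expect() assume GCC or Clang.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the static key; the kernel would patch a
 * jump label instead of loading a variable. */
static bool mem_profiling_enabled;

/* Counters that make the effect of the hooks observable in this demo. */
static unsigned long tagged_allocs;
static unsigned long tagged_frees;

/* Out-of-line hook bodies: one copy each, however many callers exist. */
__attribute__((noinline)) void slab_alloc_tag_slow(void *obj, size_t size)
{
	assert(obj != NULL);
	(void)size;		/* a real hook would account the size */
	tagged_allocs++;
}

__attribute__((noinline)) void slab_free_tag_slow(void *obj)
{
	assert(obj != NULL);
	tagged_frees++;
}

/* Callers: the key test is open-coded, so only the cheap check sits in
 * the fast path and the call happens only when profiling is enabled. */
void *demo_alloc(size_t size)
{
	void *obj = malloc(size);

	if (obj && __builtin_expect(mem_profiling_enabled, 0))
		slab_alloc_tag_slow(obj, size);
	return obj;
}

void demo_free(void *obj)
{
	if (obj && __builtin_expect(mem_profiling_enabled, 0))
		slab_free_tag_slow(obj);
	free(obj);
}
```

With mem_profiling_enabled false, demo_alloc() and demo_free() never reach
the out-of-line functions. Once it is set, each call pays for exactly one
function call, with no wrapper layer in between.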

> -- Steve

