Message-ID: <20250128143549.62f458a7@gandalf.local.home>
Date: Tue, 28 Jan 2025 14:35:49 -0500
From: Steven Rostedt <rostedt@...dmis.org>
To: Suren Baghdasaryan <surenb@...gle.com>
Cc: Vlastimil Babka <vbabka@...e.cz>, akpm@...ux-foundation.org, Peter
 Zijlstra <peterz@...radead.org>, kent.overstreet@...ux.dev,
 yuzhao@...gle.com, minchan@...gle.com, shakeel.butt@...ux.dev,
 souravpanda@...gle.com, pasha.tatashin@...een.com, 00107082@....com,
 quic_zhenhuah@...cinc.com, linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/3] alloc_tag: uninline code gated by
 mem_alloc_profiling_key in slab allocator

On Mon, 27 Jan 2025 11:38:32 -0800
Suren Baghdasaryan <surenb@...gle.com> wrote:

> On Sun, Jan 26, 2025 at 8:47 AM Vlastimil Babka <vbabka@...e.cz> wrote:
> >
> > On 1/26/25 08:02, Suren Baghdasaryan wrote:  
> > > When a sizable code section is protected by a disabled static key, that
> > > code gets into the instruction cache even though it's not executed and
> > > consumes the cache, increasing cache misses. This can be remedied by
> > > moving such code into a separate uninlined function. The improvement  
> 
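For reference, the pattern described above looks roughly like this. A
minimal sketch with made-up names (my_profiling_key, raw_alloc(),
record_alloc_tag() and update_counters() are illustrative, not the
actual alloc_tag code):

	#include <linux/jump_label.h>
	#include <linux/types.h>

	/* Hypothetical key and helpers, for illustration only. */
	DEFINE_STATIC_KEY_FALSE(my_profiling_key);
	void *raw_alloc(size_t size);
	void record_alloc_tag(void *p, size_t size);
	void update_counters(size_t size);

	static inline void *alloc_hot_path(size_t size)
	{
		void *p = raw_alloc(size);

		/*
		 * With the key disabled the branch is patched to a NOP,
		 * but the gated block below is still emitted inside this
		 * function (out of line, typically at its end), so it
		 * still occupies space in the surrounding text.
		 */
		if (static_branch_unlikely(&my_profiling_key)) {
			record_alloc_tag(p, size);
			update_counters(size);
		}
		return p;
	}
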
> Sorry, I missed adding Steven Rostedt to the CC list, since his
> advice was instrumental in finding the way to optimize the static key
> performance in this patch. Added now.
> 
> >
> > Weird, I thought the static_branch_likely/unlikely/maybe was already
> > handling this by the unlikely case being a jump to a block away from the
> > fast-path stream of instructions, thus making it less likely to get cached.
> > AFAIU even plain likely()/unlikely() should do this, along with branch
> > prediction hints.  
> 
> This was indeed an unexpected overhead when I measured it on Android.
> After Steven suggested I try uninlining the protected code, cache
> pollution was my understanding of the cause of this high overhead; he
> has done something similar in the tracing subsystem. But maybe I
> misunderstood the real reason. Steven, could you please verify whether
> my understanding of the cause of the high overhead is correct here?
> Maybe there is something else at play that I missed?
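
Concretely, the uninlining being discussed turns the earlier sketch
into something like the following (same made-up names as above; this
shows the shape of the change, not the actual patch):

	/* The sizable gated block now lives outside the caller. */
	static noinline void alloc_profiling_slow_path(void *p, size_t size)
	{
		record_alloc_tag(p, size);
		update_counters(size);
	}

	static inline void *alloc_hot_path(size_t size)
	{
		void *p = raw_alloc(size);

		/*
		 * Only the patched branch and a call remain in the
		 * caller's instruction stream; the cold body is placed
		 * elsewhere in the text section.
		 */
		if (static_branch_unlikely(&my_profiling_key))
			alloc_profiling_slow_path(p, size);
		return p;
	}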

From what I understand, the compiler will only move unlikely() code to
the end of the function it is in. But the code that comes after the
function could also be in the control flow path. So if you have several
functions that are called together, adding code to the unlikely() cases
may not help the speed.
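
As a hypothetical illustration of that point (none of these functions
are real kernel code): unlikely() moves a cold block only to the end of
its own function, so the block still sits in .text between neighboring
hot functions:

	#include <linux/compiler.h>
	#include <linux/types.h>

	bool rare_condition(void);
	void cold_work_f(void);
	void hot_work_g(void);

	void f(void)
	{
		if (unlikely(rare_condition()))
			cold_work_f();	/* moved to the end of f(), but no further */
	}

	/*
	 * g() is laid out right after f() in .text, so f()'s cold tail
	 * sits between them.  If f() and g() are called back to back,
	 * the cache lines holding that tail may still get fetched.
	 */
	void g(void)
	{
		hot_work_g();
	}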

I made an effort to make the tracepoint code call functions instead of
having everything inlined. It actually brought down the size of the
kernel text, though looking at the change logs, it seems I never posted
benchmarks. But I'm fairly sure making the scheduler's text section
smaller did help.

> > That would be in line with my understanding above. Does the arm64 compiler
> > not do it as well as x86 (could maybe be found out by disassembling), or does
> > the Pixel6 cpu somehow cache these out-of-line blocks more aggressively, so
> > that only a function call stops it?  
> 
> I'll disassemble the code and will see what it looks like.

I think I asked you to do that too ;-)

> 
> >  
> > > Signed-off-by: Suren Baghdasaryan <surenb@...gle.com>  
> >
> > Kinda sad that despite the static key we have to control a lot of this
> > with CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT in addition.  
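
FWIW, the config-selected default itself goes through the static key
machinery; roughly like this (a sketch of the jump_label.h mechanism,
the exact definitions in the tree may differ):

	DEFINE_STATIC_KEY_MAYBE(CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT,
				mem_alloc_profiling_key);

	static inline bool mem_alloc_profiling_enabled(void)
	{
		return static_branch_maybe(CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT,
					   &mem_alloc_profiling_key);
	}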
> 
> I agree. If there is a better way to fix this regression I'm open to
> changes. Let's wait for Steven to confirm my understanding before
> proceeding.

How slow is it to always do the call instead of inlining?

-- Steve
