Message-ID: <CAJuCfpHjNGheTSLkxRaxktFQ-PfhEDSZjSz=-CPzkUx9TiWNEQ@mail.gmail.com>
Date: Mon, 27 Jan 2025 11:38:32 -0800
From: Suren Baghdasaryan <surenb@...gle.com>
To: Vlastimil Babka <vbabka@...e.cz>, Steven Rostedt <rostedt@...dmis.org>
Cc: akpm@...ux-foundation.org, Peter Zijlstra <peterz@...radead.org>, 
	kent.overstreet@...ux.dev, yuzhao@...gle.com, minchan@...gle.com, 
	shakeel.butt@...ux.dev, souravpanda@...gle.com, pasha.tatashin@...een.com, 
	00107082@....com, quic_zhenhuah@...cinc.com, linux-mm@...ck.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 2/3] alloc_tag: uninline code gated by mem_alloc_profiling_key
 in slab allocator

On Sun, Jan 26, 2025 at 8:47 AM Vlastimil Babka <vbabka@...e.cz> wrote:
>
> On 1/26/25 08:02, Suren Baghdasaryan wrote:
> > When a sizable code section is protected by a disabled static key, that
> > code gets into the instruction cache even though it's not executed and
> > consumes the cache, increasing cache misses. This can be remedied by
> > moving such code into a separate uninlined function. The improvement

Sorry, I missed adding Steven Rostedt to the CC list; his advice was
instrumental in finding a way to optimize the static key performance in
this patch. Added now.

>
> Weird, I thought the static_branch_likely/unlikely/maybe was already
> handling this by the unlikely case being a jump to a block away from the
> fast-path stream of instructions, thus making it less likely to get cached.
> AFAIU even plain likely()/unlikely() should do this, along with branch
> prediction hints.

This was indeed an unexpected overhead when I measured it on Android.
Cache pollution was my understanding of the cause of this high overhead
after Steven suggested I try uninlining the protected code; he has done
something similar in the tracing subsystem. But maybe I misunderstood
the real reason. Steven, could you please verify whether my
understanding of the cause of the high overhead is correct here? Maybe
there is something else at play that I missed?
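
To make the idea concrete, the shape of the change is roughly the
following (a sketch only, with made-up names like profiling_key and
__profiling_hook; it assumes kernel context, so it won't build
standalone, and the real code uses DECLARE_STATIC_KEY_MAYBE() /
static_branch_maybe() as in the hunks quoted below):

#include <linux/jump_label.h>
#include <linux/compiler.h>

DEFINE_STATIC_KEY_FALSE(profiling_key);	/* disabled by default */

/*
 * Slow path, kept out of line so its body does not occupy icache next
 * to the hot path while the key is disabled. noinline is what the
 * patch's inline_if_mem_alloc_prof macro expands to when
 * CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=n.
 */
static noinline void __profiling_hook(void *object)
{
	/* ... the bulky, rarely executed profiling work ... */
}

/*
 * Fast path: with the key disabled, the branch site is just a nop that
 * falls through, and the compiler places the __profiling_hook() call in
 * an out-of-line block; enabling the key patches the nop into a jump to
 * that block.
 */
static inline void profiling_hook(void *object)
{
	if (static_branch_unlikely(&profiling_key))
		__profiling_hook(object);
}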

>
> > however comes at the expense of the configuration when this static key
> > gets enabled since there is now an additional function call.
> > The default state of the mem_alloc_profiling_key is controlled by
> > CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT. Apply this optimization
> > only if CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=n, improving the
> > performance of the default configuration.
> > When CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=y the functions
> > are inlined and performance does not change.
> >
> > On a Pixel6 phone, slab allocation profiling overhead measured with
> > CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=n and profiling disabled:
> >
> >              baseline             modified
> > Big          3.31%                0.17%
> > Medium       3.79%                0.57%
> > Little       6.68%                1.28%
>
> What does big/medium/little mean here? But that's indeed not a nice
> overhead for a disabled static key.

Big/Medium/Little are the CPU core sizes on my ARM64-based Android phone.

>
> > When CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT=n and memory allocation
> > profiling gets enabled, the difference in performance before and after
> > this change stays within noise levels.
> >
> > On x86 this patch does not make noticeable difference because the overhead
> > with mem_alloc_profiling_key disabled is much lower (under 1%) to start
> > with, so any improvement is less visible and hard to distinguish from the
> > noise.
>
> That would be in line with my understanding above. Does the arm64 compiler
> not do it as well as x86 (maybe that could be found out by disassembling), or
> does the Pixel6 CPU somehow cache these out-of-line blocks more aggressively
> so that only a function call stops it?

I'll disassemble the code and see what it looks like.

>
> > Signed-off-by: Suren Baghdasaryan <surenb@...gle.com>
>
> Kinda sad that despite the static key we additionally have to control so
> much with CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT.

I agree. If there is a better way to fix this regression, I'm open to
changes. Let's wait for Steven to confirm my understanding before
proceeding.
Thanks,
Suren.

>
> > ---
> >  include/linux/alloc_tag.h |  6 +++++
> >  mm/slub.c                 | 46 ++++++++++++++++++++++++---------------
> >  2 files changed, 34 insertions(+), 18 deletions(-)
> >
> > diff --git a/include/linux/alloc_tag.h b/include/linux/alloc_tag.h
> > index a946e0203e6d..c5de2a0c1780 100644
> > --- a/include/linux/alloc_tag.h
> > +++ b/include/linux/alloc_tag.h
> > @@ -116,6 +116,12 @@ DECLARE_PER_CPU(struct alloc_tag_counters, _shared_alloc_tag);
> >  DECLARE_STATIC_KEY_MAYBE(CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT,
> >                       mem_alloc_profiling_key);
> >
> > +#ifdef CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT
> > +#define inline_if_mem_alloc_prof     inline
> > +#else
> > +#define inline_if_mem_alloc_prof     noinline
> > +#endif
> > +
> >  static inline bool mem_alloc_profiling_enabled(void)
> >  {
> >       return static_branch_maybe(CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT,
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 996691c137eb..3107d43dfddc 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -2000,7 +2000,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
> >       return 0;
> >  }
> >
> > -static inline void free_slab_obj_exts(struct slab *slab)
> > +static inline_if_mem_alloc_prof void free_slab_obj_exts(struct slab *slab)
> >  {
> >       struct slabobj_ext *obj_exts;
> >
> > @@ -2077,33 +2077,35 @@ prepare_slab_obj_exts_hook(struct kmem_cache *s, gfp_t flags, void *p)
> >       return slab_obj_exts(slab) + obj_to_index(s, slab, p);
> >  }
> >
> > -static inline void
> > -alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
> > +static inline_if_mem_alloc_prof void
> > +__alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
> >  {
> > -     if (need_slab_obj_ext()) {
> > -             struct slabobj_ext *obj_exts;
> > +     struct slabobj_ext *obj_exts;
> >
> > -             obj_exts = prepare_slab_obj_exts_hook(s, flags, object);
> > -             /*
> > -              * Currently obj_exts is used only for allocation profiling.
> > -              * If other users appear then mem_alloc_profiling_enabled()
> > -              * check should be added before alloc_tag_add().
> > -              */
> > -             if (likely(obj_exts))
> > -                     alloc_tag_add(&obj_exts->ref, current->alloc_tag, s->size);
> > -     }
> > +     obj_exts = prepare_slab_obj_exts_hook(s, flags, object);
> > +     /*
> > +      * Currently obj_exts is used only for allocation profiling.
> > +      * If other users appear then mem_alloc_profiling_enabled()
> > +      * check should be added before alloc_tag_add().
> > +      */
> > +     if (likely(obj_exts))
> > +             alloc_tag_add(&obj_exts->ref, current->alloc_tag, s->size);
> >  }
> >
> >  static inline void
> > -alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
> > +alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
> > +{
> > +     if (need_slab_obj_ext())
> > +             __alloc_tagging_slab_alloc_hook(s, object, flags);
> > +}
> > +
> > +static inline_if_mem_alloc_prof void
> > +__alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
> >                            int objects)
> >  {
> >       struct slabobj_ext *obj_exts;
> >       int i;
> >
> > -     if (!mem_alloc_profiling_enabled())
> > -             return;
> > -
> >       /* slab->obj_exts might not be NULL if it was created for MEMCG accounting. */
> >       if (s->flags & (SLAB_NO_OBJ_EXT | SLAB_NOLEAKTRACE))
> >               return;
> > @@ -2119,6 +2121,14 @@ alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
> >       }
> >  }
> >
> > +static inline void
> > +alloc_tagging_slab_free_hook(struct kmem_cache *s, struct slab *slab, void **p,
> > +                          int objects)
> > +{
> > +     if (mem_alloc_profiling_enabled())
> > +             __alloc_tagging_slab_free_hook(s, slab, p, objects);
> > +}
> > +
> >  #else /* CONFIG_MEM_ALLOC_PROFILING */
> >
> >  static inline void
>
