[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <ZHepMQybpBcKLgVg@P9FQF9L96D.corp.robot.car>
Date: Wed, 31 May 2023 13:08:17 -0700
From: Roman Gushchin <roman.gushchin@...ux.dev>
To: Vlastimil Babka <vbabka@...e.cz>
Cc: Jesper Dangaard Brouer <brouer@...hat.com>, netdev@...r.kernel.org,
linux-mm@...ck.org, Christoph Lameter <cl@...ux.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Mel Gorman <mgorman@...hsingularity.net>,
Joonsoo Kim <iamjoonsoo.kim@....com>, penberg@...nel.org,
Jakub Kicinski <kuba@...nel.org>,
"David S. Miller" <davem@...emloft.net>, edumazet@...gle.com,
pabeni@...hat.com, Matthew Wilcox <willy@...radead.org>,
Hyeonggon Yoo <42.hyeyoo@...il.com>,
David Sterba <dsterba@...e.com>
Subject: Re: [PATCH RFC] mm+net: allow to set kmem_cache create flag for
SLAB_NEVER_MERGE
On Wed, May 31, 2023 at 02:03:05PM +0200, Vlastimil Babka wrote:
> On 1/17/23 14:40, Jesper Dangaard Brouer wrote:
> > Allow API users of kmem_cache_create to specify that they don't want
> > any slab merge or aliasing (with similar sized objects). Use this in
> > network stack and kfence_test.
> >
> > The SKB (sk_buff) kmem_cache slab is critical for network performance.
> > Network stack uses kmem_cache_{alloc,free}_bulk APIs to gain
> > performance by amortising the alloc/free cost.
> >
> > For the bulk API to perform efficiently the slub fragmentation need to
> > be low. Especially for the SLUB allocator, the efficiency of bulk free
> > API depend on objects belonging to the same slab (page).
> >
> > When running different network performance microbenchmarks, I started
> > to notice that performance was reduced (slightly) when machines had
> > longer uptimes. I believe the cause was 'skbuff_head_cache' got
> > aliased/merged into the general slub for 256 bytes sized objects (with
> > my kernel config, without CONFIG_HARDENED_USERCOPY).
> >
> > For SKB kmem_cache network stack have reasons for not merging, but it
> > varies depending on kernel config (e.g. CONFIG_HARDENED_USERCOPY).
> > We want to explicitly set SLAB_NEVER_MERGE for this kmem_cache.
> >
> > Signed-off-by: Jesper Dangaard Brouer <brouer@...hat.com>
>
> Since this idea was revived by David [1], and neither patch worked as is,
> but yours was more complete and first, I have fixed it up as below. The
> skbuff part itself will be best submitted separately afterwards so we don't
> get conflicts between trees etc. Comments?
>
> ----8<----
> From 485d3f58f3e797306b803102573e7f1367af2ad2 Mon Sep 17 00:00:00 2001
> From: Jesper Dangaard Brouer <brouer@...hat.com>
> Date: Tue, 17 Jan 2023 14:40:00 +0100
> Subject: [PATCH] mm/slab: introduce kmem_cache flag SLAB_NO_MERGE
>
> Allow API users of kmem_cache_create to specify that they don't want
> any slab merge or aliasing (with similar sized objects). Use this in
> kfence_test.
>
> The SKB (sk_buff) kmem_cache slab is critical for network performance.
> Network stack uses kmem_cache_{alloc,free}_bulk APIs to gain
> performance by amortising the alloc/free cost.
>
> For the bulk API to perform efficiently the slub fragmentation need to
> be low. Especially for the SLUB allocator, the efficiency of bulk free
> API depend on objects belonging to the same slab (page).
>
> When running different network performance microbenchmarks, I started
> to notice that performance was reduced (slightly) when machines had
> longer uptimes. I believe the cause was 'skbuff_head_cache' got
> aliased/merged into the general slub for 256 bytes sized objects (with
> my kernel config, without CONFIG_HARDENED_USERCOPY).
>
> For SKB kmem_cache network stack have reasons for not merging, but it
> varies depending on kernel config (e.g. CONFIG_HARDENED_USERCOPY).
> We want to explicitly set SLAB_NO_MERGE for this kmem_cache.
I believe it's also good for the visibility: having a separate entity in
/proc/slabinfo for skbuff_head_cache can be really useful.
>
> Another use case for the flag has been described by David Sterba [1]:
>
> > This can be used for more fine grained control over the caches or for
> > debugging builds where separate slabs can verify that no objects leak.
>
> > The slab_nomerge boot option is too coarse and would need to be
> > enabled on all testing hosts. There are some other ways how to disable
> > merging, e.g. a slab constructor but this disables poisoning besides
> > that it adds additional overhead. Other flags are internal and may
> > have other semantics.
>
> > A concrete example what motivates the flag. During 'btrfs balance'
> > slab top reported huge increase in caches like
>
> > 1330095 1330095 100% 0.10K 34105 39 136420K Acpi-ParseExt
> > 1734684 1734684 100% 0.14K 61953 28 247812K pid_namespace
> > 8244036 6873075 83% 0.11K 229001 36 916004K khugepaged_mm_slot
>
> > which was confusing and that it's because of slab merging was not the
> > first idea. After rebooting with slab_nomerge all the caches were
> > from btrfs_ namespace as expected.
>
> [1] https://lore.kernel.org/all/20230524101748.30714-1-dsterba@suse.com/
>
> [ vbabka@...e.cz: rename to SLAB_NO_MERGE, change the flag value to the
> one proposed by David so it does not collide with internal SLAB/SLUB
> flags, write a comment for the flag, expand changelog, drop the skbuff
> part to be handled spearately ]
>
> Reported-by: David Sterba <dsterba@...e.com>
> Signed-off-by: Jesper Dangaard Brouer <brouer@...hat.com>
> Signed-off-by: Vlastimil Babka <vbabka@...e.cz>
Acked-by: Roman Gushchin <roman.gushchin@...ux.dev>
both for this patch and the corresponding change on the networking side.
Thanks!
Powered by blists - more mailing lists