lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <ZHepMQybpBcKLgVg@P9FQF9L96D.corp.robot.car>
Date: Wed, 31 May 2023 13:08:17 -0700
From: Roman Gushchin <roman.gushchin@...ux.dev>
To: Vlastimil Babka <vbabka@...e.cz>
Cc: Jesper Dangaard Brouer <brouer@...hat.com>, netdev@...r.kernel.org,
	linux-mm@...ck.org, Christoph Lameter <cl@...ux.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Mel Gorman <mgorman@...hsingularity.net>,
	Joonsoo Kim <iamjoonsoo.kim@....com>, penberg@...nel.org,
	Jakub Kicinski <kuba@...nel.org>,
	"David S. Miller" <davem@...emloft.net>, edumazet@...gle.com,
	pabeni@...hat.com, Matthew Wilcox <willy@...radead.org>,
	Hyeonggon Yoo <42.hyeyoo@...il.com>,
	David Sterba <dsterba@...e.com>
Subject: Re: [PATCH RFC] mm+net: allow to set kmem_cache create flag for
 SLAB_NEVER_MERGE

On Wed, May 31, 2023 at 02:03:05PM +0200, Vlastimil Babka wrote:
> On 1/17/23 14:40, Jesper Dangaard Brouer wrote:
> > Allow API users of kmem_cache_create to specify that they don't want
> > any slab merge or aliasing (with similar sized objects). Use this in
> > network stack and kfence_test.
> > 
> > The SKB (sk_buff) kmem_cache slab is critical for network performance.
> > Network stack uses kmem_cache_{alloc,free}_bulk APIs to gain
> > performance by amortising the alloc/free cost.
> > 
> > For the bulk API to perform efficiently the slub fragmentation need to
> > be low. Especially for the SLUB allocator, the efficiency of bulk free
> > API depend on objects belonging to the same slab (page).
> > 
> > When running different network performance microbenchmarks, I started
> > to notice that performance was reduced (slightly) when machines had
> > longer uptimes. I believe the cause was 'skbuff_head_cache' got
> > aliased/merged into the general slub for 256 bytes sized objects (with
> > my kernel config, without CONFIG_HARDENED_USERCOPY).
> > 
> > For SKB kmem_cache network stack have reasons for not merging, but it
> > varies depending on kernel config (e.g. CONFIG_HARDENED_USERCOPY).
> > We want to explicitly set SLAB_NEVER_MERGE for this kmem_cache.
> > 
> > Signed-off-by: Jesper Dangaard Brouer <brouer@...hat.com>
> 
> Since this idea was revived by David [1], and neither patch worked as is,
> but yours was more complete and first, I have fixed it up as below. The
> skbuff part itself will be best submitted separately afterwards so we don't
> get conflicts between trees etc. Comments?
> 
> ----8<----
> From 485d3f58f3e797306b803102573e7f1367af2ad2 Mon Sep 17 00:00:00 2001
> From: Jesper Dangaard Brouer <brouer@...hat.com>
> Date: Tue, 17 Jan 2023 14:40:00 +0100
> Subject: [PATCH] mm/slab: introduce kmem_cache flag SLAB_NO_MERGE
> 
> Allow API users of kmem_cache_create to specify that they don't want
> any slab merge or aliasing (with similar sized objects). Use this in
> kfence_test.
> 
> The SKB (sk_buff) kmem_cache slab is critical for network performance.
> Network stack uses kmem_cache_{alloc,free}_bulk APIs to gain
> performance by amortising the alloc/free cost.
> 
> For the bulk API to perform efficiently the slub fragmentation need to
> be low. Especially for the SLUB allocator, the efficiency of bulk free
> API depend on objects belonging to the same slab (page).
> 
> When running different network performance microbenchmarks, I started
> to notice that performance was reduced (slightly) when machines had
> longer uptimes. I believe the cause was 'skbuff_head_cache' got
> aliased/merged into the general slub for 256 bytes sized objects (with
> my kernel config, without CONFIG_HARDENED_USERCOPY).
> 
> For SKB kmem_cache network stack have reasons for not merging, but it
> varies depending on kernel config (e.g. CONFIG_HARDENED_USERCOPY).
> We want to explicitly set SLAB_NO_MERGE for this kmem_cache.

I believe it's also good for the visibility: having a separate entity in
/proc/slabinfo for skbuff_head_cache can be really useful.

> 
> Another use case for the flag has been described by David Sterba [1]:
> 
> > This can be used for more fine grained control over the caches or for
> > debugging builds where separate slabs can verify that no objects leak.
> 
> > The slab_nomerge boot option is too coarse and would need to be
> > enabled on all testing hosts. There are some other ways how to disable
> > merging, e.g. a slab constructor but this disables poisoning besides
> > that it adds additional overhead. Other flags are internal and may
> > have other semantics.
> 
> > A concrete example what motivates the flag. During 'btrfs balance'
> > slab top reported huge increase in caches like
> 
> >  1330095 1330095 100%    0.10K  34105       39    136420K Acpi-ParseExt
> >  1734684 1734684 100%    0.14K  61953       28    247812K pid_namespace
> >  8244036 6873075  83%    0.11K 229001       36    916004K khugepaged_mm_slot
> 
> > which was confusing and that it's because of slab merging was not the
> > first idea.  After rebooting with slab_nomerge all the caches were
> > from btrfs_ namespace as expected.
> 
> [1] https://lore.kernel.org/all/20230524101748.30714-1-dsterba@suse.com/
> 
> [ vbabka@...e.cz: rename to SLAB_NO_MERGE, change the flag value to the
>   one proposed by David so it does not collide with internal SLAB/SLUB
>   flags, write a comment for the flag, expand changelog, drop the skbuff
>   part to be handled spearately ]
> 
> Reported-by: David Sterba <dsterba@...e.com>
> Signed-off-by: Jesper Dangaard Brouer <brouer@...hat.com>
> Signed-off-by: Vlastimil Babka <vbabka@...e.cz>

Acked-by: Roman Gushchin <roman.gushchin@...ux.dev>
both for this patch and the corresponding change on the networking side.

Thanks!

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ