Message-ID: <zex6wgdlxk5vgwm7ou657fdmi27xnxihdndlszfa2chghamfuz@grhtfqw7gm7o>
Date: Wed, 24 Dec 2025 00:08:36 +0800
From: Hao Li <hao.li@...ux.dev>
To: Harry Yoo <harry.yoo@...cle.com>
Cc: akpm@...ux-foundation.org, vbabka@...e.cz, andreyknvl@...il.com, 
	cl@...two.org, dvyukov@...gle.com, glider@...gle.com, hannes@...xchg.org, 
	linux-mm@...ck.org, mhocko@...nel.org, muchun.song@...ux.dev, rientjes@...gle.com, 
	roman.gushchin@...ux.dev, ryabinin.a.a@...il.com, shakeel.butt@...ux.dev, 
	surenb@...gle.com, vincenzo.frascino@....com, yeoreum.yun@....com, tytso@....edu, 
	adilger.kernel@...ger.ca, linux-ext4@...r.kernel.org, linux-kernel@...r.kernel.org, 
	cgroups@...r.kernel.org
Subject: Re: [PATCH V4 7/8] mm/slab: save memory by allocating slabobj_ext
 array from leftover

On Wed, Dec 24, 2025 at 12:31:19AM +0900, Harry Yoo wrote:
> On Tue, Dec 23, 2025 at 11:08:32PM +0800, Hao Li wrote:
> > On Mon, Dec 22, 2025 at 08:08:42PM +0900, Harry Yoo wrote:
> > > The leftover space in a slab is always smaller than s->size, and
> > > kmem caches for large objects that are not power-of-two sizes tend to have
> > > a greater amount of leftover space per slab. In some cases, the leftover
> > > space is larger than the size of the slabobj_ext array for the slab.
> > > 
> > > An excellent example of such a cache is ext4_inode_cache. On my system,
> > > the object size is 1144, with a preferred order of 3, 28 objects per slab,
> > > and 736 bytes of leftover space per slab.
> > > 
> > > Since the size of the slabobj_ext array is only 224 bytes (w/o mem
> > > profiling) or 448 bytes (w/ mem profiling) per slab, the entire array
> > > fits within the leftover space.
> > > 
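For reference, the numbers above can be re-derived with a small standalone
sketch. It is not kernel code: the helper names are hypothetical, the 4096-byte
page size is an assumption, and the 8- and 16-byte entry sizes are simply
224 / 28 and 448 / 28 from the figures quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes in a slab of the given page order; page_size is an assumption. */
static size_t slab_bytes(unsigned int order, size_t page_size)
{
	return page_size << order;
}

/* Leftover bytes after packing nr_objs objects of obj_size bytes each. */
static size_t slab_leftover(size_t total, size_t obj_size, size_t nr_objs)
{
	assert(obj_size * nr_objs <= total);
	return total - obj_size * nr_objs;
}

/* Nonzero if an array of nr_objs entries of ext_size bytes fits in leftover. */
static int obj_exts_fit(size_t leftover, size_t ext_size, size_t nr_objs)
{
	return ext_size * nr_objs <= leftover;
}
```

With order 3 and 4096-byte pages the slab is 32768 bytes, 28 objects of 1144
bytes take 32032 bytes, and the remaining 736 bytes hold either a 224-byte or
a 448-byte array.
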
> > > Allocate the slabobj_exts array from this unused space instead of using
> > > kcalloc() when it is large enough. The array is allocated from unused
> > > space only when creating new slabs, and it doesn't try to utilize unused
> > > space if alloc_slab_obj_exts() is called after slab creation because
> > > implementing lazy allocation involves more expensive synchronization.
> > > 
> > > The implementation and evaluation of lazy allocation from unused space
> > > is left as future work. As pointed out by Vlastimil Babka [1], it could be
> > > beneficial when a slab cache is created without SLAB_ACCOUNT but some of
> > > the allocations from the cache use __GFP_ACCOUNT. For example, xarray
> > > does that.
> > > 
> > > To avoid unnecessary overhead when MEMCG (with SLAB_ACCOUNT) and
> > > MEM_ALLOC_PROFILING are not used for the cache, allocate the slabobj_ext
> > > array only when either of them is enabled.
> > > 
> > > [ MEMCG=y, MEM_ALLOC_PROFILING=n ]
> > > 
> > > Before patch (creating ~2.64M directories on ext4):
> > >   Slab:            4747880 kB
> > >   SReclaimable:    4169652 kB
> > >   SUnreclaim:       578228 kB
> > > 
> > > After patch (creating ~2.64M directories on ext4):
> > >   Slab:            4724020 kB
> > >   SReclaimable:    4169188 kB
> > >   SUnreclaim:       554832 kB (-22.84 MiB)
> > > 
> > > Enjoy the memory savings!
> > > 
> > > Link: https://lore.kernel.org/linux-mm/48029aab-20ea-4d90-bfd1-255592b2018e@suse.cz [1]
> > > Signed-off-by: Harry Yoo <harry.yoo@...cle.com>
> > > ---
> > >  mm/slub.c | 156 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
> > >  1 file changed, 151 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/mm/slub.c b/mm/slub.c
> > > index 39c381cc1b2c..3fc3d2ca42e7 100644
> > > --- a/mm/slub.c
> > > +++ b/mm/slub.c
> > > @@ -886,6 +886,99 @@ static inline unsigned long get_orig_size(struct kmem_cache *s, void *object)
> > >  	return *(unsigned long *)p;
> > >  }
> > >  
> > > +#ifdef CONFIG_SLAB_OBJ_EXT
> > > +
> > > +/*
> > > + * Check if memory cgroup or memory allocation profiling is enabled.
> > > + * If enabled, SLUB tries to reduce memory overhead of accounting
> > > + * slab objects. If neither is enabled when this function is called,
> > > + * the optimization is simply skipped to avoid affecting caches that do not
> > > + * need slabobj_ext metadata.
> > > + *
> > > + * However, this may disable optimization when memory cgroup or memory
> > > + * allocation profiling is used, but slabs are created too early
> > > + * even before those subsystems are initialized.
> > > + */
> > > +static inline bool need_slab_obj_exts(struct kmem_cache *s)
> > > +{
> > > +	if (memcg_kmem_online() && (s->flags & SLAB_ACCOUNT))
> > > +		return true;
> > > +
> > > +	if (mem_alloc_profiling_enabled())
> > > +		return true;
> > > +
> > > +	return false;
> > > +}
> > > +
> > > +static inline unsigned int obj_exts_size_in_slab(struct slab *slab)
> > > +{
> > > +	return sizeof(struct slabobj_ext) * slab->objects;
> > > +}
> > > +
> > > +static inline unsigned long obj_exts_offset_in_slab(struct kmem_cache *s,
> > > +						    struct slab *slab)
> > > +{
> > > +	unsigned long objext_offset;
> > > +
> > > +	objext_offset = s->red_left_pad + s->size * slab->objects;
> > 
> > Hi Harry,
> 
> Hi Hao, thanks for the review!
> Hope you're doing well.

Thanks Harry. Hope you are too!

> 
> > As s->size already includes s->red_left_pad
> 
> Great question. It's true that s->size includes s->red_left_pad,
> but we also have a redzone right before the first object:
> 
>   [ redzone ] [ obj 1 | redzone ] [ obj 2 | redzone ] [ ... ]
> 
> So we have (slab->objects + 1) red zones and so

I have a follow-up question regarding the redzones. Unless I'm missing
some detail, it seems the left redzone should apply to each object as
well. If so, I would expect the memory layout to be:

[left redzone | obj 1 | right redzone], [left redzone | obj 2 | right redzone], [ ... ]

In `calculate_sizes()`, I see:

if ((flags & SLAB_RED_ZONE) && size == s->object_size)
    size += sizeof(void *);
...
...
if (flags & SLAB_RED_ZONE) {
    size += s->red_left_pad;
}

Could you please confirm whether my understanding is correct, or point
out what I'm missing?
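
To make the difference concrete, here is a minimal sketch of where the object
area would end under each of the two layouts. It is not kernel code: the
function names are hypothetical, and `size` stands for s->size as computed by
calculate_sizes(), i.e. already including s->red_left_pad when SLAB_RED_ZONE
is set.

```c
#include <stddef.h>

/*
 * Layout A: every slot of `size` bytes carries its own left red zone,
 * so the object area ends right after the last slot.
 */
static size_t objs_end_per_object_redzone(size_t size, size_t nr_objs)
{
	return size * nr_objs;
}

/*
 * Layout B: one additional red zone precedes the first slot, which is
 * what the formula in the patch (red_left_pad + size * objects) assumes.
 */
static size_t objs_end_leading_redzone(size_t size, size_t red_left_pad,
				       size_t nr_objs)
{
	return red_left_pad + size * nr_objs;
}
```

The two results differ by exactly red_left_pad, so if layout A is the real
one, the offset in the patch would skip red_left_pad bytes of usable leftover
space.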

> 
> > do we still need s->red_left_pad here?
> 
> I think this is still needed.
> 
> -- 
> Cheers,
> Harry / Hyeonggon
