lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Tue, 7 Mar 2023 14:20:20 +0000
From:   Hyeonggon Yoo <42.hyeyoo@...il.com>
To:     Chen Jun <chenjun102@...wei.com>
Cc:     linux-kernel@...r.kernel.org, linux-mm@...ck.org, cl@...ux.com,
        penberg@...nel.org, rientjes@...gle.com, iamjoonsoo.kim@....com,
        akpm@...ux-foundation.org, vbabka@...e.cz, xuqiang36@...wei.com
Subject: Re: [RFC] mm/slub: Reduce memory consumption in extreme scenarios

On Tue, Mar 07, 2023 at 08:28:11AM +0000, Chen Jun wrote:
> If call kmalloc_node with NO __GFP_THISNODE and node[A] with no memory.
> Slub will alloc a slub page which is not belong to A, and put the page
> to kmem_cache_node[page_to_nid(page)]. The page can not be reused
> at next calling, because NULL will be get from get_partical().
> That make kmalloc_node consume more memory.

Hello,

elaborating a little bit:

"When kmalloc_node() is called without __GFP_THISNODE and the target node
lacks sufficient memory, SLUB allocates a folio from a different node other
than the requested node, instead of taking a partial slab from it.

However, since the allocated folio does not belong to the requested node,
it is deactivated and added to the partial slab list of the node it
belongs to.

This behavior can result in excessive memory usage when the requested
node has insufficient memory, as SLUB will repeatedly allocate folios from
other nodes without reusing the previously allocated ones.

To prevent memory wastage, take a partial slab from a different node when
the requested node has no partial slab and __GFP_THISNODE is not explicitly
specified."

> On qemu with 4 numas and each numa has 1G memory, Write a test ko
> to call kmalloc_node(196, 0xd20c0, 3) for 5 * 1024 * 1024 times.
> 
> cat /proc/slabinfo shows:
> kmalloc-256       4302317 15151808    256   32    2 : tunables..
> 
> the total objects is much more then active objects.
> 
> After this patch, cat /prac/slubinfo shows:
> kmalloc-256       5244950 5245088    256   32    2 : tunables..
>
> Signed-off-by: Chen Jun <chenjun102@...wei.com>
> ---
>  mm/slub.c | 17 ++++++++++++++---
>  1 file changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 39327e98fce3..c0090a5de54e 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2384,7 +2384,7 @@ static void *get_partial(struct kmem_cache *s, int node, struct partial_context
>  		searchnode = numa_mem_id();
>  
>  	object = get_partial_node(s, get_node(s, searchnode), pc);
> -	if (object || node != NUMA_NO_NODE)
> +	if (object || (node != NUMA_NO_NODE && (pc->flags & __GFP_THISNODE)))
>  		return object;

I think the problem here is to avoid taking a partial slab from
different node than the requested node even if __GFP_THISNODE is not set.
(and then allocating new slab instead)

Thus this hunk makes sense to me,
even if SLUB currently do not implement __GFP_THISNODE semantics.

>  	return get_any_partial(s, pc);
> @@ -3069,6 +3069,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>  	struct slab *slab;
>  	unsigned long flags;
>  	struct partial_context pc;
> +	int try_thisndoe = 0;
>
>  
>  	stat(s, ALLOC_SLOWPATH);
>  
> @@ -3181,8 +3182,12 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>  	}
>  
>  new_objects:
> -
>  	pc.flags = gfpflags;
> +
> +	/* Try to get page from specific node even if __GFP_THISNODE is not set */
> +	if (node != NUMA_NO_NODE && !(gfpflags & __GFP_THISNODE) && try_thisnode)
> +			pc.flags |= __GFP_THISNODE;
> +
>  	pc.slab = &slab;
>  	pc.orig_size = orig_size;
>  	freelist = get_partial(s, node, &pc);
> @@ -3190,10 +3195,16 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>  		goto check_new_slab;
>  
>  	slub_put_cpu_ptr(s->cpu_slab);
> -	slab = new_slab(s, gfpflags, node);
> +	slab = new_slab(s, pc.flags, node);
>  	c = slub_get_cpu_ptr(s->cpu_slab);
>  
>  	if (unlikely(!slab)) {
> +		/* Try to get page from any other node */
> +		if (node != NUMA_NO_NODE && !(gfpflags & __GFP_THISNODE) && try_thisnode) {
> +			try_thisnode = 0;
> +			goto new_objects;
> +		}
> +
>  		slab_out_of_memory(s, gfpflags, node);
>  		return NULL;

But these hunks do not make sense to me.
Why force __GFP_THISNODE even when the caller did not specify it?

(Apart from the fact that try_thisnode is defined as try_thisndoe,
 and try_thisnode is never set to nonzero value.)

IMHO the first hunk is enough to solve the problem.

Thanks,
Hyeonggon

>  	}
> -- 
> 2.17.1
> 
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ