linux-kernel - Re: [PATCH 1/2] mm: slub: allocate slab object extensions non-contiguously


Message-ID: <aCzQ_FQUvYgcTX1W@pc636>
Date: Tue, 20 May 2025 20:59:08 +0200
From: Uladzislau Rezki <urezki@...il.com>
To: Kent Overstreet <kent.overstreet@...ux.dev>
Cc: Uladzislau Rezki <urezki@...il.com>,
	Shakeel Butt <shakeel.butt@...ux.dev>,
	Usama Arif <usamaarif642@...il.com>,
	Andrew Morton <akpm@...ux-foundation.org>, surenb@...gle.com,
	hannes@...xchg.org, vlad.wing@...il.com, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, kernel-team@...a.com
Subject: Re: [PATCH 1/2] mm: slub: allocate slab object extensions
 non-contiguously

On Tue, May 20, 2025 at 01:58:25PM -0400, Kent Overstreet wrote:
> On Tue, May 20, 2025 at 07:57:10PM +0200, Uladzislau Rezki wrote:
> > On Tue, May 20, 2025 at 01:47:54PM -0400, Kent Overstreet wrote:
> > > On Tue, May 20, 2025 at 07:44:49PM +0200, Uladzislau Rezki wrote:
> > > > On Tue, May 20, 2025 at 10:28:06AM -0400, Kent Overstreet wrote:
> > > > > On Tue, May 20, 2025 at 07:24:40AM -0700, Shakeel Butt wrote:
> > > > > > On Tue, May 20, 2025 at 10:01:27AM -0400, Kent Overstreet wrote:
> > > > > > > On Tue, May 20, 2025 at 02:46:14PM +0100, Usama Arif wrote:
> > > > > > > > 
> > > > > > > > 
> > > > > > > > On 20/05/2025 14:44, Kent Overstreet wrote:
> > > > > > > > > On Tue, May 20, 2025 at 01:25:46PM +0100, Usama Arif wrote:
> > > > > > > > >> When memory allocation profiling is running on memory bound services,
> > > > > > > > >> allocations greater than order 0 for slab object extensions can fail,
> > > > > > > > > >> e.g. the zs_handle zswap slab, which will be 512 objsperslab x 16 bytes
> > > > > > > > >> per slabobj_ext (order 1 allocation). Use kvcalloc to improve chances
> > > > > > > > >> of the allocation being successful.
> > > > > > > > >>
> > > > > > > > >> Signed-off-by: Usama Arif <usamaarif642@...il.com>
> > > > > > > > >> Reported-by: Vlad Poenaru <vlad.wing@...il.com>
> > > > > > > > >> Closes: https://lore.kernel.org/all/17fab2d6-5a74-4573-bcc3-b75951508f0a@gmail.com/
> > > > > > > > >> ---
> > > > > > > > >>  mm/slub.c | 2 +-
> > > > > > > > >>  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > > >>
> > > > > > > > >> diff --git a/mm/slub.c b/mm/slub.c
> > > > > > > > >> index dc9e729e1d26..bf43c403ead2 100644
> > > > > > > > >> --- a/mm/slub.c
> > > > > > > > >> +++ b/mm/slub.c
> > > > > > > > >> @@ -1989,7 +1989,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
> > > > > > > > >>  	gfp &= ~OBJCGS_CLEAR_MASK;
> > > > > > > > >>  	/* Prevent recursive extension vector allocation */
> > > > > > > > >>  	gfp |= __GFP_NO_OBJ_EXT;
> > > > > > > > >> -	vec = kcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
> > > > > > > > >> +	vec = kvcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
> > > > > > > > >>  			   slab_nid(slab));
> > > > > > > > > 
> > > > > > > > > And what's the latency going to be on a vmalloc() allocation when we're
> > > > > > > > > low on memory?
> > > > > > > > 
> > > > > > > > > Would it not be better to get the allocation slightly slower than to not get
> > > > > > > > > it at all?
> > > > > > > 
> > > > > > > Our behaviour when thrashing sucks, we don't want to do anything to make
> > > > > > > that worse.
> > > > > > > 
> > > > > > > There's also the fact that vmalloc doesn't correctly respect gfp flags,
> > > > > > > so until that gets fixed this doesn't work at all.
> > > > > > 
> > > > > > > Which gfp flags is vmalloc not respecting today?
> > > > > 
> > > > > GFP_NOWAIT.
> > > > > 
> > > > > As to why, you'd better ask Michal Hocko...
> > > > > 
> > > > > It is mainly due to pte_alloc_one_kernel(), which uses GFP_KERNEL
> > > > 
> > > > #define GFP_PGTABLE_KERNEL	(GFP_KERNEL | __GFP_ZERO)
> > > > 
> > > > to get a new pte entry.
> > > > 
> > > > > I think we can fix it. For example, we could populate some region and allocate
> > > > > from there for NOWAIT. But of course there can be other hidden problems.
> > > 
> > > No, PF_MEMALLOC flags allow for passing most of gfp flags for pte
> > > allocation.
> > >
> > It is hard-coded:
> > 
> > static inline pte_t *__pte_alloc_one_kernel_noprof(struct mm_struct *mm)
> > {
> > 	struct ptdesc *ptdesc = pagetable_alloc_noprof(GFP_PGTABLE_KERNEL &
> > 			~__GFP_HIGHMEM, 0);
> > 
> > 	if (!ptdesc)
> > 		return NULL;
> > 	return ptdesc_address(ptdesc);
> > }
> 
> I suggest you read the code around PF_MEMALLOC flags.
>
Wrapping the allocation context in PF_MEMALLOC, to prevent entering direct
reclaim and sleeping, looks like another approach I can think about.
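
A minimal userspace sketch of that wrapping idea, only to show the
save/restore shape. Everything prefixed model_ or MODEL_ is hypothetical
and not kernel code; the kernel has helpers of this shape
(memalloc_noreclaim_save() and memalloc_noreclaim_restore()), and whether
they are the right tool for the vmalloc page-table path is exactly the
open question in this thread.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's PF_MEMALLOC task flag bit. */
#define MODEL_PF_MEMALLOC 0x0800u

/* Hypothetical stand-in for current->flags. */
static unsigned int model_task_flags;

/* Set the flag and return only the bits this call newly set. */
static unsigned int model_noreclaim_save(void)
{
	unsigned int newly_set = ~model_task_flags & MODEL_PF_MEMALLOC;

	model_task_flags |= MODEL_PF_MEMALLOC;
	return newly_set;
}

/* Clear only the bits the matching save call set, so scopes can nest. */
static void model_noreclaim_restore(unsigned int newly_set)
{
	model_task_flags &= ~newly_set;
}

static int model_in_noreclaim_scope(void)
{
	return (model_task_flags & MODEL_PF_MEMALLOC) != 0;
}

/* Stand-in for a page-table allocation that must run inside the scope. */
static int model_alloc_page_table(void)
{
	assert(model_in_noreclaim_scope());
	return 1;
}

/* Wrap the allocation so the flag is set for its duration only. */
static int model_populate_nowait(void)
{
	unsigned int saved = model_noreclaim_save();
	int ok = model_alloc_page_table();

	model_noreclaim_restore(saved);
	return ok;
}
```

Returning only the newly set bits is what lets an inner scope leave the
flag alone when an outer scope already holds it.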

One concern is depleting the memory reserves. Populating PTEs, the approach I
mentioned above, would not require this, but I tend to say it is ugly.

--
Uladzislau Rezki
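
For reference, the order-1 figure in the quoted patch description can be
checked with a small userspace sketch. It assumes 4 KiB pages, which the
thread does not state, and takes the 512 objects and 16 bytes per
slabobj_ext from the description. The model_ helpers are hypothetical
stand-ins, not the kernel's get_order().

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages. */
#define MODEL_PAGE_SIZE 4096u
/* Bytes per slabobj_ext, as given in the patch description. */
#define MODEL_SLABOBJ_EXT_SIZE 16u

/* Smallest order such that (page_size << order) >= size. */
static unsigned int model_get_order(size_t size, size_t page_size)
{
	unsigned int order = 0;

	/* page_size must be a non-zero power of two, size must be non-zero. */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	assert(size != 0);

	while ((page_size << order) < size)
		order++;
	return order;
}

/* Size in bytes of the extension vector for a slab holding 'objects' objects. */
static size_t model_obj_exts_bytes(size_t objects)
{
	return objects * MODEL_SLABOBJ_EXT_SIZE;
}
```

Under these assumptions, 512 objects need 8192 bytes, which is two
contiguous pages, i.e. an order-1 allocation.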