Message-ID: <aCzQ_FQUvYgcTX1W@pc636>
Date: Tue, 20 May 2025 20:59:08 +0200
From: Uladzislau Rezki <urezki@...il.com>
To: Kent Overstreet <kent.overstreet@...ux.dev>
Cc: Uladzislau Rezki <urezki@...il.com>,
Shakeel Butt <shakeel.butt@...ux.dev>,
Usama Arif <usamaarif642@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>, surenb@...gle.com,
hannes@...xchg.org, vlad.wing@...il.com, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, kernel-team@...a.com
Subject: Re: [PATCH 1/2] mm: slub: allocate slab object extensions
non-contiguously
On Tue, May 20, 2025 at 01:58:25PM -0400, Kent Overstreet wrote:
> On Tue, May 20, 2025 at 07:57:10PM +0200, Uladzislau Rezki wrote:
> > On Tue, May 20, 2025 at 01:47:54PM -0400, Kent Overstreet wrote:
> > > On Tue, May 20, 2025 at 07:44:49PM +0200, Uladzislau Rezki wrote:
> > > > On Tue, May 20, 2025 at 10:28:06AM -0400, Kent Overstreet wrote:
> > > > > On Tue, May 20, 2025 at 07:24:40AM -0700, Shakeel Butt wrote:
> > > > > > On Tue, May 20, 2025 at 10:01:27AM -0400, Kent Overstreet wrote:
> > > > > > > On Tue, May 20, 2025 at 02:46:14PM +0100, Usama Arif wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > On 20/05/2025 14:44, Kent Overstreet wrote:
> > > > > > > > > On Tue, May 20, 2025 at 01:25:46PM +0100, Usama Arif wrote:
> > > > > > > > >> When memory allocation profiling is running on memory bound services,
> > > > > > > > >> allocations greater than order 0 for slab object extensions can fail,
> > > > > > > > > >> e.g. the zs_handle zswap slab, which needs 512 objsperslab x 16 bytes
> > > > > > > > >> per slabobj_ext (order 1 allocation). Use kvcalloc to improve chances
> > > > > > > > >> of the allocation being successful.
> > > > > > > > >>
> > > > > > > > >> Signed-off-by: Usama Arif <usamaarif642@...il.com>
> > > > > > > > >> Reported-by: Vlad Poenaru <vlad.wing@...il.com>
> > > > > > > > >> Closes: https://lore.kernel.org/all/17fab2d6-5a74-4573-bcc3-b75951508f0a@gmail.com/
> > > > > > > > >> ---
> > > > > > > > >> mm/slub.c | 2 +-
> > > > > > > > >> 1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > > > > >>
> > > > > > > > >> diff --git a/mm/slub.c b/mm/slub.c
> > > > > > > > >> index dc9e729e1d26..bf43c403ead2 100644
> > > > > > > > >> --- a/mm/slub.c
> > > > > > > > >> +++ b/mm/slub.c
> > > > > > > > >> @@ -1989,7 +1989,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
> > > > > > > > >> gfp &= ~OBJCGS_CLEAR_MASK;
> > > > > > > > >> /* Prevent recursive extension vector allocation */
> > > > > > > > >> gfp |= __GFP_NO_OBJ_EXT;
> > > > > > > > >> - vec = kcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
> > > > > > > > >> + vec = kvcalloc_node(objects, sizeof(struct slabobj_ext), gfp,
> > > > > > > > >> slab_nid(slab));
> > > > > > > > >
> > > > > > > > > And what's the latency going to be on a vmalloc() allocation when we're
> > > > > > > > > low on memory?
> > > > > > > >
> > > > > > > > Would it not be better to get the allocation slightly slower than to not get
> > > > > > > > it at all?
> > > > > > >
> > > > > > > Our behaviour when thrashing sucks, we don't want to do anything to make
> > > > > > > that worse.
> > > > > > >
> > > > > > > There's also the fact that vmalloc doesn't correctly respect gfp flags,
> > > > > > > so until that gets fixed this doesn't work at all.
> > > > > >
> > > > > > Which gfp flags vmalloc is not respecting today?
> > > > >
> > > > > GFP_NOWAIT.
> > > > >
> > > > > As to why, you'd better ask Michal Hocko...
> > > > >
> > > > It is mainly due to pte_alloc_one_kernel(), which uses GFP_KERNEL
> > > >
> > > > #define GFP_PGTABLE_KERNEL (GFP_KERNEL | __GFP_ZERO)
> > > >
> > > > to get a new pte entry.
> > > >
> > > > I think we can fix it, for example by populating some region in advance and
> > > > allocating from it for NOWAIT requests. But there can, of course, be other hidden problems.
> > >
> > > No, PF_MEMALLOC flags allow for passing most gfp flags for pte
> > > allocation.
> > >
> > It is hard-coded:
> >
> > static inline pte_t *__pte_alloc_one_kernel_noprof(struct mm_struct *mm)
> > {
> > 	struct ptdesc *ptdesc = pagetable_alloc_noprof(GFP_PGTABLE_KERNEL &
> > 			~__GFP_HIGHMEM, 0);
> >
> > 	if (!ptdesc)
> > 		return NULL;
> > 	return ptdesc_address(ptdesc);
> > }
>
> I suggest you read the code around PF_MEMALLOC flags.
>
Wrapping the allocation context with PF_MEMALLOC, to avoid entering direct
reclaim and sleeping, looks like another approach I can think of. One concern
is depletion of memory reserves. Populating PTEs in advance would not require
this, but I tend to think it is the ugly approach I mentioned above.
--
Uladzislau Rezki