Message-ID: <202504030920.EB65CCA2@keescook>
Date: Thu, 3 Apr 2025 09:21:50 -0700
From: Kees Cook <kees@...nel.org>
To: Michal Hocko <mhocko@...e.com>
Cc: Shakeel Butt <shakeel.butt@...ux.dev>,
Dave Chinner <david@...morbit.com>,
Yafang Shao <laoar.shao@...il.com>,
Harry Yoo <harry.yoo@...cle.com>, joel.granados@...nel.org,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
Josef Bacik <josef@...icpanda.com>, linux-mm@...ck.org,
Vlastimil Babka <vbabka@...e.cz>
Subject: Re: [PATCH] mm: kvmalloc: make kmalloc fast path real fast path

On Thu, Apr 03, 2025 at 09:43:39AM +0200, Michal Hocko wrote:
> There are users like xfs which need larger allocations with NOFAIL
> semantics. They are not using kvmalloc currently because the current
> implementation tries too hard to allocate through the kmalloc path
> which causes a lot of direct reclaim and compaction and that hurts
> performance a lot (see 8dc9384b7d75 ("xfs: reduce kvmalloc overhead for
> CIL shadow buffers") for more details).
>
> kvmalloc does support __GFP_RETRY_MAYFAIL semantic to express that
> kmalloc (physically contiguous) allocation is preferred and we should go
> more aggressive to make it happen. There is currently no way to express
> that kmalloc should be very lightweight and as it has been argued [1]
> this mode should be default to support kvmalloc(NOFAIL) with a
> lightweight kmalloc path which is currently impossible to express as
> __GFP_NOFAIL cannot be combined with any other reclaim modifiers.
>
> This patch makes all kmalloc allocations GFP_NOWAIT unless
> __GFP_RETRY_MAYFAIL is provided to kvmalloc. This allows supporting both
> fail fast and retry hard on physically contiguous memory with vmalloc
> fallback.
>
> There is a potential downside that relatively small allocations (smaller
> than PAGE_ALLOC_COSTLY_ORDER) could fall back to vmalloc too easily and
> cause page block fragmentation. We cannot really rule that out but it
> seems that xlog_cil_kvmalloc use doesn't indicate this to be happening.
>
> [1] https://lore.kernel.org/all/Z-3i1wATGh6vI8x8@dread.disaster.area/T/#u
> Signed-off-by: Michal Hocko <mhocko@...e.com>

Thanks for finding a solution for this! It makes way more sense to me to
kick over to vmap by default for kvmalloc users.

> ---
> mm/slub.c | 8 +++++---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index b46f87662e71..2da40c2f6478 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -4972,14 +4972,16 @@ static gfp_t kmalloc_gfp_adjust(gfp_t flags, size_t size)
> * We want to attempt a large physically contiguous block first because
> * it is less likely to fragment multiple larger blocks and therefore
> * contribute to a long term fragmentation less than vmalloc fallback.
> - * However make sure that larger requests are not too disruptive - no
> - * OOM killer and no allocation failure warnings as we have a fallback.
> + * However make sure that larger requests are not too disruptive - i.e.
> + * do not direct reclaim unless physically continuous memory is preferred
> + * (__GFP_RETRY_MAYFAIL mode). We still kick in kswapd/kcompactd to start
> + * working in the background but the allocation itself.

I think a word is missing here? "...but do the allocation..." or
"...allocation itself happens" ?

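
For anyone skimming the thread, here is a rough userspace model of the
flag adjustment as I read it from the changelog and the comment above.
It is only a sketch, not the code from the patch: the model_* names,
the bit values and the 4096-byte page size are all made up for
illustration, and applying the adjustment only above one page is my
reading of "larger requests". Any other flag tweaks the real helper
performs are left out.

```c
#include <stddef.h>

/* Hypothetical flag type and bit values; these are NOT the kernel's. */
typedef unsigned int model_gfp_t;

#define MODEL_DIRECT_RECLAIM 0x1u /* caller may reclaim/compact synchronously */
#define MODEL_KSWAPD_RECLAIM 0x2u /* caller may wake background reclaim */
#define MODEL_RETRY_MAYFAIL  0x4u /* contiguous memory is preferred, try hard */

/* Assumed page size, for the model only. */
#define MODEL_PAGE_SIZE 4096u

/*
 * Model of the flags used for the initial kmalloc attempt in kvmalloc.
 * For requests larger than a page, drop direct reclaim unless the caller
 * asked to retry hard. Background reclaim stays enabled so that
 * kswapd/kcompactd can still be woken.
 */
static model_gfp_t model_kmalloc_gfp_adjust(model_gfp_t flags, size_t size)
{
	if (size > MODEL_PAGE_SIZE && !(flags & MODEL_RETRY_MAYFAIL))
		flags &= ~MODEL_DIRECT_RECLAIM;

	return flags;
}
```

With these made-up bits, a two-page request with both reclaim bits set
comes back with only the background-reclaim bit, i.e. the kmalloc
attempt fails fast and the vmalloc fallback takes over. Adding the
retry bit leaves the flags untouched.
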
--
Kees Cook