Message-ID: <0f2091ba-0a43-4dd3-aa48-fe284530044a@suse.cz>
Date: Wed, 9 Apr 2025 11:11:37 +0200
From: Vlastimil Babka <vbabka@...e.cz>
To: Michal Hocko <mhocko@...e.com>, Dave Chinner <david@...morbit.com>,
Andrew Morton <akpm@...ux-foundation.org>
Cc: Shakeel Butt <shakeel.butt@...ux.dev>, Yafang Shao
<laoar.shao@...il.com>, Harry Yoo <harry.yoo@...cle.com>,
Kees Cook <kees@...nel.org>, joel.granados@...nel.org,
linux-fsdevel@...r.kernel.org, linux-kernel@...r.kernel.org,
Josef Bacik <josef@...icpanda.com>, linux-mm@...ck.org
Subject: Re: [PATCH] mm: kvmalloc: make kmalloc fast path real fast path
On 4/9/25 9:35 AM, Michal Hocko wrote:
> On Thu 03-04-25 21:51:46, Michal Hocko wrote:
>> Add Andrew
>
> Andrew, do you want me to repost the patch or can you take it from this
> email thread?
I'll take it as it's now all in mm/slub.c
>> Also, Dave do you want me to redirect xlog_cil_kvmalloc to kvmalloc or
>> do you prefer to do that yourself?
>>
>> On Thu 03-04-25 09:43:41, Michal Hocko wrote:
>>> There are users like xfs which need larger allocations with NOFAIL
>>> semantics. They are not using kvmalloc currently because the current
>>> implementation tries too hard to allocate through the kmalloc path
>>> which causes a lot of direct reclaim and compaction and that hurts
>>> performance a lot (see 8dc9384b7d75 ("xfs: reduce kvmalloc overhead for
>>> CIL shadow buffers") for more details).
>>>
>>> kvmalloc does support __GFP_RETRY_MAYFAIL semantics to express that a
>>> kmalloc (physically contiguous) allocation is preferred and that we
>>> should try harder to make it happen. There is currently no way to
>>> express that kmalloc should be very lightweight. As has been argued [1],
>>> this mode should be the default, so that kvmalloc(NOFAIL) gets a
>>> lightweight kmalloc path. That is currently impossible to express
>>> because __GFP_NOFAIL cannot be combined with any other reclaim
>>> modifiers.
>>>
>>> This patch makes the kmalloc attempt for requests larger than PAGE_SIZE
>>> behave like GFP_NOWAIT (no direct reclaim) unless __GFP_RETRY_MAYFAIL is
>>> passed to kvmalloc. This supports both fail-fast and retry-hard behavior
>>> for physically contiguous memory, with vmalloc as the fallback.
>>>
>>> There is a potential downside: relatively small allocations (smaller
>>> than PAGE_ALLOC_COSTLY_ORDER) could fall back to vmalloc too easily and
>>> cause page block fragmentation. We cannot really rule that out, but
>>> experience with xlog_cil_kvmalloc does not indicate that this happens.
>>>
>>> [1] https://lore.kernel.org/all/Z-3i1wATGh6vI8x8@dread.disaster.area/T/#u
>>> Signed-off-by: Michal Hocko <mhocko@...e.com>
>>> ---
>>> mm/slub.c | 8 +++++---
>>> 1 file changed, 5 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/mm/slub.c b/mm/slub.c
>>> index b46f87662e71..2da40c2f6478 100644
>>> --- a/mm/slub.c
>>> +++ b/mm/slub.c
>>> @@ -4972,14 +4972,16 @@ static gfp_t kmalloc_gfp_adjust(gfp_t flags, size_t size)
>>> * We want to attempt a large physically contiguous block first because
>>> * it is less likely to fragment multiple larger blocks and therefore
>>> * contribute to a long term fragmentation less than vmalloc fallback.
>>> - * However make sure that larger requests are not too disruptive - no
>>> - * OOM killer and no allocation failure warnings as we have a fallback.
>>> + * However make sure that larger requests are not too disruptive - i.e.
>>> + * do not direct reclaim unless physically continuous memory is preferred
>>> + * (__GFP_RETRY_MAYFAIL mode). We still kick in kswapd/kcompactd to start
>>> + * working in the background but the allocation itself.
>>> */
>>> if (size > PAGE_SIZE) {
>>> flags |= __GFP_NOWARN;
>>>
>>> if (!(flags & __GFP_RETRY_MAYFAIL))
>>> - flags |= __GFP_NORETRY;
>>> + flags &= ~__GFP_DIRECT_RECLAIM;
>>>
>>> /* nofail semantic is implemented by the vmalloc fallback */
>>> flags &= ~__GFP_NOFAIL;
>>> --
>>> 2.49.0
>>>
>>
>> --
>> Michal Hocko
>> SUSE Labs
>