Message-ID: <aPjrRkjiIt6HmXmT@casper.infradead.org>
Date: Wed, 22 Oct 2025 15:33:42 +0100
From: Matthew Wilcox <willy@...radead.org>
To: Andrew Morton <akpm@...ux-foundation.org>
Cc: "Vishal Moola (Oracle)" <vishal.moola@...il.com>, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, Uladzislau Rezki <urezki@...il.com>
Subject: Re: [PATCH] mm/vmalloc: request large order pages from buddy
 allocator

On Tue, Oct 21, 2025 at 02:24:36PM -0700, Andrew Morton wrote:
> On Tue, 21 Oct 2025 12:44:56 -0700 "Vishal Moola (Oracle)" <vishal.moola@...il.com> wrote:
> 
> > Sometimes, vm_area_alloc_pages() will want many pages from the buddy
> > allocator. Rather than making requests to the buddy allocator for at
> > most 100 pages at a time, we can eagerly request large order pages a
> > smaller number of times.
> 
> Does this have potential to inadvertently reduce the availability of
> hugepages?

Quite the opposite.  Let's say we're doing a 40KiB allocation.  If we
just take the first 10 pages off the PCP list, those could be from
ten different 2MB chunks, preventing ten different hugepages from
forming until the allocation is freed.  If instead we do an order-3
allocation and an order-1 allocation, those can be from at most two
different 2MB chunks and prevent at most two hugepages from forming.
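As a rough sketch (illustrative only, not the patch itself), the contrast
for that 10-page request is something like:

	gfp_t gfp = GFP_KERNEL;
	struct page *pages[10], *head, *tail;
	int i;

	/* Before: up to ten order-0 pages, each potentially taken from a
	 * different 2MB pageblock via the PCP lists. */
	for (i = 0; i < 10; i++)
		pages[i] = alloc_pages(gfp, 0);

	/* After: one order-3 block plus one order-1 block, touching at
	 * most two pageblocks. */
	head = alloc_pages(gfp, 3);	/* 8 contiguous pages */
	tail = alloc_pages(gfp, 1);	/* 2 contiguous pages */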

> > 1000 2MB allocations:
> > 	[Baseline]			[This patch]
> > 	real    46.310s			real    34.582s
> > 	user    0.001s			user    0.006s
> > 	sys     46.058s			sys     34.365s
> > 
> > 10000 200KB allocations:
> > 	[Baseline]			[This patch]
> > 	real    56.104s			real    43.696s
> > 	user    0.001s			user    0.003s
> > 	sys     55.375s			sys     42.995s
> 
> Nice, but how significant is this change likely to be for a real workload?

Ulad has numbers for the last iteration of this patch showing an
improvement for a 16KiB allocation, which is an improvement for fork()
now that we all have VMAP_STACK.

> > +	gfp_t large_gfp = (gfp &
> > +		~(__GFP_DIRECT_RECLAIM | __GFP_NOFAIL | __GFP_COMP))
> > +		| __GFP_NOWARN;
> 
> Gee, why is this so complicated?

Because GFP flags suck as an interface?  Look at kmalloc_gfp_adjust().
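Spelling it out (my reading, as a commented sketch, not text from the
patch):

	/* The high-order attempt is opportunistic, so:
	 *  - drop __GFP_DIRECT_RECLAIM: don't stall in reclaim for it;
	 *  - drop __GFP_NOFAIL: this attempt is allowed to fail, the
	 *    order-0 fallback path keeps the caller's guarantees;
	 *  - drop __GFP_COMP: vmalloc wants individual pages, not one
	 *    compound page;
	 *  - add __GFP_NOWARN: a failed opportunistic attempt is not
	 *    worth a warning.
	 */
	gfp_t large_gfp = (gfp & ~(__GFP_DIRECT_RECLAIM | __GFP_NOFAIL | __GFP_COMP))
			| __GFP_NOWARN;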

> > +	unsigned int large_order = ilog2(nr_remaining);
> 
> Should nr_remaining be rounded up to next-power-of-two?

No, we don't want to overallocate, we want to precisely allocate.
To use our 40KiB example from earlier, we want to satisfy the allocation
by allocating a 32KiB chunk and an 8KiB chunk, not by allocating 64KiB
and only using part of it.
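The decomposition is just descending powers of two; roughly (a sketch,
not the actual vm_area_alloc_pages() loop, which also has to cap the
order and handle failure):

	unsigned int nr_remaining = 10;	/* 40KiB with 4KiB pages */

	while (nr_remaining) {
		unsigned int order = ilog2(nr_remaining);  /* 10 -> 3, then 2 -> 1 */

		/* allocate 1 << order pages here ... */
		nr_remaining -= 1U << order;	/* 10 - 8 = 2, then 2 - 2 = 0 */
	}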

(I suppose there's an argument for using alloc_pages_exact() here, but
I think it's a fairly weak one.)
