Message-ID: <aO8behuGn5jVo28K@casper.infradead.org>
Date: Wed, 15 Oct 2025 04:56:42 +0100
From: Matthew Wilcox <willy@...radead.org>
To: "Vishal Moola (Oracle)" <vishal.moola@...il.com>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
	Uladzislau Rezki <urezki@...il.com>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [RFC PATCH] mm/vmalloc: request large order pages from buddy
 allocator

On Tue, Oct 14, 2025 at 11:27:54AM -0700, Vishal Moola (Oracle) wrote:
> Running 1000 iterations of allocations on a small 4GB system finds:
> 
> 1000 2MB allocations:
> 	[Baseline]			[This patch]
> 	real    46.310s			real    34.380s
> 	user    0.001s			user    0.008s
> 	sys     46.058s			sys     34.152s
> 
> 10000 200kB allocations:
> 	[Baseline]			[This patch]
> 	real    56.104s			real    43.946s
> 	user    0.001s			user    0.003s
> 	sys     55.375s			sys     43.259s
> 
> 10000 20kB allocations:
> 	[Baseline]			[This patch]
> 	real    0m8.438s		real    0m9.160s
> 	user    0m0.001s		user    0m0.002s
> 	sys     0m7.936s		sys     0m8.671s

I'd be more confident in the 20kB numbers if you'd done 10x more
iterations.

Also, I think 20kB is probably an _interesting_ number, but it's not
going to display your change to its best advantage.  A 32kB allocation
will look much better, for example.
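
To make the 20kB versus 32kB point concrete, here is a small userspace
sketch (not kernel code) of how a request breaks into power-of-two
chunks. It assumes 4kB base pages and that the order is recomputed as
ilog2() of the remaining page count after each chunk, which the quoted
hunk does not show. floor_log2() and split_into_orders() are
illustrative names, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4kB base pages */

/* Floor of log2(n) for n > 0; stands in for the kernel's ilog2(). */
static unsigned int floor_log2(unsigned int n)
{
	unsigned int order = 0;

	assert(n > 0);
	while (n >>= 1)
		order++;
	return order;
}

/*
 * Split a request of `bytes` into power-of-two page chunks, largest
 * first. Stores each chunk's order in orders[] (at most `max` entries)
 * and returns the number of chunks stored.
 */
static size_t split_into_orders(size_t bytes, unsigned int *orders, size_t max)
{
	unsigned int remaining =
		(unsigned int)((bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE);
	size_t n = 0;

	while (remaining > 0 && n < max) {
		unsigned int order = floor_log2(remaining);

		orders[n++] = order;
		remaining -= 1u << order;
	}
	return n;
}
```

Under those assumptions, 20kB is 5 pages, which gives one order-2 chunk
plus a single leftover page, while 32kB is 8 pages and fits in one
order-3 chunk. In the quoted patch the leftover order-0 page would not
go through the new loop at all, since the loop only runs while
large_order > order.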

Also, can you go into more detail about the test?  Based on our off-list
conversation, we were talking about allocating something like 100MB
of memory (in these various sizes) then freeing it, just to be sure
that we're measuring the performance of the buddy allocator and
not the PCP list.

> This is an RFC, comments and thoughts are welcomed. There is a
> clear benefit to be had for large allocations, but there is
> some regression for smaller allocations.

Also, we think that there's probably a later win to be had by
not splitting the page we allocated.

At some point, we should also start allocating frozen pages
for vmalloc.  That's going to be interesting for the users which
map vmalloc pages to userspace.

> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 97cef2cc14d3..0a25e5cf841c 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -3621,6 +3621,38 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
>  	unsigned int nr_allocated = 0;
>  	struct page *page;
>  	int i;
> +	gfp_t large_gfp = (gfp & ~__GFP_DIRECT_RECLAIM) | __GFP_NOWARN;
> +	unsigned int large_order = ilog2(nr_pages - nr_allocated);
> +
> +	/*
> +	 * Initially, attempt to have the page allocator give us large order
> +	 * pages. Do not attempt allocating smaller than order chunks since
> +	 * __vmap_pages_range() expects physically contiguous pages of exactly
> +	 * order long chunks.
> +	 */
> +	while (large_order > order && nr_allocated < nr_pages) {
> +		/*
> +		 * High-order nofail allocations are really expensive and
> +		 * potentially dangerous (pre-mature OOM, disruptive reclaim
> +		 * and compaction etc.
> +		 */
> +		if (gfp & __GFP_NOFAIL)
> +			break;

Sure, but could we just clear NOFAIL from the large_gfp flags instead
of giving up on this path so quickly?
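
A userspace sketch of the mask arithmetic that suggestion implies. The
SK_* bits and sketch_gfp_t are stand-ins invented for illustration and
do not match the kernel's real flag values; in the patch this would
presumably mean masking __GFP_NOFAIL out alongside __GFP_DIRECT_RECLAIM
when large_gfp is built.

```c
/* Stand-in flag bits for illustration only; NOT the kernel's values. */
typedef unsigned int sketch_gfp_t;

#define SK_GFP_DIRECT_RECLAIM	0x1u
#define SK_GFP_NOWARN		0x2u
#define SK_GFP_NOFAIL		0x4u
#define SK_GFP_OTHER		0x8u	/* any unrelated caller flag */

/*
 * Derive the flags for the opportunistic high-order attempt: drop
 * direct reclaim and NOFAIL, add NOWARN, and keep every other bit.
 * A failed attempt can then fall through to the existing path, which
 * still sees the caller's original flags.
 */
static sketch_gfp_t large_order_gfp(sketch_gfp_t gfp)
{
	return (gfp & ~(SK_GFP_DIRECT_RECLAIM | SK_GFP_NOFAIL)) | SK_GFP_NOWARN;
}
```

With that, the NOFAIL guarantee stays the job of the fallback path, and
the large-order attempt remains best effort for every caller.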

> +		if (nid == NUMA_NO_NODE)
> +			page = alloc_pages_noprof(large_gfp, large_order);
> +		else
> +			page = alloc_pages_node_noprof(nid, large_gfp, large_order);
> +
> +		if (unlikely(!page))
> +			break;

I'm not entirely convinced here.  We might want to fall back to the next
smaller size.  E.g. if we try to allocate an order-6 page, and there's not
one readily available, perhaps we should try to allocate an order-5 page
instead of falling back to the bulk allocator?
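
Roughly, the control flow would look like the userspace sketch below.
try_alloc_fn, alloc_with_order_fallback() and toy_alloc() are
hypothetical names used only to show the loop shape; the real code
would call the page allocator where the callback is invoked.

```c
#include <stdbool.h>

/* Hypothetical callback: true if an order-`order` block was obtained. */
typedef bool (*try_alloc_fn)(unsigned int order, void *ctx);

/*
 * Try orders from `start` down to, but not including, `min_order`.
 * Returns the order that succeeded, or -1 if every attempt failed and
 * the caller should use its ordinary fallback path.
 */
static int alloc_with_order_fallback(unsigned int start, unsigned int min_order,
				     try_alloc_fn try_alloc, void *ctx)
{
	unsigned int order;

	for (order = start; order > min_order; order--) {
		if (try_alloc(order, ctx))
			return (int)order;
	}
	return -1;
}

/* Toy allocator for demonstration: succeeds only at or below *ctx. */
static bool toy_alloc(unsigned int order, void *ctx)
{
	const unsigned int *max_order = ctx;

	return order <= *max_order;
}
```

With a toy allocator that has nothing above order 5, a request starting
at order 6 ends up with an order-5 block rather than dropping straight
to single pages.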

> @@ -3665,7 +3697,7 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
>  		}
>  	}
>  
> -	/* High-order pages or fallback path if "bulk" fails. */
> +	/* High-order arch pages or fallback path if "bulk" fails. */

I'm not quite clear what this comment change is meant to convey?
