linux-kernel - Re: [PATCH] mm/vmalloc: request large order pages from buddy allocator

Message-ID: <d108a8ce-8919-459d-aeca-dfa75cab54e7@arm.com>
Date: Thu, 11 Dec 2025 15:28:56 +0000
From: Ryan Roberts <ryan.roberts@....com>
To: "Vishal Moola (Oracle)" <vishal.moola@...il.com>
Cc: linux-mm@...ck.org, linux-kernel@...r.kernel.org,
 Uladzislau Rezki <urezki@...il.com>,
 Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH] mm/vmalloc: request large order pages from buddy
 allocator

On 10/12/2025 22:28, Vishal Moola (Oracle) wrote:
> On Wed, Dec 10, 2025 at 01:21:22PM +0000, Ryan Roberts wrote:
>> Hi Vishal,
>>
>>
>> On 21/10/2025 20:44, Vishal Moola (Oracle) wrote:
>>> Sometimes, vm_area_alloc_pages() will want many pages from the buddy
>>> allocator. Rather than making requests to the buddy allocator for at
>>> most 100 pages at a time, we can eagerly request large order pages a
>>> smaller number of times.
>>>
>>> We still split the large order pages down to order-0 as the rest of the
>>> vmalloc code (and some callers) depend on it. We still defer to the bulk
>>> allocator and fallback path in case of order-0 pages or failure.
>>>
>>> Running 1000 iterations of allocations on a small 4GB system finds:
>>>
>>> 1000 2MB allocations:
>>> 	[Baseline]			[This patch]
>>> 	real    46.310s			real    34.582s
>>> 	user    0.001s			user    0.006s
>>> 	sys     46.058s			sys     34.365s
>>>
>>> 10000 200KB allocations:
>>> 	[Baseline]			[This patch]
>>> 	real    56.104s			real    43.696s
>>> 	user    0.001s			user    0.003s
>>> 	sys     55.375s			sys     42.995s
>>
>> I'm seeing some big vmalloc micro benchmark regressions on arm64, for which 
>> bisect is pointing to this patch.
> 
> Ulad had similar findings/concerns[1]. Tldr: The numbers you are seeing
> are expected for how the test module is currently written.

Hmm... simplistically, I'd say that either the tests are bad, in which case they
should be deleted, or they are good, in which case we shouldn't ignore the
regressions. Having tests that we learn to ignore is the worst of both worlds.

But I see your point about the allocation pattern not being very realistic.

> 
>> The tests are all originally from the vmalloc_test module. Note that (R)
>> indicates a statistically significant regression and (I) indicates a
>> statistically significant improvement.
>>
>> p is number of pages in the allocation, h is huge. So it looks like the 
>> regressions are all coming for the non-huge case, where we want to split to 
>> order-0.
>>
>> +---------------------------------+----------------------------------------------------------+------------+------------------------+
>> | Benchmark                       | Result Class                                             |     6-18-0 |   6-18-0-gc2f2b01b74be |
>> +=================================+==========================================================+============+========================+
>> | micromm/vmalloc                 | fix_align_alloc_test: p:1, h:0, l:500000 (usec)          |  514126.58 |            (R) -42.20% |
>> |                                 | fix_size_alloc_test: p:1, h:0, l:500000 (usec)           |  320458.33 |                 -0.02% |
>> |                                 | fix_size_alloc_test: p:4, h:0, l:500000 (usec)           |  399680.33 |            (R) -23.43% |
>> |                                 | fix_size_alloc_test: p:16, h:0, l:500000 (usec)          |  788723.25 |            (R) -23.66% |
>> |                                 | fix_size_alloc_test: p:16, h:1, l:500000 (usec)          |  979839.58 |                 -1.05% |
>> |                                 | fix_size_alloc_test: p:64, h:0, l:100000 (usec)          |  481454.58 |            (R) -23.99% |
>> |                                 | fix_size_alloc_test: p:64, h:1, l:100000 (usec)          |  615924.00 |              (I) 2.56% |
>> |                                 | fix_size_alloc_test: p:256, h:0, l:100000 (usec)         | 1799224.08 |            (R) -23.28% |
>> |                                 | fix_size_alloc_test: p:256, h:1, l:100000 (usec)         | 2313859.25 |              (I) 3.43% |
>> |                                 | fix_size_alloc_test: p:512, h:0, l:100000 (usec)         | 3541904.75 |            (R) -23.86% |
>> |                                 | fix_size_alloc_test: p:512, h:1, l:100000 (usec)         | 3597577.25 |             (R) -2.97% |
>> |                                 | full_fit_alloc_test: p:1, h:0, l:500000 (usec)           |  487021.83 |              (I) 4.95% |
>> |                                 | kvfree_rcu_1_arg_vmalloc_test: p:1, h:0, l:500000 (usec) |  344466.33 |                 -0.65% |
>> |                                 | kvfree_rcu_2_arg_vmalloc_test: p:1, h:0, l:500000 (usec) |  342484.25 |                 -1.58% |
>> |                                 | long_busy_list_alloc_test: p:1, h:0, l:500000 (usec)     | 4034901.17 |            (R) -25.35% |
>> |                                 | pcpu_alloc_test: p:1, h:0, l:500000 (usec)               |  195973.42 |                  0.57% |
>> |                                 | random_size_align_alloc_test: p:1, h:0, l:500000 (usec)  |  643489.33 |            (R) -47.63% |
>> |                                 | random_size_alloc_test: p:1, h:0, l:500000 (usec)        | 2029261.33 |            (R) -27.88% |
>> |                                 | vm_map_ram_test: p:1, h:0, l:500000 (usec)               |   83557.08 |                 -0.22% |
>> +---------------------------------+----------------------------------------------------------+------------+------------------------+
>>
>> I have a couple of thoughts from looking at the patch:
>>
>>  - Perhaps split_page() is the bulk of the cost? Previously for this case we 
>>    were allocating order-0 so there was no split to do. For h=1, split would 
>>    have already been called so that would explain why no regression for that 
>>    case?
> 
> For h=1, this patch shouldn't change anything (as long as nr_pages <
> arch_vmap_{pte,pmd}_supported_shift). This is why you don't see regressions
> in those cases.

arm64 supports 64K contiguous mappings with vmalloc, so once nr_pages >= 16 we
can take the huge path.

> 
>>  - I guess we are bypassing the pcpu cache? Could this be having an effect? Dev 
>>    (cc'ed) did some similar investigation a while back and saw increased vmalloc 
>>    latencies when bypassing pcpu cache.
> 
> I'd say this is more a case of this test module targeting the pcpu
> cache. The module allocates then frees one at a time, which promotes
> reusing pcpu pages. [1] has some numbers after modifying the test such
> that all the allocations are made before freeing any.

OK fair enough.

We are seeing a bunch of other regressions in higher level benchmarks too; but
haven't yet concluded what's causing those. I'll report back if this patch looks
connected.

Thanks,
Ryan


> 
>>  - Philosophically is allocating physically contiguous memory when it is not 
>>    strictly needed the right thing to do? Large physically contiguous blocks are 
>>    a scarce resource so we don't want to waste them. Although I guess it could 
>>    be argued that this actually preserves the contiguous blocks because the 
>>    lifetime of all the pages is tied together. Anyway, I doubt this is the 
> 
> This was the primary incentive for this patch :)
> 
>>    reason for the slow down, since those benchmarks are not under memory 
>>    pressure.
>>
>> Anyway, it would be good to resolve the performance regressions if we can.
> 
> Imo, the appropriate way to address these is to modify the test module
> as seen in [1].
> 
> [1] https://lore.kernel.org/linux-mm/aPJ6lLf24TfW_1n7@milan/