linux-kernel - Re: [PATCH v6 8/8] hugetlb: parallelize 1G hugetlb initialization

Message-ID: <1c6f90d0-ef3a-4ea4-8de6-ad93c93ed3da@linux.dev>
Date: Tue, 12 Mar 2024 10:26:07 +0800
From: Gang Li <gang.li@...ux.dev>
To: Daniel Jordan <daniel.m.jordan@...cle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
 David Hildenbrand <david@...hat.com>, David Rientjes <rientjes@...gle.com>,
 Muchun Song <muchun.song@...ux.dev>, Tim Chen <tim.c.chen@...ux.intel.com>,
 Steffen Klassert <steffen.klassert@...unet.com>,
 Jane Chu <jane.chu@...cle.com>, "Paul E . McKenney" <paulmck@...nel.org>,
 Randy Dunlap <rdunlap@...radead.org>, linux-mm@...ck.org,
 linux-kernel@...r.kernel.org, ligang.bdlg@...edance.com
Subject: Re: [PATCH v6 8/8] hugetlb: parallelize 1G hugetlb initialization

Thanks for your review :)

On 2024/3/9 01:35, Daniel Jordan wrote:
> On Thu, Feb 22, 2024 at 10:04:21PM +0800, Gang Li wrote:
>> Optimize the initialization speed of 1G huge pages through
>> parallelization.
>>
>> 1G hugetlbs are allocated from bootmem, a process that is already
>> very fast and does not currently require optimization. Therefore,
>> we focus on parallelizing only the initialization phase in
>> `gather_bootmem_prealloc`.
>>
>> Here are some test results:
>>        test case       no patch(ms)   patched(ms)   saved
>>   ------------------- -------------- ------------- --------
>>    256c2T(4 node) 1G           4745          2024   57.34%
>>    128c1T(2 node) 1G           3358          1712   49.02%
>>       12T         1G          77000         18300   76.23%
> 
> Another great improvement.
> 
>> +static void __init gather_bootmem_prealloc_parallel(unsigned long start,
>> +						    unsigned long end, void *arg)
>> +{
>> +	int nid;
>> +
>> +	for (nid = start; nid < end; nid++)
>> +		gather_bootmem_prealloc_node(nid);
>> +}
>> +
>> +static void __init gather_bootmem_prealloc(void)
>> +{
>> +	struct padata_mt_job job = {
>> +		.thread_fn	= gather_bootmem_prealloc_parallel,
>> +		.fn_arg		= NULL,
>> +		.start		= 0,
>> +		.size		= num_node_state(N_MEMORY),
>> +		.align		= 1,
>> +		.min_chunk	= 1,
>> +		.max_threads	= num_node_state(N_MEMORY),
>> +		.numa_aware	= true,
>> +	};
>> +
>> +	padata_do_multithreaded(&job);
>> +}
> 
> Looks fine from the padata side.
> 
> Acked-by: Daniel Jordan <daniel.m.jordan@...cle.com> # padata
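
For readers unfamiliar with the pattern in the quoted patch, here is a minimal user-space sketch of the same idea: a job describes a range [start, start + size), the range is cut into at most max_threads contiguous chunks, and each chunk is handed to a worker that calls a thread function with (start, end, arg). This is only an analogy built on pthreads. It is not how padata_do_multithreaded() is implemented, it ignores NUMA placement (.numa_aware), .align and .min_chunk, and every name in it (range_job, run_range_job, mark_nodes) is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a thread function taking a [start, end) range. */
typedef void (*range_fn)(unsigned long start, unsigned long end, void *arg);

/* Hypothetical job descriptor, loosely modelled on the fields used above. */
struct range_job {
	range_fn fn;               /* called once per chunk */
	void *arg;                 /* passed through to fn unchanged */
	unsigned long start;       /* first index of the range */
	unsigned long size;        /* number of indices in the range */
	unsigned long max_threads; /* upper bound on worker threads */
};

struct range_work {
	const struct range_job *job;
	unsigned long start, end;
	pthread_t tid;
	int spawned;
};

static void *range_worker(void *p)
{
	struct range_work *w = p;

	w->job->fn(w->start, w->end, w->job->arg);
	return NULL;
}

/* Split the job into contiguous chunks and run them in parallel.
 * Returns 0 on success, -1 if the bookkeeping allocation fails. */
static int run_range_job(const struct range_job *job)
{
	unsigned long end = job->start + job->size;
	unsigned long nthreads, chunk, i, used = 0;
	struct range_work *work;

	assert(job->fn != NULL);
	if (job->size == 0)
		return 0;

	nthreads = job->max_threads < job->size ? job->max_threads : job->size;
	if (nthreads == 0)
		nthreads = 1;
	chunk = (job->size + nthreads - 1) / nthreads; /* round up */

	work = calloc(nthreads, sizeof(*work));
	if (!work)
		return -1;

	for (i = 0; i < nthreads; i++) {
		unsigned long s = job->start + i * chunk;

		if (s >= end)
			break;
		work[i].job = job;
		work[i].start = s;
		work[i].end = (end - s < chunk) ? end : s + chunk;
		work[i].spawned = pthread_create(&work[i].tid, NULL,
						 range_worker, &work[i]) == 0;
		if (!work[i].spawned)
			range_worker(&work[i]); /* fall back to running inline */
		used++;
	}

	for (i = 0; i < used; i++)
		if (work[i].spawned)
			pthread_join(work[i].tid, NULL);

	free(work);
	return 0;
}

/* Example thread function: counts a visit for every index in its chunk.
 * Chunks never overlap, so no locking is needed on the array. */
static void mark_nodes(unsigned long start, unsigned long end, void *arg)
{
	int *done = arg;

	for (unsigned long nid = start; nid < end; nid++)
		done[nid] += 1;
}
```

To try it, fill a struct range_job with .fn = mark_nodes, .arg pointing at a zeroed int array, .start = 0, and .size and .max_threads both set to the array length, then call run_range_job(). That mirrors the one-thread-per-node setup in the patch, where .size and .max_threads are both num_node_state(N_MEMORY). Build with -pthread.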