Message-ID: <vbj76pzf5mvooydne5fg2ewgjiducgficskq7hcsdxwywsda7l@qisdlq5q2n3o>
Date: Fri, 8 Mar 2024 12:35:37 -0500
From: Daniel Jordan <daniel.m.jordan@...cle.com>
To: Gang Li <gang.li@...ux.dev>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
        David Hildenbrand <david@...hat.com>,
        David Rientjes <rientjes@...gle.com>,
        Muchun Song <muchun.song@...ux.dev>,
        Tim Chen <tim.c.chen@...ux.intel.com>,
        Steffen Klassert <steffen.klassert@...unet.com>,
        Jane Chu <jane.chu@...cle.com>,
        "Paul E . McKenney" <paulmck@...nel.org>,
        Randy Dunlap <rdunlap@...radead.org>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, ligang.bdlg@...edance.com
Subject: Re: [PATCH v6 8/8] hugetlb: parallelize 1G hugetlb initialization

On Thu, Feb 22, 2024 at 10:04:21PM +0800, Gang Li wrote:
> Optimizing the initialization speed of 1G huge pages through
> parallelization.
> 
> 1G hugetlbs are allocated from bootmem, a process that is already
> very fast and does not currently require optimization. Therefore,
> we focus on parallelizing only the initialization phase in
> `gather_bootmem_prealloc`.
> 
> Here are some test results:
>       test case       no patch(ms)   patched(ms)   saved
>  ------------------- -------------- ------------- --------
>   256c2T(4 node) 1G           4745          2024   57.34%
>   128c1T(2 node) 1G           3358          1712   49.02%
>      12T         1G          77000         18300   76.23%

Another great improvement.

> +static void __init gather_bootmem_prealloc_parallel(unsigned long start,
> +						    unsigned long end, void *arg)
> +{
> +	int nid;
> +
> +	for (nid = start; nid < end; nid++)
> +		gather_bootmem_prealloc_node(nid);
> +}
> +
> +static void __init gather_bootmem_prealloc(void)
> +{
> +	struct padata_mt_job job = {
> +		.thread_fn	= gather_bootmem_prealloc_parallel,
> +		.fn_arg		= NULL,
> +		.start		= 0,
> +		.size		= num_node_state(N_MEMORY),
> +		.align		= 1,
> +		.min_chunk	= 1,
> +		.max_threads	= num_node_state(N_MEMORY),
> +		.numa_aware	= true,
> +	};
> +
> +	padata_do_multithreaded(&job);
> +}

Looks fine from the padata side.

Acked-by: Daniel Jordan <daniel.m.jordan@...cle.com> # padata
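
For readers unfamiliar with the range-based thread function quoted above,
here is a minimal userspace sketch of the idea: a node-id range
[start, start + size) is cut into chunks and each chunk is handed to a
function that takes (start, end, arg), the same shape as
gather_bootmem_prealloc_parallel. This is not the padata implementation.
The names prealloc_node, prealloc_range, run_chunked, visited and MAX_NODES
are hypothetical, the chunks run sequentially here rather than on worker
threads, and the chunk size of one node is an assumption that the job's
.min_chunk = 1 and .align = 1 would permit.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 64	/* hypothetical upper bound for this sketch */

/* Hypothetical stand-in for per-node work: count visits per node id. */
static int visited[MAX_NODES];

static void prealloc_node(int nid)
{
	assert(nid >= 0 && nid < MAX_NODES);
	visited[nid]++;
}

/* Same shape as the patch's thread function: handle node ids in [start, end). */
static void prealloc_range(unsigned long start, unsigned long end, void *arg)
{
	unsigned long nid;

	(void)arg;	/* unused, mirrors .fn_arg = NULL in the patch */
	for (nid = start; nid < end; nid++)
		prealloc_node((int)nid);
}

/*
 * Split [start, start + size) into chunks of at most `chunk` ids and call
 * fn on each chunk in turn. Returns the number of chunks issued. A real
 * multithreaded helper would hand these chunks to worker threads instead.
 */
static unsigned long run_chunked(unsigned long start, unsigned long size,
				 unsigned long chunk,
				 void (*fn)(unsigned long, unsigned long, void *),
				 void *arg)
{
	unsigned long end = start + size;
	unsigned long nchunks = 0;
	unsigned long pos;

	assert(chunk > 0);
	for (pos = start; pos < end; pos += chunk) {
		unsigned long stop = (end - pos > chunk) ? pos + chunk : end;

		fn(pos, stop, arg);
		nchunks++;
	}
	return nchunks;
}
```

Calling run_chunked(0, 4, 1, prealloc_range, NULL) issues one chunk per
node for a 4-node range, so each node is visited exactly once regardless of
which worker would pick up which chunk.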
