Date: Fri,  9 Feb 2024 06:51:47 -0500
From: rulinhuang <rulin.huang@...el.com>
To: urezki@...il.com
Cc: akpm@...ux-foundation.org,
	colin.king@...el.com,
	hch@...radead.org,
	linux-kernel@...r.kernel.org,
	linux-mm@...ck.org,
	lstoakes@...il.com,
	rulin.huang@...el.com,
	tim.c.chen@...el.com,
	zhiguo.zhou@...el.com,
	wangyang.guo@...el.com,
	tianyou.li@...el.com
Subject: Re: [PATCH] mm/vmalloc: lock contention optimization under multi-threading

Hi Rezki, thanks so much for your review. Your suggestions would indeed 
improve the maintainability and readability of this change, as well as 
of the vmalloc implementation as a whole. To avoid the 
partial-initialization issue, we are refining this patch by separating 
insert_map_area(), the insertion of the VA into the tree, from 
alloc_map_area(), so that setup_vmalloc_vm() can be invoked between 
them. However, our initial attempt ran into a boot-time error that we 
are still debugging, and it may take a little longer than expected 
since the coming week is the Lunar New Year public holiday in China. 
We will share the latest version of the patch with you once it is 
ready for review.
In the performance test, we first built stress-ng following the 
instructions at https://github.com/ColinIanKing/stress-ng, and then 
launched the pthread stressor (--pthread) for 30 seconds (-t 30) with 
the command below:
./stress-ng -t 30 --metrics-brief --pthread -1 --no-rand-seed
The aggregated number of threads spawned per second (Bogo ops/s) is 
taken as the score for this workload. We evaluated the performance 
impact of this patch on an Ice Lake server with 40, 80, 120, and 160 
cores online, respectively. As the table below shows, as the number of 
online cores grows, this patch relieves the increasingly severe lock 
contention and achieves a significant performance improvement of 
around 5.5% at 160 cores.

vcpu number         40        80        120       160
patched/original    100.5%    100.8%    105.2%    105.5%
 
Thanks again for your help and please let us know if more details 
are needed.
