lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240301155417.1852290-1-rulin.huang@intel.com>
Date: Fri,  1 Mar 2024 10:54:15 -0500
From: rulinhuang <rulin.huang@...el.com>
To: urezki@...il.com,
	bhe@...hat.com
Cc: akpm@...ux-foundation.org,
	colin.king@...el.com,
	hch@...radead.org,
	linux-kernel@...r.kernel.org,
	linux-mm@...ck.org,
	lstoakes@...il.com,
	rulin.huang@...el.com,
	tianyou.li@...el.com,
	tim.c.chen@...el.com,
	wangyang.guo@...el.com,
	zhiguo.zhou@...el.com
Subject: [PATCH v7 0/2] mm/vmalloc: lock contention optimization under multi-threading

Hi,

This version has the rearrangement of macros from the previous one.

We are not sure whether we have completely moved these macros and 
their corresponding helper to the correct position. Could you please 
help to check whether they are correct?

~

1. Motivation

When allocating a new memory area where the mapping address range is 
known, it is observed that the vmap_node->busy.lock is acquired twice 
but one of the acquisitions is actually unnecessary.

2. Design

Among the two acquisitions, the first one occurs in the 
alloc_vmap_area() function when inserting the vm area into the vm 
mapping red-black tree, and the second one occurs in the 
setup_vmalloc_vm() function when updating the properties of the vm, 
such as flags and address, etc.

Combine these two operations together in alloc_vmap_area(), which 
improves scalability when the vmap_node->busy.lock is contended.
By doing so, the need to acquire the lock twice can also be eliminated 
to once.

3. Test results

With the above change, tested on intel sapphire rapids
platform(224 vcpu), a 4% performance improvement is gained on 
stress-ng/pthread(https://github.com/ColinIanKing/stress-ng),
which is the stress test of thread creations.

rulinhuang

[v1] https://lore.kernel.org/all/20240207033059.1565623-1-rulin.huang@intel.com/
[v2] https://lore.kernel.org/all/20240220090521.3316345-1-rulin.huang@intel.com/
[v3] https://lore.kernel.org/all/20240221032905.11392-1-rulin.huang@intel.com/
[v4] https://lore.kernel.org/all/20240222120536.216166-1-rulin.huang@intel.com/
[v5] https://lore.kernel.org/all/20240223130318.112198-2-rulin.huang@intel.com/
[v6] https://lore.kernel.org/lkml/aa8f0413-d055-4b49-bcd3-401e93e01c6d@intel.com/


rulinhuang (2):
  mm/vmalloc: Moved macros with no functional change happened
  mm/vmalloc: Eliminated the lock contention from twice to once

 mm/vmalloc.c | 314 +++++++++++++++++++++++++--------------------------
 1 file changed, 155 insertions(+), 159 deletions(-)


base-commit: 10c2cf5fe97647d68ee89b1f921e982e71519f20
-- 
2.43.0


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ