Date: Tue, 20 Feb 2024 15:25:28 -0500
From: Gregory Price <gourry.memverge@...il.com>
To: linux-mm@...ck.org
Cc: linux-kernel@...r.kernel.org,
	ying.huang@...el.com,
	hannes@...xchg.org,
	dan.j.williams@...el.com,
	dave.jiang@...el.com,
	Gregory Price <gregory.price@...verge.com>
Subject: [RCF 0/1] mm/mempolicy: weighted interleave system default weights

Weighted interleave added a sysfs interface that lets users set the
interleave weights, with a default weight of `1` until code for
generating reasonable system defaults could be agreed upon.

This RFC series will suggest and solicit ideas for how to generate
these system defaults, and lay out some challenges in generating them.

Future work on the CXL driver (drivers/cxl) will introduce additional
code which registers HMAT information for hotplug memory provided
by CXL devices. This RFC does not presently provide that integration,
but will once that work is upstream.


Interfaces introduced:
- mempolicy_set_node_perf
  Called when HMAT data for a node is reported to the system

Integration points:
- node_set_perf_attrs - for reporting bandwidth info to mempolicy
- get_il_weight and weighted interleave allocation interfaces to
  provide system defaults when applying weighted interleave.

New data in mempolicy:
- node_bw_table - cached bandwidth information about each node
- default_iw_table - the system default interleave weights


Note that because there are now multiple tables (default and sysfs),
the allocators fetch each weight individually, rather than via memcpy.
This means if weights change at runtime (extremely unlikely), the
allocators may temporarily see an "incorrect distribution" while the
system is being reweighted. This is not harmful (simply inaccurate)
and is a result of providing a clean way to revert to the system
default.
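
As a rough illustration of the per-weight lookup described above, here is
a minimal userspace sketch. It is not the code from the patch: the table
names sysfs_iw_table and default_iw_table_sketch, the helper
get_weight_sketch(), and the convention that a user weight of 0 means
"not set, use the system default" are all assumptions made for the
example.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NODES 4	/* hypothetical node count for this sketch */

/* Hypothetical stand-in for the user (sysfs) table; 0 = not set (assumed). */
static uint8_t sysfs_iw_table[MAX_NODES];

/* Hypothetical stand-in for the system default table. */
static uint8_t default_iw_table_sketch[MAX_NODES] = { 1, 1, 1, 1 };

/*
 * Fetch one node's weight: prefer the user value, fall back to the
 * system default, and never return 0.
 */
static uint8_t get_weight_sketch(int node)
{
	uint8_t w;

	assert(node >= 0 && node < MAX_NODES);

	w = sysfs_iw_table[node];
	if (!w)
		w = default_iw_table_sketch[node];
	return w ? w : 1;
}

/*
 * Sum the weights one node at a time, the way an allocator would see
 * them. If either table is rewritten between two lookups, the total
 * mixes old and new values: inaccurate for a moment, but harmless.
 */
static unsigned int total_weight_sketch(int nr_nodes)
{
	unsigned int total = 0;
	int node;

	for (node = 0; node < nr_nodes; node++)
		total += get_weight_sketch(node);
	return total;
}
```

Under these assumptions, writing 0 back into the user table for a node is
all it takes to revert that node to its system default.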


v1: Simple GCD reduction of basic bandwidth distribution.

Approach:
- whenever new coordinates are reported, recalculate all weights
- cache each node's min(read, write) bandwidth
- calculate the percentage each node's bandwidth is of the whole
- use GCD to reduce all percentages down to the minimum possible

The approach is simple and fast, and works reasonably well if the
numbers reported by HMAT for each node happen to land on easily
reducible percentages.  For example, a system presenting 88% of its
bandwidth on DRAM and 11% of its bandwidth on CXL (floored for
simplicity) will end up with default weights of (8:1), keeping each
weight preferably small.
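
The steps above, including the 88%/11% example, can be sketched in
userspace C roughly as follows. This is an illustration under stated
assumptions, not the patch itself: the names min_bw_sketch(),
gcd_uint() and reduce_weights_sketch(), the MAX_NODES limit, and the
handling of nodes with no bandwidth data (weight 1) are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES 8	/* hypothetical limit for this sketch */

/* Cache value for a node: the smaller of its read and write bandwidth. */
static uint64_t min_bw_sketch(uint64_t read_bw, uint64_t write_bw)
{
	return read_bw < write_bw ? read_bw : write_bw;
}

/* Euclid's algorithm; gcd_uint(0, x) == x. */
static unsigned int gcd_uint(unsigned int a, unsigned int b)
{
	while (b) {
		unsigned int t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/*
 * bw[i] is the cached min(read, write) bandwidth of node i, 0 if unknown.
 * Writes one weight per node into weights[].
 */
static void reduce_weights_sketch(const uint64_t *bw, uint8_t *weights,
				  size_t n)
{
	unsigned int pct[MAX_NODES];
	unsigned int g = 0;
	uint64_t total = 0;
	size_t i;

	assert(n <= MAX_NODES);

	for (i = 0; i < n; i++)
		total += bw[i];

	/* No bandwidth data at all: fall back to 1 everywhere (assumed). */
	if (!total) {
		for (i = 0; i < n; i++)
			weights[i] = 1;
		return;
	}

	for (i = 0; i < n; i++) {
		/* Floored percentage of the whole. */
		pct[i] = (unsigned int)(bw[i] * 100 / total);
		/* Assumption: a node with data never drops below 1%. */
		if (bw[i] && !pct[i])
			pct[i] = 1;
		g = gcd_uint(g, pct[i]);
	}

	/* Divide every percentage by the common GCD. */
	for (i = 0; i < n; i++)
		weights[i] = pct[i] ? (uint8_t)(pct[i] / g) : 1;
}
```

With hypothetical bandwidths of 256 and 32 (same units), the floored
percentages are 88 and 11, their GCD is 11, and the weights come out
as 8:1.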

The downside of this approach is that it is susceptible to prime and
co-prime numbers keeping interleave weights large (e.g. 89:11 vs 8:1).
We prefer finer grained interleaves to prevent large swaths of
contiguous memory from landing on the same device.

This approach also hides the fact that multi-socket systems
experience chokepoints across sockets.  For example, a 2-socket
system with 200GB/s on each socket from DDR does not mean a given
socket has an aggregate of 400GB/s of bandwidth.  Interconnects between
sockets provide less aggregate bandwidth than the DDR they provide
access to (e.g. 3 UPI lanes vs 8 DDR channels).

So this approach will reduce multi-socket interleave weights to (1:1)
by default if all sockets provide the same bandwidth.

Signed-off-by: Gregory Price <gregory.price@...verge.com>

Gregory Price (1):
  mm/mempolicy: introduce system default interleave weights

 drivers/acpi/numa/hmat.c  |   1 +
 drivers/base/node.c       |   7 +++
 include/linux/mempolicy.h |   4 ++
 mm/mempolicy.c            | 129 ++++++++++++++++++++++++++++++--------
 4 files changed, 116 insertions(+), 25 deletions(-)

-- 
2.39.1

