Date:   Wed,  8 Nov 2023 19:25:14 -0500
From:   Gregory Price <gourry.memverge@...il.com>
To:     linux-kernel@...r.kernel.org
Cc:     linux-cxl@...r.kernel.org, linux-mm@...ck.org,
        cgroups@...r.kernel.org, linux-doc@...r.kernel.org,
        ying.huang@...el.com, akpm@...ux-foundation.org, mhocko@...nel.org,
        tj@...nel.org, lizefan.x@...edance.com, hannes@...xchg.org,
        corbet@....net, roman.gushchin@...ux.dev, shakeelb@...gle.com,
        muchun.song@...ux.dev, Gregory Price <gregory.price@...verge.com>
Subject: [RFC PATCH v4 0/3] memcg weighted interleave mempolicy control

This patchset implements weighted interleave and adds a new cgroup
sysfs entry: cgroup/memory.interleave_weights (not exposed in the
root cgroup).

The il_weight of a node is used by mempolicy to implement weighted
interleave when `numactl --interleave=...` is invoked.  By default,
the il_weight of every node is 1, which preserves the existing
round-robin interleave behavior.

Interleave weights denote the number of pages that should be
allocated from the node when interleaving occurs and have a range
of 1-255.  The weight of a node can never be 0; the preferred way
to prevent allocation from a node is to remove it from the cpuset
or mempolicy altogether.

For example, if a node's interleave weight is set to 5, 5 pages
will be allocated from that node before the next node is scheduled
for allocations.

# Set node weight for node 0 to 5
echo 0:5 > /sys/fs/cgroup/user.slice/memory.interleave_weights

# Set node weight for node 1 to 3
echo 1:3 > /sys/fs/cgroup/user.slice/memory.interleave_weights

# View the currently set weights
cat /sys/fs/cgroup/user.slice/memory.interleave_weights
0:5,1:3

Weights will only be displayed for possible nodes.
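
As a rough user-space illustration of the allocation order described
above (not the kernel implementation from this series), the sketch
below parses the node:weight list format shown by `cat` and walks the
nodes, taking `weight` consecutive pages from each before moving on.
The function names are hypothetical, and visiting nodes in ascending
node-id order is an assumption made for the example.

```python
from itertools import islice


def parse_weights(text):
    """Parse 'node:weight,node:weight' (the format printed above) into a dict."""
    weights = {}
    for entry in text.strip().split(","):
        node, weight = entry.split(":")
        weights[int(node)] = int(weight)
    return weights


def check_weights(weights):
    """Reject weights outside the 1-255 range stated in the cover letter."""
    for node, weight in weights.items():
        if not 1 <= weight <= 255:
            raise ValueError(f"node {node}: weight {weight} is outside 1-255")


def interleave_order(weights):
    """Yield node ids forever: weights[node] pages from a node, then the next node."""
    check_weights(weights)
    nodes = sorted(weights)  # assumption: nodes are visited in ascending id order
    while True:
        for node in nodes:
            for _ in range(weights[node]):
                yield node


weights = parse_weights("0:5,1:3")
print(list(islice(interleave_order(weights), 10)))
# [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]
```

With every weight left at 1, the same loop degenerates to plain
round-robin interleave.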

With this it becomes possible to set an interleaving strategy
that fits the available bandwidth for the devices available on
the system. An example system:

Node 0 - CPU+DRAM, 400GB/s BW (200 cross socket)
Node 1 - CXL Memory. 64GB/s BW, on Node 0 root complex

In this setup, the effective weights for a node set of [0,1]
may be [86, 14] (86% of memory on Node 0, 14% on Node 1)
or some smaller fraction thereof to encourage quicker rounds
for better overall distribution.

This spreads memory out across devices which all have different
latency and bandwidth attributes in a way that can maximize the
available resources.
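
The [86, 14] split in the example can be reproduced from the two
bandwidth figures, and a "smaller fraction thereof" can be obtained by
dividing out the common factor.  The sketch below is only one way to
derive such numbers, assuming weights are chosen proportional to
per-node bandwidth; the helper names are hypothetical and nothing in
this series computes weights automatically.

```python
from functools import reduce
from math import gcd


def percent_weights(bandwidth_gbps):
    """Map {node: bandwidth} to integer percentages of the total bandwidth."""
    total = sum(bandwidth_gbps.values())
    # Clamp to at least 1, since a weight of 0 is not allowed.
    return {node: max(1, round(100 * bw / total))
            for node, bw in bandwidth_gbps.items()}


def reduce_weights(weights):
    """Divide all weights by their greatest common divisor (shorter rounds)."""
    common = reduce(gcd, weights.values())
    return {node: weight // common for node, weight in weights.items()}


# Node 0: 400GB/s DRAM, Node 1: 64GB/s CXL (figures from the example system)
weights = percent_weights({0: 400, 1: 64})
print(weights)                  # {0: 86, 1: 14}
print(reduce_weights(weights))  # {0: 43, 1: 7}
```

The reduced pair keeps the same ratio but completes a full round in
50 pages instead of 100.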

~Gregory

=============
Version Notes:

= v4 notes

Moved interleave weights to cgroups from nodes.

Omitted them from the root cgroup for initial testing/comment, but
it seems like it may be a reasonable idea to place them there too.

== Weighted interleave

mm/mempolicy: modify interleave mempolicy to use node weights

The mempolicy MPOL_INTERLEAVE uses the node weights defined in the
cgroup memory.interleave_weights interface to implement weighted
interleave.  Since every node defaults to a weight of 1, the
original interleave behavior is retained by default.

============
RFC History

Node based weights
By: Gregory Price
https://lore.kernel.org/linux-mm/20231031003810.4532-1-gregory.price@memverge.com/

Memory-tier based weights
By: Ravi Shankar
https://lore.kernel.org/all/20230927095002.10245-1-ravis.opensrc@micron.com/

Mempolicy multi-node weighting w/ set_mempolicy2:
By: Gregory Price
https://lore.kernel.org/all/20231003002156.740595-1-gregory.price@memverge.com/

Hasan Al Maruf: N:M weighting in mempolicy
https://lore.kernel.org/linux-mm/YqD0%2FtzFwXvJ1gK6@cmpxchg.org/T/

Huang, Ying's presentation in lpc22, 16th slide in
https://lpc.events/event/16/contributions/1209/attachments/1042/1995/\
Live%20In%20a%20World%20With%20Multiple%20Memory%20Types.pdf

===================

Gregory Price (3):
  mm/memcontrol: implement memcg.interleave_weights
  mm/mempolicy: implement weighted interleave
  Documentation: sysfs entries for cgroup.memory.interleave_weights

 Documentation/admin-guide/cgroup-v2.rst       |  45 +++++
 .../admin-guide/mm/numa_memory_policy.rst     |  11 ++
 include/linux/memcontrol.h                    |  31 ++++
 include/linux/mempolicy.h                     |   3 +
 mm/memcontrol.c                               | 172 ++++++++++++++++++
 mm/mempolicy.c                                | 153 +++++++++++++---
 6 files changed, 387 insertions(+), 28 deletions(-)

-- 
2.39.1
