Message-ID: <87zfzmf80q.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date:   Fri, 10 Nov 2023 14:16:05 +0800
From:   "Huang, Ying" <ying.huang@...el.com>
To:     Gregory Price <gourry.memverge@...il.com>
Cc:     linux-kernel@...r.kernel.org, linux-cxl@...r.kernel.org,
        linux-mm@...ck.org, cgroups@...r.kernel.org,
        linux-doc@...r.kernel.org, akpm@...ux-foundation.org,
        mhocko@...nel.org, tj@...nel.org, lizefan.x@...edance.com,
        hannes@...xchg.org, corbet@....net, roman.gushchin@...ux.dev,
        shakeelb@...gle.com, muchun.song@...ux.dev,
        Gregory Price <gregory.price@...verge.com>
Subject: Re: [RFC PATCH v4 0/3] memcg weighted interleave mempolicy control

Gregory Price <gourry.memverge@...il.com> writes:

> This patchset implements weighted interleave and adds a new cgroup
> sysfs entry: cgroup/memory.interleave_weights (excluded from root).
>
> The il_weight of a node is used by mempolicy to implement weighted
> interleave when `numactl --interleave=...` is invoked.  By default,
> the il_weight of every node is 1, which preserves the existing
> round-robin interleave behavior.

IIUC, this makes it almost impossible to derive the default weight of a
node from the node's memory bandwidth information.  That will make
users' lives a little harder.

If so, how about using a new memory policy mode instead, for example
MPOL_WEIGHTED_INTERLEAVE?
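
For reference, a minimal userspace sketch of what selecting such a mode
could look like through set_mempolicy(); the MPOL_WEIGHTED_INTERLEAVE
value below is purely hypothetical, since no such mode exists today
(build with -lnuma):

#include <stdio.h>
#include <numaif.h>                     /* set_mempolicy(), MPOL_* */

#ifndef MPOL_WEIGHTED_INTERLEAVE
#define MPOL_WEIGHTED_INTERLEAVE 6      /* placeholder for a hypothetical mode */
#endif

int main(void)
{
	unsigned long nodemask = (1UL << 0) | (1UL << 1); /* nodes 0 and 1 */

	/* The weights themselves would still live wherever the kernel
	 * keeps them (cgroup, sysfs, ...); the new mode would only ask
	 * the kernel to consult them instead of plain round-robin. */
	if (set_mempolicy(MPOL_WEIGHTED_INTERLEAVE, &nodemask,
			  8 * sizeof(nodemask)) < 0)
		perror("set_mempolicy");
	return 0;
}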

> Interleave weights denote the number of pages that should be
> allocated from the node when interleaving occurs and have a range
> of 1-255.  The weight of a node can never be 0, and instead the
> preferred way to prevent allocation is to remove the node from the
> cpuset or mempolicy altogether.
>
> For example, if a node's interleave weight is set to 5, 5 pages
> will be allocated from that node before the next node is scheduled
> for allocations.
>
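
To double-check my reading of the semantics above, here is a toy sketch
(not the patch's code; the names are made up) of the per-round
bookkeeping that "allocate weight pages from a node, then advance"
implies:

/* Toy illustration only: hand out il_weight[node] pages from the current
 * node before advancing to the next one.  Initialize the state with
 * { .cur = -1, .pages_left = 0 } so the first call starts at nodes[0]. */
struct wi_state {
	int cur;                    /* index into nodes[] */
	unsigned int pages_left;    /* pages still owed to nodes[cur] */
};

static int wi_next_node(struct wi_state *st, const unsigned char *il_weight,
			const int *nodes, int nr_nodes)
{
	if (st->pages_left == 0) {
		st->cur = (st->cur + 1) % nr_nodes;
		st->pages_left = il_weight[nodes[st->cur]]; /* range 1..255 */
	}
	st->pages_left--;
	return nodes[st->cur];
}

With the weights from the example below (node 0 weight 5, node 1 weight
3), this would return node 0 five times, then node 1 three times, then
repeat.
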
> # Set node weight for node 0 to 5
> echo 0:5 > /sys/fs/cgroup/user.slice/memory.interleave_weights
>
> # Set node weight for node 1 to 3
> echo 1:3 > /sys/fs/cgroup/user.slice/memory.interleave_weights
>
> # View the currently set weights
> cat /sys/fs/cgroup/user.slice/memory.interleave_weights
> 0:5,1:3
>
> Weights will only be displayed for possible nodes.
>
> With this it becomes possible to set an interleaving strategy
> that matches the bandwidth of the devices available on the
> system. An example system:
>
> Node 0 - CPU+DRAM, 400GB/s BW (200GB/s cross-socket)
> Node 1 - CXL memory, 64GB/s BW, on the Node 0 root complex
>
> In this setup, the effective weights for a node set of [0,1]
> may be [86, 14] (86% of memory on Node 0, 14% on Node 1),
> or some smaller fraction thereof to encourage quicker rounds
> for better overall distribution.
>
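
The 86/14 split matches the bandwidth ratio after rounding; a quick
back-of-the-envelope check, assuming the weights are simply
proportional to per-node bandwidth:

#include <stdio.h>

int main(void)
{
	double bw[2] = { 400.0, 64.0 };     /* GB/s for nodes 0 and 1 */
	double total = bw[0] + bw[1];       /* 464 GB/s */

	/* 400/464 ~= 0.862 -> 86, 64/464 ~= 0.138 -> 14 */
	printf("node0 ~ %.0f, node1 ~ %.0f\n",
	       100.0 * bw[0] / total, 100.0 * bw[1] / total);

	/* A smaller pair with the same ratio, e.g. 43:7 (dividing by
	 * gcd(86, 14) = 2), would shorten each interleave round. */
	return 0;
}
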
> This spreads memory out across devices that all have different
> latency and bandwidth attributes, in a way that can maximize
> use of the available resources.
>

--
Best Regards,
Huang, Ying
